CN104731718A - Cache system and method - Google Patents

Cache system and method

Info

Publication number
CN104731718A
Authority
CN
China
Prior art keywords
instruction
branch
read pointer
target
processor core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410048036.7A
Other languages
Chinese (zh)
Inventor
林正浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Original Assignee
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinhao Bravechips Micro Electronics Co Ltd filed Critical Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Priority to CN201410048036.7A priority Critical patent/CN104731718A/en
Priority to PCT/CN2014/094603 priority patent/WO2015096688A1/en
Publication of CN104731718A publication Critical patent/CN104731718A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802: Instruction prefetching
    • G06F9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer; banks

Abstract

The invention provides a cache system and method for use in processors. Before a processor core executes an instruction, the instruction is filled into a high-speed memory that the processor core can access directly. The processor core does not need to supply instruction addresses; instead, the high-speed memory is controlled to deliver instructions directly to the processor core according to feedback information that the core generates while executing instructions. As a result, the processor core can obtain the instruction it needs from the high-speed memory almost every time, which achieves a very high cache hit rate.

Description

A cache system and method
Technical field
The present invention relates to the fields of computers, communications, and integrated circuits.
Background
A cache typically holds a copy of part of the contents of a lower-level memory so that a higher-level memory or the processor core can access those contents quickly, which keeps the pipeline running without interruption.
Existing caches are all addressed in the following way. The index field of the address selects an entry in the tag memory, and the tag read out is compared with the tag field of the address; the index field and the block offset field together address the cache contents. If the tag read from the tag memory equals the tag field of the address, the content read from the cache is valid, which is called a cache hit. Otherwise, the content read from the cache is invalid, which is called a cache miss. In a multi-way set-associative cache, these operations are performed on all ways in parallel to detect which way hits. The content read from the hitting way is the valid content; if every way misses, all of the content read is invalid. After a cache miss, the cache control logic fills the cache with content from the lower-level storage medium.
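The conventional lookup described above can be sketched as follows. This is a minimal illustration only: the field widths (4 offset bits, 2 index bits), the two-way organisation, and all names are assumptions chosen for the example and do not come from the patent.

```python
OFFSET_BITS = 4  # assumed: 16-byte blocks
INDEX_BITS = 2   # assumed: 4 sets per way


def split_address(addr):
    """Split an address into (tag, index, block offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    """Toy multi-way set-associative cache (hypothetical model)."""

    def __init__(self, ways=2):
        num_sets = 1 << INDEX_BITS
        # tags[way][index] is the stored tag, or None when the entry is invalid
        self.tags = [[None] * num_sets for _ in range(ways)]
        self.data = [[None] * num_sets for _ in range(ways)]

    def fill(self, way, addr, block):
        """Fill one block into the given way, as done after a miss."""
        tag, index, _ = split_address(addr)
        self.tags[way][index] = tag
        self.data[way][index] = block

    def lookup(self, addr):
        """Return (hit, content). Hardware compares all ways in parallel;
        this loop visits them one after another for simplicity."""
        tag, index, offset = split_address(addr)
        for way in range(len(self.tags)):
            if self.tags[way][index] == tag:
                return True, self.data[way][index][offset]
        return False, None
```

An address whose tag matches in some way is a hit; an address that maps to the same index with a different tag is a miss until the control logic fills the block.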
Cache misses fall into three categories: compulsory misses, conflict misses, and capacity misses. In existing cache structures, compulsory misses are unavoidable except for the small portion of content that is successfully prefetched, and existing prefetching carries a considerable cost. Moreover, although a multi-way set-associative cache reduces conflict misses, it is constrained by power consumption and speed (for example, because the structure must read and compare the contents and tags addressed by the same index in all ways simultaneously), so the number of ways is hard to raise beyond a certain point.
Modern cache systems usually consist of multiple levels of multi-way set-associative caches. Newer cache structures, such as victim caches, trace caches, and prefetching, are all built on and refine the basic structure described above. However, with the ever-widening gap between processor and memory speeds, the current architecture, and the various kinds of cache miss in particular, has become the most serious bottleneck limiting performance improvements in modern processors.
The method and system proposed by the present invention directly address one or more of the above difficulties, or others.
Summary of the invention
The present invention proposes a caching method, characterized in that: instructions being filled into the L1 cache are examined and the corresponding instruction information is extracted; according to the instruction information, the branch target instructions of all branch instructions in the L1 cache are stored in the L2 cache in advance; a first read pointer addresses the L1 cache to read the corresponding instruction for execution by the processor core; the value of the first read pointer is updated according to feedback information generated when the processor core executes instructions; when the processor core executes a branch instruction, the next instruction to be executed is already stored in the L2 cache whether or not the branch is taken; if the branch is taken, the first read pointer is updated to the branch target addressing address of the branch instruction; if the branch is not taken, the first read pointer is updated to the addressing address of the instruction that sequentially follows the branch instruction.
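The read-pointer update rule in the paragraph above can be sketched as follows. This is an illustrative model under stated assumptions: an addressing address is taken to be a (block number, offset) pair, the track table is a dictionary of lists, and the names `TrackPoint` and `next_read_pointer` are hypothetical. Handling of the end of a block is deliberately left out.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Address = Tuple[int, int]  # assumed form: (L1 block number, offset in block)


class TrackPoint(NamedTuple):
    """Instruction information extracted when the instruction was filled."""
    is_branch: bool
    target: Optional[Address] = None  # branch target addressing address


def next_read_pointer(pointer: Address,
                      track_table: Dict[int, List[TrackPoint]],
                      taken: bool) -> Address:
    """Update the first read pointer from the core's feedback.

    `taken` is the feedback signal from the processor core; it only
    matters when the pointed-to instruction is a branch.
    """
    block, offset = pointer
    point = track_table[block][offset]
    if point.is_branch and taken:
        return point.target        # branch taken: jump to the stored target
    return (block, offset + 1)     # otherwise: next sequential instruction
```

Note that the core never supplies an instruction address here; the pointer is derived entirely from stored instruction information plus a taken/not-taken signal.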
Optionally, in the method, according to the instruction information, the branch target instruction of a branch instruction that the processor core is about to execute is filled from the L2 cache into the L1 cache in advance, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken.
Optionally, in the method, instructions being filled into the L1 cache are examined and the corresponding instruction information is extracted; how the first read pointer is updated is determined by the instruction information rather than by the function of the instruction itself.
Optionally, in the method, when the first read pointer points to a conditional branch instruction and the instruction immediately after it is an unconditional branch instruction, the update depends on the processor core's execution result for the conditional branch instruction: if the branch is taken, the first read pointer is updated to the branch target addressing address of the conditional branch instruction; if the branch is not taken, the first read pointer is updated to the branch target addressing address of the unconditional branch instruction. The processor core therefore does not need a separate clock cycle to execute the unconditional branch instruction.
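The folding of an unconditional branch into the preceding conditional branch can be sketched like this. The type labels ("cond", "uncond", "other"), the (block, offset) address form, and the function name are assumptions for illustration, not taken from the patent.

```python
from typing import List, NamedTuple, Optional, Tuple

Address = Tuple[int, int]  # assumed form: (L1 block number, offset in block)


class TrackPoint(NamedTuple):
    kind: str                         # "other", "cond" or "uncond" (assumed labels)
    target: Optional[Address] = None  # branch target addressing address


def fold_next_pointer(pointer: Address, track: List[TrackPoint],
                      taken: bool) -> Address:
    """Next value of the first read pointer within one track."""
    block, offset = pointer
    current = track[offset]
    following = track[offset + 1] if offset + 1 < len(track) else None
    if current.kind == "cond":
        if taken:
            return current.target
        if following is not None and following.kind == "uncond":
            # Not taken, and the next instruction always jumps: go straight
            # to its target, so the jump needs no clock cycle of its own.
            return following.target
        return (block, offset + 1)
    if current.kind == "uncond":
        return current.target
    return (block, offset + 1)
```

Reading the current track point and the one after it in the same access is what makes the second case possible.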
Optionally, in the method, the value of the first read pointer is buffered, and the buffered value addresses the L1 cache to read the corresponding instruction for execution by the processor core; the first read pointer runs ahead to a branch instruction, and if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken.
Optionally, in the method, a second read pointer is provided; the second read pointer runs ahead to the branch instruction after the first read pointer, and if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken.
Optionally, in the method, when the processor core executes a branch instruction, either the sequentially next instruction or the branch target instruction is selected according to branch prediction and executed as the subsequent instruction, and the addressing address of the other is saved; if the branch outcome agrees with the prediction, execution of the subsequent instructions continues; if the branch outcome disagrees with the prediction, the pipeline is flushed and execution restarts from the instruction at the saved addressing address.
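A minimal sketch of this speculative scheme follows, assuming addresses are opaque values and that the prediction is a single boolean. The function names are hypothetical; the patent does not prescribe this decomposition.

```python
def issue_branch(fall_through, target, predict_taken):
    """Choose the speculative path and save the other address.

    Returns (speculative_next, saved_alternate).
    """
    if predict_taken:
        return target, fall_through
    return fall_through, target


def resolve_branch(speculative_next, saved_alternate,
                   predict_taken, actually_taken):
    """Return (address to continue from, whether the pipeline is flushed)."""
    if predict_taken == actually_taken:
        return speculative_next, False   # prediction correct: keep going
    return saved_alternate, True         # misprediction: flush and restart
```

Because the alternate address is already saved, recovery from a misprediction does not require recomputing or looking up the other path.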
Optionally, in the method, the instruction information comprises the instruction type and the branch target addressing address of each branch instruction.
Optionally, in the method, every instruction has a corresponding instruction type, and the instruction type is one or more bits wide.
Optionally, in the method, every branch instruction has a corresponding branch target addressing address.
Optionally, in the method, the instruction type can be further divided into basic type information and a branch instruction type, wherein: the basic type information distinguishes branch instructions from non-branch instructions, and every instruction has corresponding basic type information; the branch instruction type further distinguishes among branch instructions, and every branch instruction has corresponding branch type information.
Optionally, in the method, the branch instruction types comprise conditional branch instructions and unconditional branch instructions.
Optionally, in the method, the instruction type of the corresponding instruction is found according to the first read pointer, and the branch target addressing address of the first branch instruction after that instruction is found according to the second read pointer.
Optionally, in the method, the second read pointer value is obtained from the first read pointer value by mapping.
Optionally, in the method, the second read pointer value equals the number of branch instructions that precede the instruction pointed to by the first read pointer.
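The mapping from the first read pointer to the second can be illustrated with a short sketch. It assumes the basic type information of one instruction block is a list of bits (1 for a branch instruction, 0 otherwise) and that branch targets are stored compactly, one entry per branch, in program order; the function name is hypothetical.

```python
def second_read_pointer(first_pointer, basic_types):
    """Number of branch instructions before the pointed-to instruction.

    basic_types[i] is 1 if instruction i of the block is a branch, else 0.
    The result indexes the compact list of branch targets, so it selects
    the entry of the first branch at or after `first_pointer`.
    """
    return sum(basic_types[:first_pointer])
```

For example, with branches at offsets 1, 4 and 5, a first read pointer of 2 maps to entry 1, which is the target of the branch at offset 4.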
Optionally, in the method, when the processor core executes a branch instruction: if the branch is taken, the first read pointer is updated to the branch target addressing address pointed to by the second read pointer; if the branch is not taken, the first read pointer is updated to the addressing address of the instruction that sequentially follows the branch instruction.
The present invention also proposes a cache system, characterized by comprising: a processor core for executing instructions; an L1 cache for storing the instructions to be executed by the processor core; an L2 cache for storing all instructions in the L1 cache and the branch target instructions of all branch instructions in the L1 cache; an active table whose entries correspond to the instruction blocks in the L2 cache and which stores the address information of those instruction blocks; a block address mapping module for storing the correspondence between L1 cache and L2 cache instruction addresses; a track table whose track points correspond to the instructions in the L1 cache and which stores the instruction information of those instructions, the instruction information comprising the instruction type and the branch target instruction position information of each branch instruction; a scanner for examining instructions being filled into the L1 cache, extracting the corresponding instruction information, calculating the branch target instruction address of each branch instruction, and matching the calculated branch target instruction address in the active table and the block address mapping module, wherein if the match fails, at least one instruction including the branch target instruction is filled from lower-level memory into the L2 cache and the corresponding branch target instruction position information is stored in the track table, and if the match succeeds, the branch target instruction position information is stored in the track table directly; and a tracker that outputs a first read pointer to address the L1 cache and read the corresponding instruction for execution by the processor core, and that reads the instruction information from the track table. The tracker updates the value of the first read pointer according to the instruction information and the feedback information generated when the processor core executes instructions: if the branch is taken, the first read pointer is updated to the branch target addressing address of the branch instruction; if the branch is not taken, the first read pointer is updated to the addressing address of the instruction that sequentially follows the branch instruction.
Optionally, in the system, how the first read pointer is updated is determined by the instruction information rather than by the function of the instruction itself.
Optionally, in the system, the instruction information stored in the track point pointed to by the first read pointer and in the track point immediately after it is read from the track table simultaneously.
Optionally, in the system, when the first read pointer points to a conditional branch instruction and the instruction immediately after it is an unconditional branch instruction, the update depends on the processor core's execution result for the conditional branch instruction: if the branch is taken, the first read pointer is updated to the branch target addressing address of the conditional branch instruction; if the branch is not taken, the first read pointer is updated to the branch target addressing address of the unconditional branch instruction. The processor core therefore does not need a separate clock cycle to execute the unconditional branch instruction.
Optionally, in the system, a buffer is further included for storing the value of the first read pointer; the first read pointer value output by the buffer addresses the L1 cache to read the corresponding instruction for execution by the processor core; the first read pointer of the tracker runs ahead to a branch instruction, and if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken.
Optionally, in the system, a slave tracker is further included; the slave tracker outputs a second read pointer that runs ahead to the branch instruction after the first read pointer, and if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken.
Optionally, in the system, the tracker further comprises a register for storing the addressing address of one of the sequentially next instruction and the branch target instruction; when the processor core executes a branch instruction, one of the two is selected according to branch prediction and executed as the subsequent instruction, and the addressing address of the other is stored in the register; if the branch outcome agrees with the prediction, execution of the subsequent instructions continues; if the branch outcome disagrees with the prediction, the pipeline is flushed and execution restarts from the instruction at the addressing address saved in the register.
Optionally, in the system, an end track point is added after the last track point of every track in the track table; the instruction type of the end track point is unconditional branch, and its branch target addressing address is the addressing address of the first track point of the sequentially next track; when the first read pointer points to an end track point, the L1 cache outputs a null (no-operation) instruction.
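The end track point can be modelled as one extra entry per track. In this sketch the type labels, the (block, offset) address form, the `NOP` placeholder, and both function names are assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple

Address = Tuple[int, int]  # assumed form: (L1 block number, offset in block)


class TrackPoint(NamedTuple):
    kind: str                         # "other", "cond" or "uncond" (assumed labels)
    target: Optional[Address] = None  # branch target addressing address


NOP = "nop"  # placeholder for the null instruction output by the cache


def with_end_point(track: List[TrackPoint], next_block: int) -> List[TrackPoint]:
    """Append an end track point that jumps to the first track point
    of the sequentially next track."""
    return list(track) + [TrackPoint("uncond", (next_block, 0))]


def read_l1(block_instructions: list, offset: int):
    """Read an instruction from an L1 block; the offset just past the
    last instruction corresponds to the end track point."""
    if offset < len(block_instructions):
        return block_instructions[offset]
    return NOP
```

With this arrangement, crossing a block boundary is handled by the same update rule as any other unconditional branch, so no special sequential-overflow logic is needed in the pointer update.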
Optionally, in the system, an end track point is added after the last track point of every track in the track table; the instruction type of the end track point is unconditional branch, and its branch target addressing address is the addressing address of the first track point of the sequentially next track; and when the track point before the end track point is not a branch point, the instruction type and branch target addressing address of the end track point may be used as the instruction type and branch target addressing address of that track point.
Optionally, in the system, the track table further comprises: an instruction type table for storing the basic type information of instructions, the basic type information distinguishing branch instructions from non-branch instructions, wherein every instruction corresponds to one entry of the instruction type table and each entry is one or more bits wide; and a target address table for storing the branch instruction type and branch target addressing address of branch instructions, wherein every branch instruction corresponds to one entry of the target address table.
Optionally, in the system, the branch instruction types comprise conditional branch instructions and unconditional branch instructions.
Optionally, in the system, the basic type information of the corresponding instruction is found in the instruction type table according to the first read pointer and sent to the tracker; the branch instruction type and branch target addressing address of the first branch instruction after that instruction are found in the target address table according to the second read pointer; the branch instruction type is sent to the tracker, and the branch target addressing address is sent to the slave tracker.
Optionally, in the system, an offset address mapping module is further included for mapping the first read pointer value to the corresponding target address table column number; the target address table column number is sent to the slave tracker.
Optionally, in the system, the offset address mapping module comprises: a decoder for generating a mask value according to the first read pointer value, in which the mask bits corresponding to the instruction pointed to by the first read pointer and to the instructions after it are '0' and the other mask bits are '1'; a masker for performing an AND operation on the mask value generated by the decoder and the basic type information in the instruction type table to obtain a control word; and a selector array in which each column of selectors selects according to the value of the corresponding bit of the control word: if the bit is '0', the input from the same row of the previous column is selected; if the bit is '1', the input from the row below in the previous column is selected. The number of rows by which the single '1' at the input of the selector array is moved up equals the number of '1' bits in the control word, so that the first read pointer value is mapped to the corresponding target address table column number.
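The decoder, masker, and selector array can be simulated in a few lines to check that the structure counts the branches before the first read pointer. This is a behavioural sketch, not a hardware description: rows are numbered from the bottom (so "the row below" is row - 1), the single '1' is assumed to enter at row 0, and all function names are hypothetical.

```python
def decode_mask(first_pointer, width):
    """Decoder: '0' for the pointed-to instruction and all later ones,
    '1' for every earlier instruction."""
    return [1 if i < first_pointer else 0 for i in range(width)]


def control_word(mask, basic_types):
    """Masker: bitwise AND of the mask with the basic type bits."""
    return [m & t for m, t in zip(mask, basic_types)]


def selector_array(control):
    """Selector array: one column of selectors per control bit.

    A '0' bit passes each row straight through from the previous column;
    a '1' bit makes each row take the value of the row below, which
    moves the single '1' up by one row.
    """
    rows = len(control) + 1
    column = [1] + [0] * (rows - 1)   # assumed input: single '1' at row 0
    for bit in control:
        column = [
            (column[row - 1] if row > 0 else 0) if bit else column[row]
            for row in range(rows)
        ]
    return column.index(1)            # final row = target address table column


def map_pointer(first_pointer, basic_types):
    """Map a first read pointer value to a target address table column."""
    mask = decode_mask(first_pointer, len(basic_types))
    return selector_array(control_word(mask, basic_types))
```

The result equals the count of '1' bits in the control word, that is, the number of branch instructions before the first read pointer, which matches the mapping stated for the second read pointer.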
Optionally, in the system, when the processor core executes a branch instruction: if the branch is taken, the first read pointer of the tracker is updated to the branch target addressing address; if the branch is not taken, the first read pointer of the tracker is updated to the addressing address of the instruction that sequentially follows the branch instruction.
Optionally, in the system, if the processor core executes a branch instruction and the branch is taken, the second read pointer of the slave tracker is updated to the column number sent by the offset address mapping module; if the processor core executes a branch instruction and the branch is not taken, the second read pointer value of the slave tracker is incremented by one so that it points to the next entry in the target address table.
Optionally, in the method, the processor core comprises two front-end pipelines and one back-end pipeline; the method further provides: a first instruction read buffer for storing the current instruction block; and a third read pointer that addresses the first instruction read buffer to read the corresponding instruction for execution by the sequential front-end pipeline of the processor core; the first read pointer addresses the L1 cache to read the corresponding instruction for execution by the target front-end pipeline of the processor core.
Optionally, in the method, if the branch of a branch instruction is not taken: the first read pointer value is updated to the branch target addressing address of the next branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache and the corresponding instruction is read for execution by the target front-end pipeline of the processor core; the third read pointer continues to be updated, and the corresponding instruction is read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core. If the branch of a branch instruction is taken: the instruction block pointed to by the first read pointer is filled from the L1 cache into the first instruction read buffer, and the third read pointer value is updated to the first read pointer value plus one; the third read pointer continues to be updated from that value, and the corresponding instruction is read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core.
Optionally, in the method, there are further provided: a second instruction read buffer for storing the target instruction block, the first read pointer addressing the second instruction read buffer to read the corresponding instruction for execution by the target front-end pipeline of the processor core; a second read pointer, which runs ahead to the branch instruction after the first read pointer, wherein if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken; and a fourth read pointer, which runs ahead to the branch instruction after the third read pointer, wherein if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken.
Optionally, in the method, when the fourth read pointer has run ahead to the first branch instruction after the third read pointer, the first read pointer value is updated to the branch target addressing address of the branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache and the corresponding instruction is read for execution by the target front-end pipeline of the processor core.
Optionally, in the system, the processor core comprises two front-end pipelines and one back-end pipeline; the system further comprises: a first instruction read buffer for storing the current instruction block; and a second tracker that outputs a third read pointer to address the first instruction read buffer and read the corresponding instruction for execution by the sequential front-end pipeline of the processor core; the first read pointer addresses the L1 cache to read the corresponding instruction for execution by the target front-end pipeline of the processor core.
Optionally, in the system, if the branch of a branch instruction is not taken: the first read pointer value is updated to the branch target addressing address of the next branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache and the corresponding instruction is read for execution by the target front-end pipeline of the processor core; the third read pointer continues to be updated, and the corresponding instruction is read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core. If the branch of a branch instruction is taken: the instruction block pointed to by the first read pointer is filled from the L1 cache into the first instruction read buffer, and the third read pointer value is updated to the first read pointer value plus one; the third read pointer continues to be updated from that value, and the corresponding instruction is read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core.
Optionally, in the system, there are further included: a second instruction read buffer for storing the target instruction block, the first read pointer addressing the second instruction read buffer to read the corresponding instruction for execution by the target front-end pipeline of the processor core; a slave tracker that outputs a second read pointer, which runs ahead to the branch instruction after the first read pointer, wherein if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken; and a second slave tracker that outputs a fourth read pointer, which runs ahead to the branch instruction after the third read pointer, wherein if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the next instruction to be executed is already stored in the L1 cache whether or not the branch is taken.
Optionally, in the system, when the fourth read pointer has run ahead to the first branch instruction after the third read pointer, the first read pointer value is updated to the branch target addressing address of the branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache and the corresponding instruction is read for execution by the target front-end pipeline of the processor core.
Those skilled in the art will also understand other aspects of the present invention in light of its description, claims, and drawings.
Beneficial effects
The system and method of the present invention provide a fundamental solution for the cache structures used in digital systems. Unlike conventional cache systems, which fill the cache only after a miss, the system and method of the present invention fill the instruction cache before the processor executes an instruction, so that cache misses can be avoided or fully hidden.
According to the program execution flow and the feedback from the processor core's instruction execution, the higher-level cache supplies instructions to the processor core directly, without the processor core providing instruction addresses, which reduces pipeline depth and improves pipeline efficiency. In particular, fewer pipeline cycles are wasted on a branch misprediction.
Other advantages and applications of the present invention are obvious to those skilled in the art.
Brief description of the drawings
Fig. 1A is an embodiment of the cache structure of the present invention;
Fig. 1B is another embodiment of the cache structure of the present invention;
Fig. 1C is an embodiment of the present invention that supports speculative execution;
Fig. 2 is another embodiment of the cache system of the present invention;
Fig. 3 is another embodiment of the cache system of the present invention;
Fig. 4 is another embodiment of the cache system of the present invention;
Fig. 5 is an embodiment of the instruction type table and target address table of the present invention;
Fig. 6 is an embodiment of the offset address mapping module of the present invention;
Fig. 7 is another embodiment of the cache system of the present invention;
Fig. 8 is another embodiment of the tracker of the present invention;
Fig. 9 is an embodiment of the cache system of the present invention that avoids branch penalties;
Fig. 10 is an embodiment of the track table contents of the present invention;
Fig. 11 is another embodiment of the cache system of the present invention that avoids branch penalties.
Detailed description
The high-performance cache system and method proposed by the present invention are described in further detail below with reference to the drawings and specific embodiments. The advantages and features of the invention will become clearer from the following description and the claims. Note that the drawings are greatly simplified and not drawn to precise scale; they serve only to illustrate the embodiments of the invention conveniently and clearly.
Note that, in order to describe the invention clearly, several embodiments are given to explain different implementations of the invention; these embodiments are illustrative and not exhaustive. In addition, for brevity, content already covered in an earlier embodiment is often omitted in later embodiments, so for content not mentioned in a later embodiment, refer to the earlier embodiments.
Although the invention can be modified and substituted in many forms, the specification lists some specific embodiments and describes them in detail. It should be understood that the inventor's intent is not to limit the invention to the specific embodiments set forth; on the contrary, the intent is to protect all improvements, equivalent transformations, and modifications made within the spirit or scope defined by the claims. The same component numbers may be used in all drawings to denote the same or similar parts.
In the present invention, an instruction address (Instruction Address) refers to the storage address of an instruction in main memory; that is, the instruction can be found in main memory at this address. For simplicity, the virtual address is assumed here to equal the physical address; the method of the invention also applies where address mapping is required. In the present invention, the current instruction may refer to the instruction currently being executed or fetched by the processor core; the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
In the present invention, a branch instruction (Branch Instruction) or branch point (Branch Point) refers to any appropriate type of instruction that can cause the processor core to change its execution flow (Execution Flow), for example by executing instructions or micro-operations out of sequential order. A branch instruction address is the instruction address of the branch instruction itself; it consists of an instruction block address and an instruction offset address. A branch target instruction is the target instruction to which the branch caused by a branch instruction transfers, and a branch target instruction address is the instruction address of the branch target instruction.
Please refer to Fig. 1A, which is an embodiment of the cache structure of the present invention. In Fig. 1A, the processor system consists of processor core 102, L1 cache 104, scanner 106, L2 cache 108, track table 110, replacement module 124, tracker 120, active table 130, block address mapping module 134, and selectors 132, 140, 142, 146, 148 and 150. A controller, not shown in Fig. 1A, receives the outputs of processor core 102, block address mapping module 134, scanner 106, active table 130, track table 110 and replacement module 124, and controls the operation of each functional module.
In the present invention, a first address (BNX) and a second address (BNY) can be used to represent the position of an instruction in the L1 cache or the L2 cache. Here, the first address and the second address can be addressing addresses of the L1 cache or of the L2 cache. When an instruction is stored in L1 cache 104, BN1X represents the first-level block number of the instruction block containing the instruction (i.e., it points to the corresponding first-level instruction block in the L1 cache), and BN1Y represents the offset of the instruction within the first-level block (i.e., the relative position of the instruction in the first-level instruction block). When an instruction is stored in L2 cache 108, BN2X represents the second-level block number of the instruction block containing the instruction (i.e., it points to the corresponding second-level instruction block in the L2 cache), and BN2Y represents the offset of the instruction within the second-level block (i.e., the relative position of the instruction in the second-level instruction block). For convenience, BN1 denotes BN1X together with BN1Y, and BN2 denotes BN2X together with BN2Y. Because every instruction in the L1 cache is also stored in the L2 cache in the present invention, an instruction stored in the L1 cache can be represented by either BN1 or BN2.
In the present invention, the first address and the second address can also represent the position of a track point in track table 110. The instruction type of a branch point can additionally indicate whether the branch target addressing address is represented by BN1 (i.e., a direct branch instruction whose branch target is a BN1) or by BN2 (i.e., a direct branch instruction whose branch target is a BN2). When a branch point stores a BN1, the instruction block containing the branch target instruction of that branch point is already stored in the storage block of L1 cache 104 pointed to by the BN1X, and the branch target instruction can be found there according to the BN1Y. When a branch point stores a BN2, the instruction block containing the branch target instruction of that branch point is already stored in the storage block of L2 cache 108 pointed to by the BN2X, and the branch target instruction can be found there according to the BN2Y, but it cannot be directly determined whether the branch target instruction is already stored in L1 cache 104.
In the present invention, one second-level instruction block can contain several first-level instruction blocks, so the first-level block offset (BN1Y) of an instruction can be obtained directly from its second-level block offset (BN2Y). For example, if a second-level instruction block contains two first-level instruction blocks, the BN2Y of an instruction in that second-level instruction block has one more bit than the BN1Y of the instruction in its first-level instruction block. The most significant bit (MSB) of the BN2Y indicates which of the two first-level instruction blocks corresponding to the second-level instruction block contains the instruction, and the remaining bits are exactly the BN1Y of the instruction within that first-level instruction block. The case where a second-level instruction block contains more first-level instruction blocks follows by analogy.
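The bit split described above can be sketched as follows. This is only an illustration: the function name and the parameters `bn1y_bits` and `sub_block_bits` are assumptions introduced here, and the bit widths are not values given in the text.

```python
from typing import Tuple

def split_bn2y(bn2y: int, bn1y_bits: int, sub_block_bits: int = 1) -> Tuple[int, int]:
    """Split a second-level offset into (first-level block index, BN1Y).

    bn1y_bits and sub_block_bits are illustrative parameters, not values
    taken from the patent text.
    """
    # The low-order bits are the offset inside the first-level block.
    bn1y = bn2y & ((1 << bn1y_bits) - 1)
    # The high-order bit(s) select which first-level block is meant.
    sub_block = (bn2y >> bn1y_bits) & ((1 << sub_block_bits) - 1)
    return sub_block, bn1y
```

With a 4-bit BN1Y and two first-level blocks per second-level block, `split_bn2y(0b10110, 4)` selects the second first-level block with offset 6. Passing `sub_block_bits=2` covers the case of four first-level blocks per second-level block.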
The entries in active table 130 correspond one-to-one with the storage blocks in L2 cache 108. Each entry in active table 130 stores a matching pair of a second-level instruction block address and a second-level block number BN2X, indicating in which storage block of L2 cache 108 the second-level instruction block with that address is stored. In the present invention, a second-level instruction block address can be matched in active table 130 to obtain a BN2X if the match succeeds; active table 130 can also be addressed by a BN2X to read out the corresponding second-level instruction block address.
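The two access modes of the active table (match by block address, read by BN2X) can be modelled in software roughly as below. The class and method names are hypothetical, and the linear search merely stands in for what would presumably be an associative lookup in hardware.

```python
class ActiveTable:
    """Toy model: entry index is BN2X, entry content is a block address."""

    def __init__(self, num_entries):
        self.block_addr = [None] * num_entries  # None marks an unused entry

    def fill(self, bn2x, block_address):
        """Record that L2 storage block bn2x now holds block_address."""
        self.block_addr[bn2x] = block_address

    def match(self, block_address):
        """Return the BN2X whose entry holds block_address, or None on a miss."""
        for bn2x, addr in enumerate(self.block_addr):
            if addr == block_address:
                return bn2x
        return None

    def read(self, bn2x):
        """Return the block address stored at entry bn2x."""
        return self.block_addr[bn2x]
```

A miss from `match` corresponds to the case where the block is not yet in the L2 cache and an entry must be allocated by a replacement policy, which this sketch does not model.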
Scanner 106 examines the instruction block sent from L2 cache 108 over bus 107, extracts track point information, and fills it into the corresponding entries of track table 110, thereby establishing the track of at least one first-level instruction block corresponding to the second-level instruction block; at the same time, scanner 106 outputs the instruction block on bus 105 to fill L1 cache 104. When a track is established, replacement module 124 first produces a BN1X pointing to an available track. In the present invention, replacement module 124 can determine the available track according to a replacement algorithm (such as the LRU algorithm).
When an instruction block is filled from L2 cache 108 into L1 cache 104 through scanner 106, scanner 106 calculates the branch target addresses of the branch instructions contained in the block. Each calculated branch target address is sent to active table 130 and matched against the instruction block addresses stored there to determine whether the branch target is already stored in L2 cache 108. If the match fails, the instruction block containing the branch target instruction has not yet been filled into L2 cache 108; so, while that instruction block is filled from lower-level memory into L2 cache 108, a matching pair of the corresponding second-level instruction block address and second-level block number is established in active table 130.
Specifically, scanner 106 examines each instruction from bus 107 and extracts certain information, such as the instruction type, the instruction source address and the branch offset (Branch Offset) of a branch instruction, and calculates the branch target address from this information. For a direct branch instruction, the branch target address is obtained by adding three values: the block address of the instruction block containing the instruction, the offset of the instruction within the instruction block, and the branch offset. The instruction block address can be read from active table 130 and sent directly to the adder in scanner 106. Alternatively, a register for storing the current instruction block address can be added to scanner 106, so that active table 130 need not send the instruction block address in real time. In this embodiment, the branch target address of a direct branch instruction is produced by scanner 106, and the branch target address of an indirect branch instruction is produced by processor core 102.
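As a small worked sketch of the three-term addition for direct branches: the function names are illustrative, and the code assumes the block address is the address of the first instruction of the block, expressed in the same units as the offsets. The second helper, which splits a target address back into a block address and an offset for matching in the active table, is likewise an assumption about how the address would be divided.

```python
def direct_branch_target(block_address, offset_in_block, branch_offset):
    """Branch target = block address + offset of the branch + branch offset."""
    return block_address + offset_in_block + branch_offset

def split_address(address, block_size):
    """Split an address into (block address, offset within block).

    block_size is a hypothetical parameter; the text gives no block size.
    """
    offset = address % block_size
    return address - offset, offset
```

For a branch at offset 0x10 in the block starting at 0x1000 with a branch offset of -0x20, the target is 0xFF0, which lies in the preceding block when `block_size` is 0x100.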
The rows of block address mapping module 134 correspond one-to-one with the entries of active table 130 and the storage blocks of L2 cache 108, are pointed to by the same BN2X, and store the correspondence between second-level block numbers and first-level block numbers. In block address mapping module 134, each row, corresponding to one L2 cache block, has a plurality of entries; each entry stores the first-level block number (BN1X) of the L1 cache block corresponding to the respective part of that L2 cache block. Thus, for a BN2, a row in block address mapping module 134 can be found by its BN2X, and the corresponding entry in that row can then be found by the high-order bits of its BN2Y. If the entry stores a valid BN1X, the instruction corresponding to the BN2 is already stored in L1 cache 104 and the BN2 can be converted to the corresponding BN1; otherwise, the instruction corresponding to the BN2 is not yet stored in L1 cache 104. Specifically, when one second-level instruction block contains two first-level instruction blocks, each row of block address mapping module 134 corresponds to two BN1X entries, and the BN1X entry corresponding to a BN2 is found by the most significant bit of the BN2Y in that BN2. The case where a second-level instruction block contains more first-level instruction blocks follows by analogy.
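The BN2 to BN1 conversion described above can be sketched as a small lookup structure. The class name, method names and the `bn1y_bits` parameter are assumptions for illustration; `None` is used here to mark an invalid entry.

```python
class BlockAddressMapper:
    """Toy model: rows indexed by BN2X, one BN1X slot per first-level sub-block."""

    def __init__(self, num_l2_blocks, l1_per_l2, bn1y_bits):
        self.rows = [[None] * l1_per_l2 for _ in range(num_l2_blocks)]
        self.bn1y_bits = bn1y_bits  # width of BN1Y (illustrative parameter)

    def record(self, bn2x, sub_block, bn1x):
        """Note that sub_block of L2 block bn2x now lives in L1 block bn1x."""
        self.rows[bn2x][sub_block] = bn1x

    def to_bn1(self, bn2x, bn2y):
        """Return (BN1X, BN1Y) if the instruction is in L1, else None."""
        sub_block = bn2y >> self.bn1y_bits          # high-order bits of BN2Y
        bn1x = self.rows[bn2x][sub_block]
        if bn1x is None:                            # no valid BN1X stored
            return None
        return bn1x, bn2y & ((1 << self.bn1y_bits) - 1)
```

A `None` result corresponds to the case where the BN2 must stay in the track point, or where a first-level block has to be allocated and filled before the branch target can be fetched.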
In this embodiment, track table 110 contains a plurality of track points. A track point is an entry in the track table and corresponds to at least one instruction; the addresses corresponding to the track points in a track increase from left to right. In addition, one extra entry (the end track point) can be added at the end of each row (each track) of track table 110 to store the position of the next track to be executed in sequence. The content of a track point in track table 110 can comprise a format field (TYPE), a second-level block number (BN2X) and a second-level block offset (BN2Y). It can also comprise a format field (TYPE), a first-level block number (BN1X) and a first-level block offset (BN1Y). Here, the format field contains the instruction type, such as branch instruction, data access instruction or other instruction. Branch instructions can be further subdivided, for example into unconditional direct branch, conditional direct branch, unconditional indirect branch and conditional indirect branch instructions; their track points are branch points. Data access instructions can likewise be subdivided, for example into data load and data store instructions; their track points are data points.
In the present invention, the track point address of a track point itself corresponds to the instruction address of the instruction the track point represents; a branch instruction track point contains the track point address of the branch target, and that track point address corresponds to the branch target instruction address. The plurality of consecutive track points corresponding to a first-level instruction block made up of a series of consecutive instructions in L1 cache 104 is called a track. The first-level instruction block and its track are indicated by the same first-level block number BN1X. Track table 110 contains at least one track. The total number of track points in a track can equal the number of entries in one row of track table 110 (an end track point can also be added). In this way, track table 110 becomes a table that represents a branch instruction with the track table entry address corresponding to the branch source address and the entry content corresponding to the branch target address. In addition, besides the track points for instructions and the end track point, each row of track table 110 can have an additional second-level block number entry recording the BN2X corresponding to that row's BN1X. Thus, when a first-level instruction block is replaced, the branch target BN1X in the branch points of other track table rows that have this row as their branch target can be converted to the corresponding BN2X, and the BN1Y to the corresponding BN2Y, so that the row can be overwritten by another instruction row without causing errors.
Track table 110 records the possible paths of program execution, or the possible directions of the execution flow, so tracker 120 can follow the program flow according to the program flow recorded in track table 110 and the feedback from processor core 102. Because L1 cache 104 holds the instructions corresponding to the track table entries, L1 cache 104 uses output bus 115 of tracker 120 as its read address, follows the program flow tracked by tracker 120, and sends instructions over bus 103 for processor core 102 to execute. Some branch targets in track table 110 are recorded as L2 cache addresses BN2; the purpose is to store into L1 cache 104 only the instructions that may be used, so that L1 cache 104 can have a smaller capacity and a higher speed than L2 cache 108. When the branch target in an entry read by tracker 120 is recorded as a BN2, the BN2 is sent to modules such as block address mapping module 134 to be converted into a BN1 address, or the instructions in the L2 cache are filled into L1 cache 104 at a newly allocated BN1 address; the obtained or allocated BN1 address is also written back into that entry of track table 110. Starting from this BN1, and according to the instruction execution results (for example, the results of branch instructions) fed back by processor core 102, tracker 120 controls L1 cache 104 to output instructions to processor core 102 for execution.
The write address of the write port of track table 110 has two sources: row address selector 140 (BN1X) and column address selector 142 (BN1Y). When a track is established, selector 140 selects the row address BN1X output by replacement module 124, and selector 142 selects the column address BN1Y output by scanner 106. When the track point content read by tracker 120 stores a BN2, the BN2 is sent to block address mapping module 134 and converted to a BN1, which must be written back into the track point (a read-modify-write operation). In addition, when the instruction type in the track point content read by tracker 120 is an indirect branch instruction, the indirect branch target address produced by processor core 102 is selected from bus 155 by selector 148 and sent over bus 149 to active table 130 for matching to obtain a BN1, or a BN2X obtained by matching or allocation is converted to a BN1 through block address mapping module 134 and related modules by the method described above. This BN1 must also be written back into the track point. In both cases, row address selector 140 and column address selector 142 select the BN1X and BN1Y present on read address pointer 115 at that time as the write address of track table 110.
The write content of the write port of track table 110 also has two sources, bus 125 and bus 123, one of which is selected by selector 146 as the content to write. The value on bus 125 is the BN1 output by block address mapping module 134, and the value on bus 123 is a branch target address in L2 cache address format (BN2).
According to the technical solution of the present invention, while an instruction is filled into L1 cache 104, scanner 106 examines it and extracts the corresponding information. Specifically, if the instruction is a branch instruction, scanner 106 calculates the branch target address. The block address in the branch target address is selected from bus 129 by selector 148 and sent over bus 149 to active table 130 for matching. If the match succeeds, the BN2X corresponding to the matching entry is selected by selector 132 and sent over bus 133 to block address mapping module 134, where the corresponding BN1X is looked up, according to the BN2Y in the branch target address, in the row pointed to by the BN2X. If a valid BN1X exists, it is output on bus 125 and the BN2Y is converted into the corresponding BN1Y. Selector 140 selects BN1X 127, produced by replacement module 124 for this branch instruction, as the first address of the track table 110 write address; selector 142 selects block offset 119 of the branch instruction within its instruction block, output by scanner 106, as the second address of the track table 110 write address; and the BN1 on bus 125, selected by selector 146, is written over bus 147 into track table 110 together with the extracted instruction type as the track point content, completing the establishment of the track point. The track point now contains a BN1.
If no valid BN1X corresponding to this BN2 exists in block address mapping module 134, selector 140 selects BN1X 127, produced by replacement module 124 for this branch instruction, as the first address of the track table 110 write address; selector 142 selects block offset 119 of the branch instruction within its instruction block, output by scanner 106, as the second address of the track table 110 write address; and the BN2X on bus 133 is concatenated with the BN2Y of the branch target address calculated by scanner 106 to form a BN2, which is placed on bus 123, selected by selector 146, and written over bus 147 into track table 110 together with the extracted instruction type as the track point content, completing the establishment of the track point. The track point now contains a BN2.
If the block address in the branch target address fails to match in active table 130, the instruction corresponding to the branch target address is not yet stored in L2 cache 108. In that case, the block number BN2X of a second-level storage block is allocated according to a replacement algorithm (such as the LRU algorithm), and the branch target address is sent to lower-level memory; the instruction block obtained is stored over bus 109 into the storage block of L2 cache 108 pointed to by the BN2X. Selector 140 selects BN1X 127, produced by replacement module 124 for this branch instruction, as the first address of the track table 110 write address; selector 142 selects block offset 119 of the branch instruction within its instruction block, output by scanner 106, as the second address of the track table 110 write address; and the BN2X is merged directly with the block offset in the branch target address (i.e., the BN2Y) to form a BN2, which is placed on bus 123, selected by selector 146, and written over bus 147 into track table 110 together with the extracted instruction type as the track point content, completing the establishment of the track point. The track point now contains a BN2.
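The three cases above (hit in both tables, hit in the active table only, miss in the active table) can be summarised in a short sketch. Everything here is a simplified software model under stated assumptions: `active` is a plain dict from block address to BN2X, `mapper` is a dict from (BN2X, sub-block index) to BN1X, and `allocate_bn2x` is a hypothetical callback standing in for the replacement algorithm and the fill from lower-level memory.

```python
def make_branch_track_point(instr_type, target_block_addr, target_bn2y,
                            active, mapper, bn1y_bits, allocate_bn2x):
    """Return the track point content (type, address format, X, Y)."""
    bn2x = active.get(target_block_addr)
    if bn2x is None:
        # Case 3: target block not in L2 yet. Allocate a BN2X and store BN2.
        bn2x = allocate_bn2x(target_block_addr)
        active[target_block_addr] = bn2x
        return (instr_type, "BN2", bn2x, target_bn2y)
    sub_block = target_bn2y >> bn1y_bits
    bn1x = mapper.get((bn2x, sub_block))
    if bn1x is not None:
        # Case 1: target already in L1. Store BN1 (BN1X plus low bits of BN2Y).
        return (instr_type, "BN1", bn1x, target_bn2y & ((1 << bn1y_bits) - 1))
    # Case 2: target in L2 only. Store BN2.
    return (instr_type, "BN2", bn2x, target_bn2y)
```

In all three cases the track point is written at the same location (the BN1X from the replacement module and the branch's own offset from the scanner); only the stored content differs, which is why the write address is not modelled here.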
Repeating the above process fills instructions from L2 cache 108 into L1 cache 104 and establishes the corresponding tracks. In this process, L1 cache block address BN1X 127 produced by replacement module 124 also provides the write address to L1 cache 104, for writing the first-level storage block corresponding to the track. If the storage block is written in several passes, the BN1Y address produced by scanner 106 while scanning an instruction block can be supplied to L1 cache 104 over bus 119 to control writing the instructions to the correct addresses.
Tracker 120 consists of register 112, incrementer 114 and selector 118. Its read pointer 115 (the output of register 112) points to the track point in track table 110 corresponding to the instruction that processor core 102 is about to execute (the current instruction); the track point content is read out and sent over bus 117 to selector 118. At the same time, read pointer 115 addresses L1 cache 104, and the current instruction is read out and sent over bus 103 to processor core 102 for execution. Read pointer 115 has the format BN1X, BN1Y: BN1X selects a row in track table 110 and the corresponding storage block in L1 instruction cache 104, and BN1Y selects an entry in that row and the corresponding instruction in the storage block.
Register 112 is controlled by stepping signal 111 sent by processor core 102. Stepping signal 111 is a feedback signal from the processor core to the tracker. It is '1' whenever the processor core is operating normally, so register 112 in tracker 120 is updated every clock cycle and read pointer 115 points to a new entry in the track table and a new instruction in L1 cache 104 for the processor core to execute. When the processor core is operating abnormally and needs to stall the pipeline or cannot execute a new instruction, the stepping signal is '0', so register 112 stops updating, tracker 120 and pointer 115 keep their state, and L1 cache 104 suspends supplying new instructions to processor core 102.
Read pointer 115 points to an entry (track point) in track table 110, which is read out over bus 117. If the instruction type in the track point content read over bus 117, as decoded by the controller, shows that the instruction is not a branch instruction, the controller controls selector 118 to select the BN1X value from register 112 and the BN1Y incremented by one from incrementer 114 as the new BN1 output, which is sent back to the input of register 112. After register 112 is updated under a valid stepping signal 111, read pointer 115 points to the next sequential track point to the right on the same track, and the next sequential instruction is read from L1 cache 104 and sent over bus 103 to processor core 102 for execution.
If the instruction type in the track point content read over bus 117 shows that the instruction is an unconditional direct branch instruction whose branch target is a BN1, the controller controls selector 118 to select this BN1 as the output sent to register 112, and register 112 is updated when stepping signal 111 is valid, so that the value of register 112 in the next cycle is this BN1; that is, read pointer 115 points to the track point corresponding to the branch target instruction, and the branch target instruction is read from L1 cache 104 and sent over bus 103 to processor core 102 for execution. If stepping signal 111 is invalid, the value of register 112 remains unchanged and the update waits until stepping signal 111 is valid again.
If the instruction type in the track point content shows that the instruction is a conditional direct branch instruction whose branch target is a BN1, the controller controls selector 118 to select according to TAKEN signal 113, which processor core 102 produces when executing the branch instruction and which indicates whether the branch is taken. If the value of TAKEN signal 113 is '1', the branch is taken: the BN1 output by the track table is selected and sent back to register 112, and register 112 is updated when stepping signal 111 is valid (value '1'), so that the value of register 112 in the next cycle is this BN1; that is, read pointer 115 points to the track point corresponding to the branch target instruction, and the branch target instruction is read from L1 cache 104 for processor core 102 to execute. If the value of TAKEN signal 113 is '0', the branch is not taken: the BN1X output by register 112, together with the BN1Y of register 112 incremented by one by incrementer 114, is selected as the output sent back to register 112, and register 112 is updated when stepping signal 111 is valid (value '1'), so that the value of register 112 increases by one in the next cycle; that is, read pointer 115 points to the next sequential track point to the right, and the corresponding instruction is read from L1 cache 104 and sent over bus 103 to processor core 102 for execution. If stepping signal 111 is invalid, the value of register 112 remains unchanged and the update waits until stepping signal 111 is valid again.
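The selection made by the tracker for the three BN1 cases above (non-branch, unconditional direct branch, conditional direct branch) can be sketched as one next-state function. The dictionary layout of a track point and the type strings are assumptions made for this illustration; BN2 targets and indirect branches are deliberately left out.

```python
def next_read_pointer(pointer, track_point, taken, stepping):
    """Compute the next (BN1X, BN1Y) value of the tracker register.

    pointer     -- current (BN1X, BN1Y)
    track_point -- dict with "type" and, for branches, "target" (a BN1 tuple)
    taken       -- TAKEN feedback from the core (only used for conditional branches)
    stepping    -- stepping feedback; False means the register holds its value
    """
    if not stepping:
        return pointer                       # core stalled: keep the pointer
    bn1x, bn1y = pointer
    kind = track_point["type"]
    if kind == "unconditional_direct":
        return track_point["target"]         # always jump to the target BN1
    if kind == "conditional_direct" and taken:
        return track_point["target"]         # branch taken
    return (bn1x, bn1y + 1)                  # sequential: same track, next point
```

The sketch mirrors the selector and incrementer arrangement: one input is the incremented BN1Y with unchanged BN1X, the other is the BN1 read from the track table, and the stepping signal gates the register update.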
If the instruction type in the track point content shows that the instruction is a direct branch instruction (conditional or unconditional) whose branch target is a BN2, the BN2 is sent over bus 117 to block address mapping module 134 for matching and conversion by the method described above. If a valid BN1X corresponding to this BN2 exists in block address mapping module 134, this BN1X and the corresponding BN1Y are output and merged into a BN1, which is written back over bus 125 into the branch point to replace the BN2 originally stored. If no valid BN1X corresponding to this BN2 exists, replacement module 124 produces a BN1X by the method described above and, over bus 127, designates an available track in track table 110 (and the corresponding storage block in L1 cache 104). The corresponding first-level instruction block within the second-level instruction block pointed to by the BN2 is filled from L2 cache 108 into the storage block of L1 cache 104 pointed to by the BN1X; at the same time, scanner 106 examines the instructions being filled, extracts track point information, and fills it into the row of track table 110 pointed to by the BN1X, and the resulting correspondence between BN1X and BN2X is stored in block address mapping module 134. Meanwhile, the instructions corresponding to this track are stored in L1 cache 104. The BN1X produced by replacement module 124 and the BN1Y obtained by removing the high-order bits from the BN2Y (the high-order bits are the sub-block number within the L2 cache block, each sub-block having the capacity of one L1 cache block) are merged into a BN1 and placed on bus 125. At this point, selectors 140 and 142 select the value of read pointer 115 (i.e., the branch point corresponding to the branch instruction itself) as the write address, and selector 146 selects the BN1 on bus 125 as the write content, which is written back into the branch point to replace the original BN2. The track point content output by track table 110 then contains a BN1. The subsequent operation is the same as for a direct branch instruction whose branch target is a BN1 and is not repeated here.
If the instruction type in the track point content shows that the instruction is an indirect branch instruction (conditional or unconditional), the tracker suspends updating and waits for processor core 102 to produce the branch target address when it executes the branch instruction, or for a dedicated module to calculate it. When the controller sees the signal indicating that the branch target address has been produced (for example, the branch decision valid signal sent by processor core 102), it causes the block address in the branch target address to be sent through bus 155, selector 148 and bus 149 to active table 130 for matching. If the match in active table 130 succeeds, the BN2X corresponding to the matching entry is obtained, and the block offset in the branch target address is used as the BN2Y. This BN2X and BN2Y are sent to block address mapping module 134 for matching. On a hit, the corresponding BN1 is obtained and the subsequent operation is the same as for a direct branch instruction whose branch target is a BN1; on a miss, the subsequent operation is the same as for a direct branch instruction whose branch target is a BN2, and is not repeated here. If the match in active table 130 fails, the instruction corresponding to the branch target address is not yet stored in L2 cache 108. In that case, active table 130 allocates the block number BN2X of a second-level storage block according to a replacement algorithm (such as the LRU algorithm), and the branch target address is sent to lower-level memory; the instruction block fetched is stored in the storage block of L2 cache 108 pointed to by the BN2X. Then, by the method described above, the instruction block is filled into L1 cache 104 and the corresponding track is established, and the BN2 is converted into a BN1 that is written back into the branch point (the BN2 produced in this process is not written into track table 110; the corresponding BN1 is written into track table 110 directly), so that the track point content output by track table 110 contains a BN1. The subsequent operation is the same as for a direct branch instruction whose branch target is a BN1 and is not repeated here.
If, the next time the tracker reads the entry containing this indirect branch target, the instruction type of the entry is an indirect branch instruction but the address type is BN1, the controller concludes that this indirect branch instruction has been accessed before. When the instruction type is an unconditional branch, or a conditional branch for which branch decision 113 fed back by processor core 102 indicates that the branch is taken, execution can proceed speculatively using this BN1 address. However, the speculated address BN1 must be verified. One method is to reverse-map the BN1 address into the corresponding instruction address (for example: address active table 130 with the BN2X stored in the track corresponding to the BN1X to read out the instruction block address; convert the BN1Y to a BN2Y according to the position at which this BN1X is stored in the row of block address mapping module 134 corresponding to that BN2X; and concatenate the instruction block address with the BN2Y to obtain the complete instruction address). Then, when processor core 102 executes the indirect branch instruction and produces the branch target address, that branch target address is compared with the reverse-mapped instruction address. If they are the same, execution continues. If they differ, the instructions after the branch point are flushed and their results are discarded; execution resumes from the branch target address provided by processor core 102, and that address is mapped to a BN1 as in the preceding example and stored in the branch point.
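The reverse mapping used for verification can be sketched as follows. This is a simplified model with assumed data layouts: `track_bn2x` maps a BN1X to the BN2X recorded in its track, `active_block_addr` maps a BN2X to the instruction block address, and `mapper_rows` maps a BN2X to its row of BN1X slots. The code also assumes that the block address and BN2Y are expressed in the same units, so that concatenation is equivalent to addition.

```python
def bn1_to_address(bn1x, bn1y, track_bn2x, active_block_addr, mapper_rows, bn1y_bits):
    """Reconstruct the instruction address that a BN1 stands for."""
    bn2x = track_bn2x[bn1x]                    # BN2X recorded in the track row
    sub_block = mapper_rows[bn2x].index(bn1x)  # slot position gives the high bits
    bn2y = (sub_block << bn1y_bits) | bn1y     # BN1Y -> BN2Y
    return active_block_addr[bn2x] + bn2y      # block address joined with BN2Y

def speculation_correct(speculated_bn1, actual_target, *tables):
    """Compare the core's actual target with the reverse-mapped address."""
    return bn1_to_address(*speculated_bn1, *tables) == actual_target
```

When `speculation_correct` returns False, the text calls for flushing the instructions fetched after the branch point and restarting from the address produced by the core; that recovery path is not modelled here.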
In this embodiment the end track point is treated as an unconditional branch point. Therefore, when read pointer 115 of tracker 120 points to the track point immediately before the end track point (that is, the last instruction in the instruction block), and this track point is not a branch point or is a branch point whose branch is not taken, read pointer 115 continues to update, moves to the end track point, and outputs its BN1 to level-1 cache 104. Because the end track point does not correspond to a real instruction, and read pointer 115 cannot be updated to the first track point of the next track until the following clock cycle, level-1 cache 104 must output a dummy instruction (an instruction that does not change the internal state of the processor core, such as NOP) for processor core 102 to execute during this clock cycle. In the present invention, the addressing address sent to level-1 cache 104 can be examined: once it is found to correspond to an end track point, level-1 cache 104 need not be accessed and a dummy instruction is output directly for processor core 102 to execute. Alternatively, a storage unit holding the dummy instruction can be added to every row of level-1 cache 104, placed so that it is addressed exactly by the BN1 of the end track point output by read pointer 115, so that the dummy instruction is output for processor core 102 to execute. The drawback of either approach is that processor core 102 spends one extra clock cycle per instruction block executing a useless dummy instruction. Figure 1A can therefore be improved so that, when read pointer 115 of tracker 120 points to the track point immediately before the end track point, it moves in the following clock cycle directly to the branch target track point or to the first track point of the next track, according to the instruction type of that track point and the feedback from processor core 102 on executing the instruction.
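The first alternative, checking the address before accessing level-1 cache 104, can be sketched as below. The sketch assumes that the end track point sits one position past the last real instruction of a block; the dictionary-based cache and the names `fetch` and `NOP` are illustrative assumptions.

```python
NOP = "nop"  # stands in for any instruction that leaves the core state unchanged


def fetch(l1_cache, bn1x, bn1y):
    """Return the instruction addressed by (BN1X, BN1Y), or a dummy instruction."""
    block = l1_cache[bn1x]
    if bn1y == len(block):   # address of the end track point: no real instruction
        return NOP           # the cache array itself is not accessed
    return block[bn1y]
```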
Please refer to Figure 1B, which shows another embodiment of the cache structure of the present invention. In this embodiment, processor core 102, level-1 cache 104, scanner 106, level-2 cache 108, replacement module 124, active table 130, block address mapping module 134 and selectors 132, 140, 142, 146, 148 and 150 are all the same as in the Figure 1A embodiment. The differences are that track table 110 outputs the contents of two track points at a time (track point content 182 pointed to by read pointer 115 of tracker 120, and content 183 of the track point that follows it); that a type decoder 152, a controller 154 and a selector 116 are added to tracker 120; and that processor core 102 additionally sends a signal 161 to controller 154 in tracker 120. Controller 154 performs functions similar to those of the controller not shown in Figure 1A; it is shown here so that more complex functions and operations can be described.
In this embodiment, the read port of track table 110, addressed by read pointer 115 output by tracker 120, outputs the contents of two adjacent track points onto bus 117 and bus 121. Controller 154 examines the instruction type on bus 117, and type decoder 152 examines the instruction type on bus 121. At any moment two entries are read from track table 110: current entry 182 and the next sequential entry 183 (the one to its right). The content of current entry 182 is read out over bus 117 and sent to one input of selector 118 and to controller 154; when it is in BN2 format it is also sent, as before, to block address mapping module 134 and related modules so that the BN2 in the content can be mapped to a BN1. The next entry 183 is sent over bus 121 to type decoder 152 for decoding, and the decoder's output controls selector 116. One input of selector 116 comes from bus 121; the other input consists of the BN1X from read pointer 115 together with the incremented BN1Y supplied by incrementer 114 (that is, the BN1Y value in read pointer 115 plus one). Type decoder 152 decodes only the unconditional branch instruction type: if the type on bus 121 is unconditional branch, it controls selector 116 to output the content of bus 121; for any other type, selector 116 selects the BN1X from read pointer 115 together with the incremented BN1Y output by incrementer 114.
First consider the case in which the type on bus 121 (that is, the type of the next sequential entry) is not unconditional branch. Selector 116 then selects the output of incrementer 114 and sends it to one input of selector 118.
If controller 154 decodes the instruction type on bus 117 (that is, the content of current entry 182) as a non-branch instruction, it controls selector 118 to select the incrementer 114 output chosen by selector 116 as the input of register 112. Step signal 111 from processor core 102 controls the storing of this input into register 112, so that the tracker moves right to the next sequential address (BN1X unchanged, BN1Y + 1).
If the instruction type in the content on bus 117 is unconditional branch, controller 154 controls selector 118 to select the branch target address on bus 117, so that read pointer 115 jumps to the track point position corresponding to that branch target address.
If the instruction type on bus 117 is a direct conditional branch, controller 154 makes tracker 120 suspend updating and wait until processor core 102 produces TAKEN signal 113, which indicates whether the branch is taken. Register 112 is then controlled not only by step signal 111 but also by signal 161, produced by the processor core, which indicates whether TAKEN signal 113 is valid: register 112 updates only when signal 161 shows TAKEN signal 113 to be valid and step signal 111 is also valid. If the branch is not taken (TAKEN signal 113 is '0'), selector 118 selects the output of selector 116 and operation proceeds as for a non-branch instruction. If the branch is taken (TAKEN signal 113 is '1'), selector 118 selects bus 117 and the branch target address on it is stored into register 112; pointer 115 then points to the entry of the branch target in the track table and to the branch target instruction in level-1 cache 104, which is read out for processor core 102 to execute.
If the instruction type on bus 117 is a BN2 branch type, controller 154 makes register 112 in tracker 120 suspend updating and wait, and the BN2 is sent through bus 117, selector 132 and bus 133 to select a row in block address mapping module 134, where it is converted to a BN1 address. The module supplies this BN1, which is written through bus 125, selector 146 and bus 147 back into the original entry in the track table. The entry is then read out over bus 117 and handled as in the preceding cases. Tracker 120 proceeds along this BN1 and, according to the instruction execution result fed back by processor core 102 (for example, the outcome of the branch instruction), controls level-1 cache 104 to output instructions to processor core 102 for execution.
If the branch is not taken, operation proceeds as for a non-branch instruction; if the branch is taken, operation proceeds as for an unconditional branch instruction.
If the instruction type in the content is indirect branch, controller 154 makes register 112 in tracker 120 suspend updating and wait for processor core 102 to send the branch target address over bus 155. This address is sent through selector 148 to active table 130 for matching. If a match is found in active table 130, the corresponding BN2 is produced and sent through selector 132 and bus 133 to select a row in block address mapping module 134, where it is converted to a BN1 address; subsequent operation is the same as in the preceding case. If no match is found in active table 130, the branch target address is sent to the lower-level memory to fetch the corresponding instruction block, which is filled into level-2 cache 108, and the required level-1 instruction block is filled into level-1 cache 104. The block number BN1 of the filled level-1 cache block is entered into block address mapping module 134 and sent out over bus 125; subsequent operation is the same as in the preceding case.
If entry 183 holds an unconditional branch instruction, type decoder 152 decodes the instruction type on bus 121 and makes selector 116 select the branch target on bus 121 instead of the BN1 supplied through incrementer 114 (BN1X, BN1Y + 1). Thus, after processor core 102 has executed the instruction corresponding to entry 182, it does not execute an instruction corresponding to entry 183 (entry 183 may be an end track point, which has no corresponding instruction in level-1 cache 104) but directly executes the instruction at the branch target address contained in entry 183.
If entry 182 holds a non-branch instruction, then, as described above, the next instruction executed after it is the one pointed to by the branch target in entry 183. If entry 182 holds an unconditional branch instruction, the next instruction executed after it is the one pointed to by the branch target in entry 182, and entry 183 has no effect. If entry 182 holds a conditional branch instruction, the next instruction executed depends on TAKEN signal 113 produced by processor core 102. If the branch is taken (TAKEN signal 113 is '1'), selector 118 selects the branch target on bus 117, and signal 161, which indicates that TAKEN signal 113 is valid, controls the storing of this target into register 112 so that pointer 115 points to it; the next instruction executed is the one pointed to by the branch target address in entry 182. If the branch is not taken (TAKEN signal 113 is '0'), selector 118 selects the branch target from bus 121 output by selector 116, and signal 161 together with step signal 111 controls the storing of this unconditional branch target from entry 183 into register 112 so that pointer 115 points to it; the next instruction executed is the one pointed to by the unconditional branch target address in entry 183.
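The way selectors 116 and 118 of Figure 1B combine the current entry (182, on bus 117) and the next entry (183, on bus 121) can be summarized in a small behavioral sketch. It is a simplified software model under assumed names (`Entry`, `selector_116`, `next_pointer`); BN2 and indirect entries, which first require conversion to BN1, are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Pointer = Tuple[int, int]  # (BN1X, BN1Y)


@dataclass
class Entry:
    kind: str                         # "non_branch", "uncond" or "cond"
    target: Optional[Pointer] = None  # branch target BN1, if any


def selector_116(pointer: Pointer, next_entry: Entry) -> Pointer:
    """Type decoder 152: only an unconditional branch in entry 183 overrides BN1Y + 1."""
    if next_entry.kind == "uncond":
        return next_entry.target
    return (pointer[0], pointer[1] + 1)


def next_pointer(pointer: Pointer, current: Entry, next_entry: Entry,
                 taken: Optional[bool] = None) -> Pointer:
    """Selector 118: value loaded into register 112 for the next step."""
    fall_through = selector_116(pointer, next_entry)
    if current.kind == "uncond":
        return current.target
    if current.kind == "cond":
        if taken is None:             # TAKEN signal not yet valid: hold the pointer
            return pointer
        return current.target if taken else fall_through
    return fall_through
```

With the pointer at (2, 6) and an end track point in entry 183 that targets (5, 0), a not-taken conditional branch in entry 182 moves the pointer straight to (5, 0), so no cycle is spent on the end track point.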
The address of the unconditional branch target in an end track point may also be a level-2 cache address BN2. If type decoder 152, when decoding the instruction type of the entry read on bus 121, finds that the address is in BN2 format, the BN2 output on bus 121 can be placed on bus 117, mapped to a BN1 in block address mapping module 134 as in the preceding cases, and stored back into the entry. For clarity and ease of description, this path is not drawn in Figure 1B.
In the Figure 1B example, unconditional branch types can be handled in four ways. In the first way there is only one unconditional branch type: no distinction is made between unconditional branch instructions originally present in the program and the unconditional jump, added by the present invention in the end track point, to the first entry of the next track. In this way the unconditional branch instructions originally in the program are skipped and not executed by processor core 102, yet under the control of track table 110 and tracker 120 the program flow still correctly executes the target instruction of the branch and the instructions that follow it. The clock cycle that executing the unconditional branch instruction would have taken is thus saved. However, because the instruction is not executed in processor core 102, the program counter (PC) value is off; if an accurate PC value must be maintained, it has to be compensated. The cache system of the present invention does not need a correct PC in order to supply processor core 102 with the instructions it is to execute without interruption. If the PC value is needed at some moment (for example, during debugging), it can be reconstructed: each row of the track table records the level-2 cache block address BN2X and the level-2 cache sub-block address corresponding to that level-1 instruction block. The corresponding tag can therefore be read from active table 130 using BN2X, and concatenating it with the level-2 cache block address, the sub-block address and the BNY value in pointer 115 gives the PC value of the instruction being executed.
In the second way there are two unconditional branch types. One is the end-point unconditional branch type, which corresponds to the end point of each track. Type decoder 152 treats this type as an end point that corresponds to no instruction in the program, and accordingly controls selector 116 to select the branch target on bus 121, so that after the instruction on bus 117 has been executed, execution jumps directly to the branch target address on bus 121. The other type corresponds to unconditional branches in the program. When type decoder 152 decodes this type it does not treat it as a branch, and controls selector 116 to select the output of incrementer 114. After the instruction corresponding to the entry on bus 117 has been executed, the next instruction executed is its sequential successor, namely the unconditional branch instruction originally in the program. In this way the PC in the processor core always holds the correct value.
The third way is an improvement on the Figure 1B embodiment. While scanner 106 examines an instruction block, if it finds that the second-to-last instruction of the level-1 instruction block is not a conditional branch instruction and the last instruction is a non-branch instruction, it merges the end track point into the track point corresponding to the last instruction. That is, the instruction type of the last instruction is marked as unconditional branch, and the BN1 or BN2 of the first instruction of the next instruction block is stored as the track point content in the track point corresponding to the last instruction (a BN2 can be converted to a BN1 as in the preceding cases when the tracker reads it). Then, when read pointer 115 of tracker 120 points to the track point corresponding to this instruction, the instruction is read from level-1 cache 104 for processor core 102 to execute normally, and in addition controller 154 decodes the instruction type on bus 117 as unconditional branch and therefore controls selector 118 to select bus 117, so that in the following clock cycle read pointer 115 is updated to the branch target BN1 of this unconditional branch (the BN1 of the first instruction of the next instruction block). Processor core 102 then does not waste a clock cycle executing a dummy instruction.
While scanner 106 examines an instruction block, if it finds that the last instruction of the level-1 instruction block (the last track point in the corresponding track) is a branch instruction, it does not merge the end track point into the track point of that instruction; instead, the content of the end track point is placed in the track point immediately after (to the right of) the track point corresponding to the last instruction of the track. When the last instruction is an unconditional branch instruction, controller 154, on the basis of the unconditional branch type on bus 117, controls selector 118 to select the branch target on bus 117 and load it into pointer 115; execution jumps to that target and the end track point is never executed. When the last instruction is a conditional branch instruction, controller 154, on the basis of the conditional branch type on bus 117, makes tracker 120 pause and wait for branch decision signal 113 produced by processor core 102. Meanwhile type decoder 152 decodes the instruction type on bus 121 as unconditional branch and controls selector 116 to select bus 121. When branch decision signal 113 is 'taken', controller 154 controls selector 118 to select the conditional branch target on bus 117 and load it into pointer 115. When branch decision signal 113 is 'not taken', controller 154 controls selector 118 to select the output of selector 116, loading the unconditional branch target on bus 121 into pointer 115. Pointer 115 then addresses level-1 cache 104, which sends the instruction to processor core 102 for execution.
The three ways above apply to both fixed-length and variable-length instructions; that is, they do not require the end track point to be at a fixed position in the track. If the position of the end track point in the track is fixed, however, whether the last instruction has been reached can be determined from the value of BN1Y in read pointer 115. In the fourth way there is only one unconditional branch type in the track table, but the tracker distinguishes two cases according to where in the track an entry of that type is located. In this way, BN1Y in pointer 115 is sent to type decoder 152, and the instruction type on bus 121 does not need to be decoded. When BN1Y points to the last entry in a track, type decoder 152 controls selector 116 to select the branch target on bus 121, so that execution jumps directly to the branch target address on bus 121 after the instruction on bus 117 has been executed. When BN1Y points to any entry other than the last one in a track, type decoder 152 controls selector 116 to select the output of incrementer 114; after the instruction corresponding to the entry on bus 117 has been executed, the next instruction executed is its sequential successor. In this way the PC in the processor core always holds the correct value. This way is suited to fixed-length instructions.
In addition, when controller 154 decodes the type of the track table 110 entry read on bus 117 as a conditional branch instruction, the present invention can have processor core 102 execute speculatively along one path of the branch in order to improve the execution efficiency of the processor. Please refer to Figure 1C, which shows an embodiment of the present invention that supports speculative execution. Compared with the tracker in Figure 1B, tracker 120 in Figure 1C adds selector 162 and register 164, which select and temporarily store the address of the branch path that speculation did not choose, for use if the speculation turns out to be wrong. The direction of speculation can be determined by existing static prediction or dynamic branch prediction techniques, or by a branch prediction field stored in the track table entry of the branch instruction. In addition, two-input selector 118 is replaced by three-input selector 218, and the output of register 164 is connected to the third input of selector 218.
Take a not-taken prediction as an example. When controller 154 decodes the conditional branch type on bus 117 and obtains a not-taken prediction, it controls selector 162 and register 164 so that the branch target address on bus 117 is stored into register 164. At the same time controller 154 controls selector 218 to select the output of selector 116 (the next sequential instruction after the branch) for storing into register 112, so that pointer 115 makes level-1 cache 104 supply the instruction following the branch to processor core 102, and the instruction is marked to the processor core as speculatively executed. Pointer 115 also points to the first entry after the branch instruction in track table 110, so that this entry is placed on bus 117. Controller 154 then determines the subsequent direction of the tracker from the instruction type on bus 117 and continues to supply instructions to the processor core, all of them marked as speculatively executed. When signal 161 indicates that branch decision signal 113 is valid, controller 154 compares the predicted branch direction with the direction on signal 113. If they agree, execution continues along the speculated direction. If they differ, controller 154 sends a 'misprediction' signal to processor core 102, which discards all instructions carrying the speculative-execution mark together with their intermediate results. At the same time controller 154 controls selector 218 to select the output of register 164, so that the address of the path that was not speculatively executed is used to make level-1 cache 104 supply instructions to processor core 102, and execution continues along that path.
If the prediction is taken, then when controller 154 decodes the conditional branch type on bus 117 and obtains a taken prediction, it controls selector 162 and register 164 so that the output of selector 116 (the next sequential instruction after the branch) is stored into register 164. At the same time controller 154 controls selector 218 to select the branch target address on bus 117 for storing into register 112, so that pointer 115 makes level-1 cache 104 supply the branch target instruction to processor core 102, and the instruction is marked to the processor core as speculatively executed. Pointer 115 also points to the entry in track table 110 addressed by the branch target address on bus 117, so that this entry is placed on bus 117. Controller 154 then determines the subsequent direction of the tracker from the instruction type on bus 117 and continues to supply instructions to the processor core, all of them marked as speculatively executed. When signal 161 indicates that branch decision signal 113 is valid, controller 154 compares the predicted branch direction with the direction on signal 113. If they agree, execution continues along the speculated direction. If they differ, controller 154 sends a 'misprediction' signal to processor core 102, which discards all instructions carrying the speculative-execution mark together with their intermediate results. At the same time controller 154 controls selector 218 to select the output of register 164, so that the address of the path that was not speculatively executed is used to make level-1 cache 104 supply instructions to processor core 102, and execution continues along that path.
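The role of register 164 in both prediction cases can be condensed into two small functions. This is only a behavioral sketch with assumed names (`speculate`, `resolve`); pointers are modeled as (BN1X, BN1Y) tuples and the marking and flushing of instructions inside the core is reduced to a boolean.

```python
def speculate(predict_taken, branch_target, fall_through):
    """Return (pointer to follow now, alternate pointer kept in register 164)."""
    if predict_taken:
        return branch_target, fall_through
    return fall_through, branch_target


def resolve(predict_taken, actual_taken, current_pointer, saved_alternate):
    """Return (pointer to continue from, whether speculative work is flushed)."""
    if predict_taken == actual_taken:
        return current_pointer, False   # prediction correct: keep going
    return saved_alternate, True        # misprediction: restart from register 164
```

For instance, predicting not-taken for a branch at (2, 3) with target (9, 0) follows (2, 4) and saves (9, 0); if the branch turns out to be taken, `resolve` returns (9, 0) together with a flush request.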
In the Figure 1A and Figure 1B embodiments, for the instructions in level-1 cache 104, the instruction blocks containing the branch target instructions of all direct branch instructions and of most branch instructions have already been prefetched into level-2 cache 108, so level-2 cache misses do not degrade processor system performance. Moreover, when the branch point content of a branch instruction in level-1 cache 104 contains a BN1, the instruction block holding its branch target instruction is already in level-1 cache 104, and no level-1 cache miss degrades performance. If the branch point content contains a BN2, however, a level-1 cache miss can still occur. The Figure 1A embodiment can be improved to address this: read pointer 115 of tracker 120 is made to reach branch points earlier, so that the instruction block is filled from level-2 cache 108 into level-1 cache 104 and the BN2 is converted to a BN1 ahead of time. Similar improvements can be made to Figures 1B and 1C and are not described further in this specification.
Please refer to Figure 2, which shows another embodiment of the cache system of the present invention. In this embodiment, processor core 102, level-1 cache 104, scanner 106, level-2 cache 108, track table 110, replacement module 124, active table 130, block address mapping module 134, selectors 132, 140, 142, 146, 148 and 150, and the corresponding controller are the same as in the Figure 1A embodiment. The difference is that register 112 is no longer controlled by step signal 111; it is controlled jointly by branch valid signal 161 and by the controller's decoding of the type information on bus 117. This embodiment also adds a first-in first-out buffer (FIFO) 226 that stores the level-1 cache addressing addresses BN1 output by the tracker. Writing to buffer 226 is governed by the control signal of register 112: each time register 112 stores a new value, that value is subsequently written into buffer 226 as well. Reading from buffer 226 is governed by step signal 111: the stored BN1 values are output in first-in first-out order to address the level-1 cache and obtain the corresponding instructions for processor core 102 to execute.
When the controller decodes the type information on bus 117 as a non-branch type, it makes selector 118 select the output of incrementer 114 and makes register 112 store it, so that pointer 115 points to and reads the next sequential entry in the same track of track table 110. When the controller decodes the type information on bus 117 as an unconditional branch type, it makes selector 118 select bus 117 and makes register 112 store the address on bus 117, so that pointer 115 points to and reads the branch target entry. When the controller decodes the type information on bus 117 as a conditional branch type, it places selector 118 and register 112 under the control of processor core 102. When branch decision signal (TAKEN) 113 from processor core 102 is '0', selector 118 selects the output of incrementer 114; when branch decision signal 113 is '1', selector 118 selects bus 117. When branch valid signal 161 from the processor core is valid, register 112 stores the output of selector 118, so that pointer 115, according to the decision made in processor core 102, points to and reads in track table 110 either the entry of the instruction that sequentially follows the branch or the entry of the branch target. Alternatively, branch valid signal 161 can be dispensed with and its function merged into step signal 111: while processor core 102 is executing a branch instruction but has not yet made the branch decision, step signal 111 is held at '0' so that register 112 suspends updating; once the decision has been made and branch decision signal 113 is valid, step signal 111 is set to '1' and register 112 resumes updating, achieving the same effect as using branch valid signal 161.
When the controller decodes the type information on bus 117 as a BN2 type, register 112 does not update, so tracker pointer 115 keeps pointing at the entry. As in the preceding cases, the BN2 is sent through bus 117, selector 132 and bus 133 to block address mapping module 134, which maps it to the corresponding BN1 address; this is written back through bus 123, selector 146 and bus 147 into the same entry of track table 110 that pointer 115 points to. The BN1 is then read out over bus 117, and the controller controls selector 118 and register 112 according to the instruction type accompanying this BN1, as in the preceding cases. When the controller decodes the type information on bus 117 as an indirect branch type, register 112 does not update, so tracker pointer 115 keeps pointing at the entry. As in the Figure 1A example, the indirect branch target instruction address 155 produced by processor core 102 is sent through selector 148 to active table 130 for matching. The resulting BN2 is sent through selector 132 and bus 133 to block address mapping module 134, mapped to a BN1, and written back through bus 123, selector 146 and bus 147 into the same entry of track table 110 that pointer 115 points to. The BN1 is then read out over bus 117 and handled in the same way.
Thus, thanks to buffer 226, read pointer 115 of tracker 120 can run ahead and point to a track point several instructions after the instruction that processor core 102 is executing. When one of these instructions is a branch instruction whose branch point content contains a BN2, read pointer 115 reads this BN2 over bus 117 as it passes the corresponding branch point. The interval between the moment read pointer 115 points at the branch point and the moment buffer 226 outputs the BN1 of that branch point to level-1 cache 104 to fetch the instruction for processor core 102 is used to fill the instruction block from level-2 cache 108 into level-1 cache 104 by the method described in the earlier embodiments, and to convert the BN2 to a BN1 and write it back into the branch point. As a result, by the time processor core 102 executes the branch instruction, the instruction block containing its branch target instruction is already in level-1 cache 104 and no cache miss occurs.
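The decoupling provided by buffer 226 can be pictured with a very small queue model. The class `AddressFifo`, its methods and the helper `fill_is_hidden` are illustrative assumptions; in particular, the rule that a fill is hidden when the lead in instructions covers the fill latency is a simplification assuming one instruction per cycle by default.

```python
from collections import deque


class AddressFifo:
    """Toy model of FIFO 226 sitting between tracker 120 and level-1 cache 104."""

    def __init__(self):
        self._queue = deque()

    def tracker_update(self, bn1):
        """Called whenever register 112 stores a new BN1."""
        self._queue.append(bn1)

    def core_step(self):
        """Called on step signal 111: the oldest BN1 addresses level-1 cache 104."""
        return self._queue.popleft()

    def lead(self):
        """How many instructions the tracker is ahead of the core."""
        return len(self._queue)


def fill_is_hidden(lead_in_instructions, fill_latency_cycles, cycles_per_instruction=1):
    """True if a level-2 to level-1 fill can finish before the core catches up."""
    return lead_in_instructions * cycles_per_instruction >= fill_latency_cycles
```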
Alternatively, a secondary tracker (slave tracker) can be added to the Figure 1A embodiment. The main tracker supplies addressing addresses to level-1 cache 104 to fetch instructions for processor core 102, as the tracker in the Figure 1A embodiment does, while the secondary tracker, like the tracker in the Figure 2 embodiment, runs several track points ahead of the instruction that processor core 102 is executing and causes the instruction blocks that contain branch target instructions of these instructions and that reside in level-2 cache 108 to be filled into level-1 cache 104. A similar improvement can be made to Figure 1B and is not described further in this specification. Please refer to Figure 3, which shows another embodiment of the cache system of the present invention.
In the Figure 3 embodiment, processor core 102, level-1 cache 104, scanner 106, level-2 cache 108, track table 110, replacement module 124, active table 130, block address mapping module 134, selectors 132, 140, 142, 146, 148 and 150, and the corresponding controller are the same as in the Figure 1A embodiment. The difference is that track table 110 in this embodiment has two sets of read ports and can output the track point contents addressed by two read pointers simultaneously. In addition, a secondary tracker 320 is added. Secondary tracker 320 is similar in structure to tracker 120: it consists of register 312, incrementer 314 and selector 318, and the read pointer 315 it outputs can address track table 110 independently. Track table 110 thus outputs on bus 117 the track point content addressed by read pointer 115 of tracker 120, and on bus 317 the track point content addressed by read pointer 315 of secondary tracker 320.
In this embodiment, read pointer 115 of tracker 120 comprises BN1X and BN1Y, and its operation is the same as that of the tracker in the Fig. 1A embodiment, so it is not repeated here. The read pointer of secondary tracker 320 comprises only BN1Y, and its operation is similar to that of the tracker in the Fig. 2 embodiment. Specifically, register 312 in secondary tracker 320 is updated every clock cycle. When the TAKEN signal 113 sent by processor core 102 is '0', meaning that processor core 102 is currently executing a non-branch instruction, or a branch instruction whose branch is not taken, selector 318 selects the BN1X from register 312 and the incremented BN1Y from incrementer 314 as the new BN1 with which to update register 312, so that read pointer 315 points to the next track point of the current track in track table 110. This repeats until read pointer 315 points to the last track point of the track. During this process the content of every track point that read pointer 315 passes is read out. If a track point is found to be a branch point containing a BN2, the BN2 is sent on bus 317 to block address mapping module 134 and level-2 cache 108. If a valid BN1X exists, the BN2 is converted to a BN1 by the method of the earlier embodiments and written back into the branch point. If no valid BN1X exists, replacement module 124 allocates a BN1X, BN2Y is converted to BN1Y to form a complete BN1, the corresponding instruction block is filled from level-2 cache 108 into level-1 cache 104, and the BN1 is written back into the branch point. Thus, as read pointer 315 moves along the track to the end track point, the instruction blocks containing the branch target instructions of all branch points on the track are filled into level-1 cache 104 before processor core 102 executes the corresponding branch instructions, so no cache miss occurs.
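The look-ahead performed by secondary tracker 320 can be illustrated with a minimal Python sketch. It is a behavioural model only, not the circuit of the embodiment: the names `prefetch_track`, `bn2_to_bn1x`, `allocate_bn1x` and `fill_l1` are hypothetical, the mapping is keyed on BN2X alone, and the BN2Y to BN1Y conversion is modelled as the identity, which is a simplifying assumption.

```python
def prefetch_track(track, bn2_to_bn1x, allocate_bn1x, fill_l1):
    """Walk one track and make every branch target resident in level-1 cache.

    track: list of track points; each is None (non-branch) or a tuple
           (fmt, x, y) with fmt "BN1" or "BN2".
    bn2_to_bn1x: dict standing in for block address mapping module 134.
    allocate_bn1x: callable standing in for replacement module 124.
    fill_l1: callable(bn2x, bn1x) standing in for the L2 -> L1 block fill.
    """
    for i, point in enumerate(track):
        if point is None or point[0] != "BN2":
            continue  # non-branch point, or target already in BN1 format
        _, bn2x, y = point
        bn1x = bn2_to_bn1x.get(bn2x)
        if bn1x is None:
            # No valid BN1X yet: allocate one and fill the block into L1.
            bn1x = allocate_bn1x()
            fill_l1(bn2x, bn1x)
            bn2_to_bn1x[bn2x] = bn1x
        # Write the BN1 back into the branch point (offset kept unchanged
        # here; the real BN2Y -> BN1Y conversion is not modelled).
        track[i] = ("BN1", bn1x, y)
    return track
```

After the walk, every branch point on the track holds a BN1, which is the condition the text relies on to rule out a miss when the branch is later executed.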
As described in the earlier embodiments, read pointer 115 of tracker 120, under the control of step signal 111, provides the address to level-1 cache 104 to fetch instructions for execution by processor core 102. Therefore, when processor core 102 executes a branch instruction, the value of read pointer 115 is exactly the BN1 of the branch point corresponding to that branch instruction. Read pointer 115 addresses track table 110, and the content of this branch point is output on bus 117 to both selector 118 in tracker 120 and selector 318 in secondary tracker 320. Both selectors then operate under the control of step signal 111 and TAKEN signal 113.
If the branch is taken when processor core 102 executes the branch instruction, selector 118 selects the BN1 on bus 117 to update register 112, so that read pointer 115 points to the branch target track point, and the BN1 of this branch target track point itself is provided to level-1 cache 104 to fetch the branch target instruction for execution by processor core 102. Selector 318 likewise selects the BN1 on bus 117 to update register 312, so that read pointer 315 also points to the branch target track point. Tracker 120 then continues, by the method described above, to control level-1 cache 104 to supply the instructions following the branch target instruction to processor core 102, while secondary tracker 320 moves read pointer 315 along this track (the track containing the branch target track point) by the method described above, ensuring that the instruction blocks containing the branch target instructions of the branch points it passes are filled into level-1 cache 104.
If the branch is not taken when processor core 102 executes the branch instruction, selector 118 selects the BN1X from register 112 and the incremented BN1Y from incrementer 114 as the new BN1 with which to update register 112, so that read pointer 115 points to the track point following the branch point, and the BN1 of that track point itself is provided to level-1 cache 104 to fetch the corresponding instruction for execution by processor core 102. Selector 318 likewise selects the BN1X from register 312 and the incremented BN1Y from incrementer 314 as the new BN1 with which to update register 312. That is, read pointer 315 continues to move along the current track, ensuring that the instruction blocks containing the branch target instructions of the branch points it passes are filled into level-1 cache 104.
In this way, the present embodiment achieves the same function as the Fig. 2 embodiment: when processor core 102 executes a branch instruction, the instruction block containing its branch target instruction is already stored in level-1 cache 104, and no cache miss occurs.
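How the two read pointers react to a resolved branch can be summarised in a short Python sketch. It models only the instant at which TAKEN signal 113 is evaluated for a branch instruction. The function name `step_trackers` and the parameter `end_y` (the BN1Y of the end track point, at which read pointer 315 stops) are assumptions made for illustration.

```python
def step_trackers(ptr_115, ptr_315, target_bn1, taken, end_y):
    """Return the next (read pointer 115, read pointer 315) values.

    Pointers and target_bn1 are (BN1X, BN1Y) tuples.
    """
    if taken:
        # Selectors 118 and 318 both select the BN1 on bus 117.
        return target_bn1, target_bn1
    x, y = ptr_115
    sx, sy = ptr_315
    # Not taken: both keep their BN1X and use the incremented BN1Y;
    # the secondary pointer does not move past the end track point.
    return (x, y + 1), (sx, min(sy + 1, end_y))
```

For example, with a taken branch whose target is `(3, 5)`, both pointers become `(3, 5)`, and the secondary tracker resumes its look-ahead from the target track.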
In the embodiments above, apart from the end track point, the number of entries in each row of the track table equals the number of instructions in the corresponding level-1 instruction block. Only branch instructions need to store a branch target address (BN1 or BN2), whereas for non-branch instructions only the instruction type in the track point content is needed, so the track table stores a large amount of useless information. Track table 110 can therefore be compressed to save storage space. Please refer to Fig. 4, which shows another embodiment of the cache system of the present invention. This embodiment is based on the Fig. 1A embodiment and describes the internal structure of track table 110 in detail. For ease of description, only some of the modules are shown in Fig. 4. Level-1 cache 104, processor core 102 and tracker 120 are identical to the corresponding components in the Fig. 1A embodiment. The structure of secondary tracker 420 is similar to that of tracker 120 in the Fig. 1A embodiment, but the read pointer it outputs is column pointer 425, which contains only a column address, or column number (MBNY), and selector 418 and register 432 in secondary tracker 420 receive control signals different from those of tracker 120.
In this embodiment, track table 110 consists of an instruction type table 410 and a target address table 412. The rows of the two tables correspond one-to-one with each other and with the rows of level-1 cache 104, and all are addressed by the same BN1X delivered on bus 411. The last column of instruction type table 410 holds the end entries; the number of remaining columns equals the number of instructions in a level-1 instruction block, in one-to-one correspondence. In instruction type table 410, every entry other than the end entry stores the basic type of the corresponding instruction in level-1 cache 104, and the end entry stores a branch instruction type. Instruction type table 410 thus contains the basic type information of every instruction in the corresponding memory block in track table 110.
The number of columns of target address table 412 is greater than or equal to the maximum number of branch instructions (including the unconditional branch of the end track point) that can exist in one level-1 instruction block. In each row, the entries from left to right store, in the order in which the branch instructions appear in the corresponding level-1 instruction block, the BN (either BN1 or BN2) of each branch instruction's target instruction. For convenience, entries of target address table 412 that already hold a correct target instruction BN are called valid entries, and the remaining entries are called invalid entries. In particular, the last valid entry of each row stores the BN of the target track point corresponding to the end entry of that row in instruction type table 410 (equivalent to the BN stored in the end track point). Target address table 412 thus contains the addresses BN of all branch target track points in track table 110, and the row of target address table 412 corresponding to any track contains at least one valid entry (it stores at least the target instruction BN corresponding to the end track point).
Please refer to Fig. 5, which shows an embodiment of the instruction type table and target address table of the present invention. This embodiment is described using the example of a level-1 cache 104 containing 4 instruction blocks (instruction block 0 to instruction block 3), each containing 8 instructions (instruction 0 to instruction 7); track table 110 correspondingly contains 4 rows (4 tracks). Instruction type table 410 in track table 110 therefore contains 4 rows (row 0 to row 3), each corresponding to one instruction block and containing 9 entries. The first 8 entries (entry 0 to entry 7) correspond to the 8 instructions of the instruction block and store their instruction types. In this embodiment, '1' denotes a branch instruction type and '0' denotes a non-branch instruction type. The last entry (entry 8) is the end entry and must hold a branch instruction type.
Target address table 412 in track table 110 also contains 4 rows (row 0 to row 3), each containing 3 entries that store the BN of a branch target instruction together with specific type information (for example direct branch, indirect branch, conditional branch, unconditional branch, and whether the branch target is a BN1 or a BN2). Here it is assumed that each instruction block contains at most 2 branch instructions; adding the unconditional branch corresponding to the end entry, each row needs at most 3 entries to store branch target BNs. For instruction blocks containing more branch instructions, the number of columns of target address table 412 can be increased accordingly, which is not described further here. The specific type information could also be stored in instruction type table 410, but that would enlarge instruction type table 410, which has more entries. It is therefore stored in target address table 412 rather than in instruction type table 410, in order to save further storage space.
When scanner 106 examines instructions, extracts instruction information and builds a track, the extracted instruction type is stored in the entry of instruction type table 410 addressed by the instruction's own BN1X and BN1Y. For a branch instruction, the branch target instruction BN obtained by matching or allocation, together with the branch type information, is additionally stored in the first invalid entry of the row of target address table 412 addressed by this BN1X, making that entry valid. Specifically, this function can be implemented with a counter. The counter is cleared when a new track is created. Whenever a branch instruction is found, the branch target instruction BN obtained by matching or allocation and the type information are stored in the row of target address table 412 addressed by the BN1X of this branch instruction, in the entry of the column pointed to by the counter, and the counter is incremented by one. Thus, once a track has been built, instruction type table 410 stores the basic types of all track points on the track, and target address table 412 stores the branch target and branch type information of all branch points and of the end track point on the track.
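The counter-based fill described above can be sketched in Python as follows. This is an illustrative model, not the scanner hardware: `build_track` and its parameters are hypothetical names, target entries are represented as opaque strings, and the end-point target `"1U20"` used in the usage note is an invented placeholder because its actual value is not given in the text.

```python
def build_track(block, end_target, max_targets=3):
    """Build one row of the type table and one row of the target table.

    block: list with one item per instruction; None for a non-branch
           instruction, or the target entry (a string) for a branch.
    end_target: target entry stored for the end track point.
    """
    types = []
    targets = [None] * max_targets   # None marks an invalid entry
    counter = 0                      # cleared when a new track is created
    for item in list(block) + [end_target]:
        is_end = item is end_target and len(types) == len(block)
        if item is None and not is_end:
            types.append(0)          # non-branch basic type
            continue
        if counter >= max_targets:
            raise ValueError("more branches than target table columns")
        types.append(1)              # branch basic type (end entry included)
        targets[counter] = item      # first invalid entry becomes valid
        counter += 1
    return types, targets
```

With the branch positions the text gives for track 1 of Fig. 5, `build_track([None, None, "1C01", None, None, None, "1C35", None], "1U20")` yields the 9-entry type row `[0, 0, 1, 0, 0, 0, 1, 0, 1]` and the 3-entry target row `["1C01", "1C35", "1U20"]`.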
In this embodiment, the content of an entry in target address table 412 consists of four parts. The first part indicates whether the target instruction address is a BN1 or a BN2 ('1' denotes BN1, '2' denotes BN2). The second part indicates the type of the branch instruction ('C' denotes a direct conditional branch, 'U' a direct unconditional branch, and 'I' an indirect conditional branch). The third and fourth parts are the BNX and BNY of the target instruction BN, respectively. For example, in Fig. 5 the content '2C83' of entry 0 in row 0 of target address table 412 indicates that the corresponding branch instruction is a conditional branch and that the target instruction address is a BN2 whose BN2X value is '8' and whose BN2Y value is '3'.
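As an illustration of the four-part entry format, the following Python sketch decodes the textual form used in Fig. 5. The names `TargetEntry` and `decode_entry` are hypothetical, and the sketch assumes one character per field, as in the figure's examples; a real entry would use wider binary fields.

```python
from typing import NamedTuple

class TargetEntry(NamedTuple):
    level: int   # 1 = target address is a BN1, 2 = target address is a BN2
    kind: str    # 'C', 'U' or 'I'
    bnx: int     # BNX of the target instruction
    bny: int     # BNY of the target instruction

KINDS = {
    "C": "direct conditional branch",
    "U": "direct unconditional branch",
    "I": "indirect conditional branch",
}

def decode_entry(text: str) -> TargetEntry:
    """Decode a 4-character entry such as '2C83' (single-digit fields assumed)."""
    if len(text) != 4:
        raise ValueError("expected exactly four characters")
    level, kind = int(text[0]), text[1]
    if level not in (1, 2):
        raise ValueError("first part must be '1' (BN1) or '2' (BN2)")
    if kind not in KINDS:
        raise ValueError("unknown branch type")
    return TargetEntry(level, kind, int(text[2]), int(text[3]))
```

Decoding `'2C83'` gives level 2, kind `'C'`, BNX 8 and BNY 3, matching the reading of that entry given in the text.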
If the address format of an entry is BN2 and the instruction type is 'I' (an indirect conditional branch instruction), tracker 120 must wait for processor core 102 to send the branch target address on bus 155 to active table 130 for matching. Once a BN1 address has been obtained as in the earlier examples, the instruction is read from level-1 cache 104 at that BN1 address for execution by processor core 102. If the address format is BN1 and the instruction type is 'I', tracker 120 can use this BN1 address, through read pointer 115, to address level-1 cache 104, read the corresponding instruction and its subsequent instructions, and supply them to processor core 102 for speculative execution. However, the indirect target address that processor core 102 sends on bus 155 must still be matched through active table 130 and block address mapping module 134, and the resulting BN1 compared with the BN1 used for speculative execution. If the two are the same, the processor core continues execution. If they differ, processor core 102 clears the speculatively executed instructions and their intermediate results from the pipeline, and the controller writes the matched BN1 into the track table entry, replacing the BN1 used for speculation. The tracker then reads this BN1 and proceeds as in the earlier examples.
Specifically, suppose three tracks have been established in track table 110: tracks 0, 1 and 3. As shown in Fig. 5, in instruction type table 410 entry 8 of each of these three tracks is the end entry, with value '1'. Among the remaining entries, entry 3 of track 0, entries 2 and 6 of track 1, and entry 4 of track 3 have value '1', indicating that the corresponding track points are branch points whose branch target instruction BNs are stored in target address table 412. For example, entry 3 of track 0 is the first branch point on that track, and entry 0 of row 0 of target address table 412 corresponds to it. The next branch point of track 0 is the end track point, and entry 1 of row 0 of target address table 412 corresponds to it. Because the end track point is treated as an unconditional branch in this embodiment, the type part of this entry is 'U'.
Returning to Fig. 4, as long as read pointer 115 of tracker 120 moves along a row of instruction type table 410 in track table 110, the basic types of all track points on a track can be read in sequence, while the BN1 on read pointer 115 is sent to level-1 cache 104 to fetch instructions, which are delivered on bus 103 for execution by processor core 102. According to the technical solution of the present invention, the track point content of a branch instruction is read before processor core 102 executes that instruction; if it contains a BN2, the corresponding instruction block is filled from level-2 cache 108 into level-1 cache 104 through bus 105. Therefore, when read pointer 115 points to a branch point, or to any of the consecutive non-branch points preceding that branch point, the branch target BN of that branch point is read from target address table 412 on bus 423, and it is determined whether it is a BN1 or a BN2 so that the subsequent operation can be carried out (the function of bus 423 is equivalent to that of bus 117 in Fig. 1A).
Specifically, when the instruction executed by processor core 102 does not cause a branch, the value of register 422 is '0', so selector 418 selects the output of incrementer 414 (the target address table 412 column number held in register 432, plus one). Each time the BN1Y of read pointer 115, delivered on bus 413, addresses memory 426, the basic type information read out is stored in register 424. Thus, once the read pointer passes the basic type information of a branch point (value '1'), the value of register 424 is '1' in the next clock cycle, which enables register 432 to latch the output of selector 418, so that column pointer 425 is updated to its former column number plus one and points to the next entry in target address table 412.
When processor core 102 executes a branch instruction and the branch is taken, TAKEN signal 113 (value '1') is written into register 422, and the basic type information output by memory 426 (value '1') is written into register 424. In the next clock cycle, read pointer 115 points to the branch target track point, and offset address mapping module 416 maps the BN1Y of this track point to the column number MBNY, in target address table 412, of the entry corresponding to the first branch point at or after this track point, and places it on bus 419. At this time the value of register 422 is '1', which makes selector 418 select the column number on bus 419. The value of register 424 is '1', which enables register 432 to latch the output of selector 418, so that column pointer 425 is updated to the column number on bus 419 and points to the entry in target address table 412 corresponding to the first branch point at or after that track point. Because the rows of instruction type table 410 and target address table 412 correspond one-to-one, only the columns need to be mapped. Offset address mapping module 416 in the Fig. 4 embodiment performs this mapping.
Please refer to Fig. 6, which shows an embodiment of the offset address mapping module of the present invention. In selector array 601, the number of selector columns equals the number of instructions in a level-1 instruction block, that is, 8 columns; the number of selector rows equals the maximum number of entries that a row of target address table 412 can contain. For clarity, only 4 rows and 4 columns are shown in Fig. 6, namely the first 4 rows counting from the bottom and the first 4 columns counting from the left. The bottom row is row 0, and the row numbers increase upward. The leftmost column is column 0, and the column numbers increase to the right; each column corresponds to one entry of a row of the instruction type table. In column 0, the row-0 selector has input A equal to '1' and input B equal to '0', and inputs A and B of all other selectors in that column are '0'. Input B of every selector in row 0 is '0'. For the selectors in the other columns, input A comes from the output of the selector in the same row of the previous column, and input B comes from the output of the selector one row lower in the previous column.
Decoder 605 decodes the BN1Y of read pointer 115 of tracker 120, delivered on bus 413, and sends the resulting mask value to masker 607. The mask value is also 8 bits wide: the mask bits before the bit corresponding to the BN1Y are '1', and the mask bit corresponding to the BN1Y and all mask bits after it are '0'. This mask value is then bitwise ANDed with the row content read from instruction type table 410 at the address given by the BN1X of read pointer 115, that is, with the 8 instruction type bits of the row excluding the one for the end track point. This retains the instruction type values located before the bit corresponding to the BN1Y and clears the rest, and the resulting 8-bit control word is sent to selector array 601.
Each bit of the control word controls one column of selectors in selector array 601. When the bit is '0', all selectors in the corresponding column select input A; when the bit is '1', all selectors in the corresponding column select input B. That is, for each column of selectors in selector array 601, if the corresponding control bit is '1', the selectors take as input the output of the previous column one row lower, so that the output of the previous column is shifted up by one row as a whole, with '0' filled into the bottom row, and becomes the output of this column. If the corresponding control bit is '0', the selectors take as input the output of the same row of the previous column, so the output of the previous column is passed through unchanged as the output of this column. Thus the input of the first column of selector array 601 is shifted up by as many rows as there are '1' bits in the control word; in other words, the single '1' in the input of selector array 601 is shifted up by that number of rows. The output of selector array 601 therefore contains exactly one '1', and the row in which it lies is determined by the control word. Encoder 603 then encodes the output of selector array 601, and the result, the column address MBNY of target address table 412, is sent out on bus 419. This completes the conversion of column addresses between instruction type table 410 and target address table 412 (the conversion from BNY to MBNY).
For example, if the BN1X of the current read pointer 115 is '1' and the BN1Y is '4', the entry to be found in target address table 412 is the one corresponding to the first branch point (BN1X '1', BN1Y '6') after this track point (BN1X '1', BN1Y '4'), namely entry 1 of row 1. The mask value output by masker 607 is '11110000'. ANDing it bitwise with the row 1 value '00100010' delivered by instruction type table 410 gives '00100000', so the control word contains one '1'. The '1' in the input of selector array 601 is therefore shifted up by 1 row, and the output of selector array 601, read from bottom to top, is '01000000'. Encoder 603 encodes this as '1' and outputs it on bus 419, so that the column address (BN1Y) '4' of row 1 of instruction type table 410 is converted to the column address (MBNY) '1' of row 1 of target address table 412.
As another example, if the BN1X of the current read pointer 115 is '0' and the BN1Y is '3', the track point is itself a branch point, so the entry to be found in target address table 412 is the one corresponding to this branch point (BN1X '0', BN1Y '3'), namely entry 0 of row 0. The mask value output by masker 607 is '11100000'. ANDing it bitwise with the row 0 value '00010000' delivered by instruction type table 410 gives '00000000', so the control word contains no '1'. The '1' in the input of selector array 601 is therefore not shifted, and the output of selector array 601, read from bottom to top, is '10000000'. Encoder 603 encodes this as '0' and outputs it on bus 419, so that the column address (BN1Y) '3' of row 0 of instruction type table 410 is converted to the column address (MBNY) '0' of row 0 of target address table 412.
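The two conversions above can be reproduced with a small functional model in Python. It is a sketch of what decoder 605, masker 607, selector array 601 and encoder 603 compute together, not a gate-level description; the function names are hypothetical, bit strings are written left to right starting at instruction 0 as in the text, and the one-hot list is indexed from the bottom row.

```python
def mask_for(bn1y, width=8):
    """Mask from decoder 605: '1' before the BN1Y position, '0' from it on."""
    return "1" * bn1y + "0" * (width - bn1y)

def map_offset(type_row, bn1y):
    """Return (control word, MBNY) for one row of basic instruction types.

    type_row: string of '0'/'1' basic types, end entry excluded.
    """
    mask = mask_for(bn1y, len(type_row))
    # Masker 607: bitwise AND of the mask and the type row.
    control = "".join(
        "1" if m == "1" and t == "1" else "0" for m, t in zip(mask, type_row)
    )
    # Selector array 601: a single '1' enters at the bottom row and moves
    # up one row for every '1' bit in the control word.
    one_hot = [1] + [0] * (len(type_row) - 1)
    for bit in control:
        if bit == "1":
            one_hot = [0] + one_hot[:-1]
    # Encoder 603: the row holding the '1' is the column number MBNY.
    return control, one_hot.index(1)
```

In effect MBNY is the count of branch points located strictly before the BN1Y position in the row, which is why a track point that is itself a branch point maps to its own entry.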
Returning to Fig. 4, this embodiment also adds a row memory 426 that holds a row of instruction types read from the instruction type table. When the result of executing a branch instruction is that the branch is not taken, selector 418 selects the output of incrementer 414 to be stored in register 432, so that column pointer 425 points to the entry immediately to the right of the original entry in target address table 412. When the result is that the branch is taken, selector 418 selects the content on bus 419 to be stored in register 432, so that column pointer 425 points to the entry in target address table 412 of the first branch instruction at or after the branch target, as mapped by offset address mapping module 416.
Both tracker 120 and secondary tracker 420 operate according to the basic instruction type held in memory 426. When the value on output bus 421 of memory 426 is '0' (a non-branch instruction type), tracker 120 updates read pointer 115 with the output of incrementer 114, moving it to the next track point in sequence, and controls level-1 cache 104 to output the next sequential instruction for execution by processor core 102. The register in secondary tracker 420 is not updated, and column pointer 425 does not move.
When the value on output bus 421 of memory 426 is '1' (a branch instruction type), tracker 120 updates read pointer 115 according to the branch type and the branch decision. If the branch is not taken, read pointer 115 moves to the next track point in sequence and controls level-1 cache 104 to output the next sequential instruction for execution by processor core 102. If the branch is taken, read pointer 115 jumps to the branch target delivered on output bus 423 of target address table 412 and controls level-1 cache 104 to output the branch target instruction for execution by processor core 102.
The register in secondary tracker 420 is updated accordingly, and column pointer 425 moves. If the branch is not taken, column pointer 425 moves to the next column in sequence of target address table 412, and the entry in that column addressed by BN1X address 411 of read pointer 115 is read and sent on bus 423 to selector 118 in tracker 120, where it is held ready. If the branch is taken, column pointer 425 jumps to the column of target address table 412 obtained by mapping the read pointer 115 after the jump through offset address mapping module 416, and the entry in that column addressed by BN1X address 411 of the read pointer 115 after the jump is read and sent on bus 423 to selector 118 in tracker 120, where it is held ready.
In this embodiment, the target addresses stored in the entries of target address table 412 are in track table address format, BN1X and BN1Y. When a branch is taken, this track table address is selected and placed on read pointer 115. Its row address BN1X, on bus 411, addresses one instruction block in level-1 cache 104 and one row in instruction type table 410. Its BN1Y, on bus 413, addresses one instruction within that instruction block and also serves as the column address for offset address mapping module 416 and memory 426. Offset address mapping module 416 maps the BN1Y on bus 413 to the column address MBNY and sends it to secondary tracker 420, where it is placed on column pointer 425. Column pointer 425 and the BN1X bus 411 of read pointer 115 together select an entry, and the content of that entry (including the branch target address BN1X and BN1Y) is sent on bus 423 to tracker 120, where it is held ready.
Specifically, if the branch is taken, TAKEN signal 113 makes selector 118 select the branch target on bus 423 to be written into register 112. In the next clock cycle, read pointer 115 of tracker 120 is updated to the BN1 of the branch target instruction. Its BN1X, on bus 411, selects the row of instruction type table 410 in which the branch target lies, and that row is sent to offset address mapping module 416 and memory 426. The BN1Y of read pointer 115 is sent on bus 413 to offset address mapping module 416, which converts it to a column number of target address table 412 using the data of that row. At this time the values of registers 422 and 424 are both '1', so this column number, delivered on bus 419, is selected by selector 418 and written into register 432, updating column pointer 425 to the column number corresponding to the BN1Y of the branch target instruction. The BN1X of read pointer 115 and the column number on column pointer 425 now together select, in target address table 412, the entry corresponding to the first branch point at or after this branch target instruction. The content of this entry is read out; the branch target is sent on bus 423 to selector 118 and the branch type is sent to the controller, and both are held ready. When read pointer 115 reaches the next '1' in memory 426 (whose branch target has already been delivered to selector 118), read pointer 115 controls level-1 cache 104 to output the corresponding instruction on bus 103 for execution by processor core 102. When the branch decision is produced, read pointer 115 of tracker 120 is updated by the method of the earlier embodiments, which is not repeated here.
If the branch is not taken, the value of register 422 is '0' and the value of register 424 is '1'. In the next clock cycle, the BN1X of read pointer 115 of tracker 120 remains unchanged, selector 418 selects the output of incrementer 414 to be written into register 432, and column pointer 425 is updated to its former column number plus one. The row number given by the BN1X of read pointer 115 and the column number on column pointer 425 now together select, in target address table 412, the entry corresponding to the first branch instruction after this branch instruction (the entry immediately to the right of the former entry). The content of this entry is read out and sent on bus 423 to selector 118. Later, when read pointer 115 of tracker 120 points to that branch point and controls level-1 cache 104 to output the corresponding instruction on bus 103 for execution by processor core 102, and the branch result is produced, read pointer 115 of tracker 120 is updated by the method of the earlier embodiments, which is not repeated here.
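The update rule for column pointer 425 described above can be condensed into a short Python sketch. It is a behavioural model under stated assumptions: `next_column` and `trace_not_taken` are hypothetical names, register 424 is modelled as holding the basic type read in the previous cycle, and the trace helper assumes that no branch is taken and that the column pointer starts at 0.

```python
def next_column(column, reg_422, reg_424, mapped_column):
    """Next value of column pointer 425 (register 432).

    reg_424: basic type latched in the previous cycle; enables the write.
    reg_422: TAKEN value latched in the previous cycle; drives selector 418.
    mapped_column: MBNY on bus 419 from offset address mapping module 416.
    """
    if reg_424 != 1:
        return column            # register 432 is not written
    if reg_422 == 1:
        return mapped_column     # branch taken: use the mapped column
    return column + 1            # branch not taken: incrementer 414

def trace_not_taken(type_row, first_y, last_y):
    """Column pointer seen at each BN1Y while stepping along one track."""
    column, reg_424 = 0, 0
    trace = []
    for bn1y in range(first_y, last_y + 1):
        column = next_column(column, 0, reg_424, mapped_column=None)
        trace.append((bn1y, column))
        reg_424 = int(type_row[bn1y])   # latched for the next cycle
    return trace
```

For the type row '00100010' of track 1 in Fig. 5, stepping BN1Y from 0 to 5 keeps the column pointer at 0 up to and including the branch point at BN1Y 2 and moves it to 1 from BN1Y 3 onward, so the target of the next branch point is available before read pointer 115 reaches it.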
A concrete example is now described with reference to Fig. 4 and Fig. 5. Suppose the contents of instruction type table 410 and target address table 412 in track table 110 are as shown in the Fig. 5 embodiment, and the value of read pointer 115 of tracker 120 is '10' (BN1X is '1', BN1Y is '0'). Read pointer 115 addresses level-1 cache 104 through buses 411 and 413 to read the corresponding instruction, which is supplied on bus 103 for execution by processor core 102. At this time the value (column number) of column pointer 425 of secondary tracker 420 is also '0', so it points to entry 0 in the row (row 1) of the target address table addressed by bus 411, and the entry content '1C01' is read out. That is, the corresponding branch instruction is a conditional branch, its branch target instruction is already stored in level-1 cache 104, and the corresponding target address BN1 is '01'. This BN1 is sent on bus 423 to selector 118 in tracker 120. In addition, memory 426, addressed by the BN1Y value ('0') of read pointer 115 on bus 413, outputs the instruction type ('0') on bus 421 to register 424, where it is held for the next cycle. Because the instruction type on bus 421 ('0') is non-branch, the controller makes selector 118 select the output of incrementer 114 to be sent to register 112. Because no branch is taken, the TAKEN signal ('0') is also sent to register 422 and held for the next cycle.
In the next clock cycle, read pointer 115 of tracker 120 is incremented to '11' and addresses level-1 cache 104 through buses 411 and 413 to read the corresponding instruction, which is supplied on bus 103 for execution by processor core 102. In this clock cycle the value of register 422 is '0', so offset address mapping module 416 is not active; the value of register 424 is '0', so register 432 in secondary tracker 420 is not updated and column pointer 425 remains unchanged. In addition, memory 426, addressed by the BN1Y value ('1') of read pointer 115 on bus 413, outputs the instruction type ('0') on bus 421 to register 424, where it is held for the next cycle. Because the instruction type on bus 421 ('0') is non-branch, the controller makes selector 118 select the output of incrementer 114 to be sent to register 112. Because no branch is taken, the TAKEN signal ('0') is also sent to register 422 and held for the next cycle.
In the next clock cycle, read pointer 115 of tracker 120 is incremented to '12' and addresses level-1 cache 104 through buses 411 and 413 to read the corresponding instruction, which is supplied on bus 103 for execution by processor core 102; that is, processor core 102 executes the corresponding branch instruction. Because the instruction type on bus 421 is a branch instruction, and the branch type on bus 423 is a conditional branch with address format BN1, the controller makes the tracker pause and wait for the branch decision of processor core 102. In addition, memory 426, addressed by the BN1Y value ('2') of read pointer 115 on bus 413, outputs the instruction type ('1') on bus 421 to register 424, where it is held for the next cycle. Suppose the branch is not taken, so that TAKEN signal 113 is '0'. This TAKEN signal ('0') makes selector 118 of tracker 120 select the incremented BN1Y output by incrementer 114, so that the BN1 on read pointer 115 is updated to '13' in the next clock cycle. In this clock cycle the value of register 422 is '0', so offset address mapping module 416 is not active; the value of register 424 is '0', so register 432 in secondary tracker 420 is not updated and column pointer 425 remains unchanged. Because the branch is not taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle.
In the next clock cycle, read pointer 115 of tracking device 120 is incremented by one to '13', and level-one cache 104 is addressed through buses 411 and 413 to read the corresponding instruction, which is sent through bus 103 to processor core 102 for execution. Because the instruction type ('0') on bus 421 indicates a non-branch instruction, the controller controls selector 118 to select the output of incrementer 114 and send it to register 112. Because the value of register 422 is '0' in this clock cycle, offset address mapping block 416 does not operate, and selector 418 is controlled to select the output of incrementer 414. Because the value of register 424 is '1', register 432 is updated to the incremented column number '1' output by selector 418, so that column pointer 425 points to entry 1 in the row of the destination address table pointed to by bus 411 (i.e. row 1), and the entry content '1C35' is read out. This content indicates that the branch instruction is a conditional branch, that its branch target instruction is already stored in level-one cache 104, and that the corresponding target address BN1 is '35'. This BN1 is sent through bus 423 to selector 118 in tracking device 120 for later use. In addition, memory 426 is addressed by the BN1Y value ('3') of read pointer 115 on bus 413 and outputs the instruction type ('0'), which is sent through bus 421 to register 424 and held for the next cycle. Because no branch is taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle.
In the following two clock cycles, as described in the previous embodiments, read pointer 115 of tracking device 120 is incremented by one each cycle, taking the values '14' and '15' in turn, and level-one cache 104 is addressed through buses 411 and 413 to read the corresponding instructions, which are sent through bus 103 to processor core 102 for execution. During this process, because the value of register 422 is '0', offset address mapping block 416 does not operate; because the value of register 424 is '0', register 432 in secondary tracking device 420 is not updated, so column pointer 425 remains unchanged. In addition, memory 426 is addressed by the BN1Y value (successively '4' and '5') of read pointer 115 on bus 413 and outputs the instruction type ('0'), which is sent through bus 421 to register 424 and held for the next cycle. Because no branch is taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle.
In the next clock cycle, read pointer 115 of tracking device 120 is incremented by one to '16', and level-one cache 104 is addressed through buses 411 and 413 to read the corresponding instruction, which is sent through bus 103 to processor core 102 for execution; that is, processor core 102 executes the corresponding branch instruction. In addition, memory 426 is addressed by the BN1Y value ('6') of read pointer 115 on bus 413 and outputs the instruction type ('1'), which is sent through bus 421 to register 424 and held for the next cycle. Because the instruction type ('1') on bus 421 indicates a branch instruction, and the branch type on bus 423 is a conditional branch whose branch target is in BN1 format, the controller pauses the tracking device to wait for the branch decision of processor core 102. Assume that the branch is now taken, i.e. TAKEN signal 113 is '1'. This TAKEN signal ('1') controls selector 118 of tracking device 120 to select the branch target BN1 ('35') from bus 423, so that the BN1 on read pointer 115 is updated to '35' in the next clock cycle. Because the value of register 422 is '0' in this clock cycle, offset address mapping block 416 does not operate; because the value of register 424 is '0', register 432 in secondary tracking device 420 is not updated, so column pointer 425 remains unchanged. Because the branch is taken, the TAKEN signal ('1') is sent to register 422 and held for the next cycle.
In the next clock cycle, read pointer 115 of tracking device 120 is updated to '35', i.e. it points to row 3, entry 5 of the instruction type table. Level-one cache 104 is addressed by the value of read pointer 115 through buses 411 and 413 to read the corresponding instruction (i.e. the branch target instruction), which is sent through bus 103 to processor core 102 for execution. Because the value of register 422 is '1' in this clock cycle, the BN1Y ('5') on read pointer 115 is sent through bus 413 to offset address mapping block 416 and converted to the column address MBNY '1'. This value is sent through bus 419 to selector 418 of secondary tracking device 420, and the value '1' of register 422 controls selector 418 to select the column number '1' from bus 419 as its output. Because the value of register 424 is '1', register 432 is updated to the column number '1' output by selector 418, so that column pointer 425 points to entry 1 in the row of the destination address table pointed to by bus 411 (i.e. row 3), and the entry content '1U00' is read out. This content indicates that the corresponding instruction is an unconditional branch instruction, that its branch target instruction is already stored in level-one cache 104, and that the corresponding target address BN1 is '00'. This BN1 is sent through bus 423 to selector 118 in tracking device 120. In addition, memory 426 is addressed by the BN1Y value ('5') of read pointer 115 on bus 413 and outputs the instruction type ('0'), which is sent through bus 421 to register 424 and held for the next cycle. Because no branch is taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle. In the following three cycles the instruction types read on bus 421 are all non-branch, so the controller, as in the preceding cycles, increments the BN1Y in read pointer 115 of the tracking device by '1' each cycle, and secondary tracking device 420 is not updated.
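The conversion from BN1Y to MBNY used in this step can be illustrated with a minimal sketch. It assumes that offset address mapping block 416 returns the number of branch entries located before position BN1Y in the addressed row of the instruction type table, which is consistent with BN1Y '5' being converted to MBNY '1' above but is not stated in this passage. The function name and the contents of the example row are hypothetical.

```python
def map_bn1y_to_mbny(type_row, bn1y):
    """Return the compressed-table column for an uncompressed offset.

    Assumption: the column equals the number of branch entries (value 1)
    located before position bn1y in the instruction-type row.
    """
    return sum(type_row[:bn1y])


# Hypothetical type row for row 3: branch instructions at offsets 2 and 8.
row3_types = [0, 0, 1, 0, 0, 0, 0, 0, 1]

mbny = map_bn1y_to_mbny(row3_types, 5)
print(mbny)  # prints 1: entry 1 describes the next branch, at offset 8
```

Under this assumption the compressed row holds one entry per branch instruction, so the column found for any offset is the entry of the first branch at or after that offset.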
When read pointer 115 points to row 3, column 8 of instruction type table 410, the '1' stored there is read out and the controller accordingly determines that the instruction is a branch instruction. The controller then decodes from bus 423 that the branch is an unconditional branch with address format BN1, and controls selector 118 to select the branch target '00' on bus 423 and store it in register 112. The '1' at row 3, column 8 is also sent through bus 421 to register 424 and held. The unconditional branch type also produces a '1' that is sent to register 422 and held. In the next cycle the pointer of the tracking device points to row 0, column 0 of instruction type table 410; execution starts from there, and level-one cache 104 is controlled to send the corresponding instruction to processor core 102 for execution. Under the control of the '1' values of registers 424 and 422, selector 418 selects, as before, the mapping result sent on bus 419 (now '0'), which is stored in register 432 so that pointer 425 points to column 0. Subsequent operation proceeds by analogy, so that instructions are executed correctly and the function of the present invention is achieved.
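The walk-through above can be replayed with a simplified functional model. This is a sketch under stated assumptions, not the circuit: it ignores the one-cycle register delays of registers 422, 424 and 432, it assumes the compressed column equals the number of branches before BN1Y, and entry 0 of rows 1 and 3 as well as the non-branch positions of rows 0 and 3 are invented for illustration. All identifiers (`TYPE_TABLE`, `TARGET_TABLE`, `step`, `run`) are hypothetical.

```python
# Instruction types per row (1 = branch). Row 1 follows the walk-through
# (branches at offsets 2 and 6); rows 0 and 3 are partly hypothetical.
TYPE_TABLE = {
    0: [0, 0, 0, 0, 0, 0, 0, 0, 0],
    1: [0, 0, 1, 0, 0, 0, 1, 0, 0],
    3: [0, 0, 1, 0, 0, 0, 0, 0, 1],
}
# Compressed target rows: one (kind, target) entry per branch, in order.
# 'C' = conditional, 'U' = unconditional. Entry 0 of rows 1 and 3 is made up.
TARGET_TABLE = {
    1: [("C", (0, 0)), ("C", (3, 5))],
    3: [("C", (0, 0)), ("U", (0, 0))],
}


def column_of(pointer):
    """Compressed column for a pointer (assumed: branches before BN1Y)."""
    x, y = pointer
    return sum(TYPE_TABLE[x][:y])


def step(pointer, column, taken):
    """Return the next (pointer, column) after one executed instruction."""
    x, y = pointer
    if TYPE_TABLE[x][y] == 0:                 # non-branch: BN1Y + 1
        return (x, y + 1), column
    kind, target = TARGET_TABLE[x][column]
    if kind == "U" or taken:                  # jump to the BN1 target
        return target, column_of(target)
    return (x, y + 1), column + 1             # conditional, not taken


def run(start, steps, decisions):
    """Trace the read pointer; decisions supplies TAKEN per conditional branch."""
    pointer, column = start, column_of(start)
    outcomes = iter(decisions)
    trace = [pointer]
    for _ in range(steps):
        x, y = pointer
        is_cond = TYPE_TABLE[x][y] == 1 and TARGET_TABLE[x][column][0] == "C"
        taken = next(outcomes) if is_cond else False
        pointer, column = step(pointer, column, taken)
        trace.append(pointer)
    return trace


# Branch at '12' not taken, branch at '16' taken, as in the description.
trace = run((1, 1), 10, [False, True])
print(trace)
```

With these inputs the pointer visits '11' to '16', jumps to '35', steps to '38', and then follows the unconditional branch to '00', matching the sequence described above.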
In addition, the Fig. 4 embodiment can be improved by the method described for Fig. 1B, so that it does not need to spend one cycle on the end track point after reaching the track point corresponding to the last instruction of each track. This embodiment uses the third of the previously described methods for handling the end track point: when the last instruction of an instruction block is a non-branch instruction, the end track point is merged into the track point corresponding to that instruction. Please refer to Fig. 7, which is another embodiment of the cache system of the present invention.
This embodiment is similar to that of Fig. 4, and only some of the modules are shown. Processor core 102, level-one cache 104, offset address mapping block 416, memory 426 and secondary tracking device 420 are the same as the corresponding components in Fig. 4. The difference is that track table 110 outputs the contents of two track points at a time. Instruction type table 410 outputs, through buses 421 and 429 respectively, the content of the entry pointed to by read pointer 115 of tracking device 120 and the content of the entry following it. Destination address table 412 outputs, through buses 423 and 427 respectively, the content of the entry pointed to by the MBNY obtained by converting the BN1Y in read pointer 115 through offset address mapping block 416, and the content of the entry following it. Correspondingly, type decoder 752, controller 754 and selector 116 are added to tracking device 120. In addition, processor core 102 also sends signal 161 to controller 754 in tracking device 120.
In this embodiment, the output of incrementer 114 is no longer sent directly to selector 118 but to selector 116. Addressed by column pointer 425 output by secondary tracking device 420, the read port of destination address table 412 outputs the contents of two entries (the current entry and the next entry) onto buses 423 and 427. In the current entry content output on bus 423, the target address BN is sent to selector 118 and the branch type information is sent to controller 754. In the next entry content output on bus 427, the target address BN is sent to selector 116 and the branch type information is sent to type decoder 752.
Addressed by column address 413 in read pointer 115 output by tracking device 120, memory 426 outputs the contents of two entries (the current entry and the next entry) onto buses 421 and 429 respectively. The basic type information of the current entry, output on bus 421, is sent to register 424 as in the Fig. 4 embodiment and is also sent to type decoder 752 and controller 754. The basic type information of the next entry, output on bus 429, is likewise sent to type decoder 752 and controller 754.
Type decoder 752 decodes the basic type information of the current entry and of the next entry sent from instruction type table 410. Its output controls selector 116 to select between the output of incrementer 114 and the output on bus 427, and the selected value is sent to selector 118.
As in the Fig. 1B embodiment, a BN2 in destination address table 412 is converted into a BN1 by the method described in the earlier embodiments before it is used. For convenience of explanation, the target addresses on buses 423 and 427 output by destination address table 412 can therefore all be regarded as BN1 in this embodiment.
In this embodiment, if the basic type information on bus 421 indicates a non-branch instruction, type decoder 752 controls selector 116 to select the output of incrementer 114, and controller 754 controls selector 118 to select the output of selector 116. In the next clock cycle the BN1X in register 112 therefore remains unchanged and the BN1Y is incremented by one, so that the read pointer points to the next sequentially executed entry in instruction type table 410, and the next instruction is read from level-one cache 104 for execution by processor core 102. For convenience of description, this embodiment assumes that valid signal 111 from the processor core is continuously valid.
If the basic type information on current-instruction bus 421 and on next-instruction bus 429 both indicate a branch instruction, the branch type information on bus 423 indicates a conditional branch, and the branch type information on bus 427 indicates an unconditional branch, then type decoder 752 controls selector 116 to select the unconditional branch target BN1 on bus 427, and controller 754 controls selector 118 according to TAKEN signal 113. If TAKEN signal 113 is '1', indicating that the branch of the branch instruction corresponding to the current entry is taken, selector 118 selects the branch target BN1 on bus 423, so that in the next clock cycle read pointer 115 of tracking device 120 is updated to the branch target BN1 of the current branch instruction, and the branch target instruction is read from level-one cache 104 for execution by processor core 102. Otherwise, selector 118 selects the output of selector 116 (i.e. the branch target BN1 of the unconditional branch instruction), so that in the next clock cycle read pointer 115 of tracking device 120 is updated to the branch target BN1 of the unconditional branch instruction, and that branch target instruction is read from level-one cache 104 for execution by processor core 102.
If the basic type information on current-instruction bus 421 and next-instruction bus 429 and the branch type information on buses 423 and 427 do not form the above combination, the type of the next instruction does not affect the current instruction, and type decoder 752 controls selector 116 to select the output of incrementer 114. Controller 754 controls the direction of read pointer 115 of tracking device 120 according to the basic type information on current-instruction bus 421 and the branch type information and address format on bus 423. If the current instruction is a non-branch instruction, type decoder 752 controls selector 116 to select the output of incrementer 114, and controller 754 controls selector 118 to select the output of selector 116. In the next clock cycle the BN1X in register 112 therefore remains unchanged and the BN1Y is incremented by one, so that the read pointer points to the next sequentially executed entry in instruction type table 410, and the next instruction is read from level-one cache 104 for execution by processor core 102.
If the basic type information on bus 421 indicates a branch instruction and the branch type information on bus 423 indicates a conditional branch instruction, controller 754 controls selector 118 according to TAKEN signal 113. If TAKEN signal 113 is '1', indicating that the branch of the branch instruction corresponding to the current entry is taken, selector 118 selects the branch target BN1 on bus 423, so that in the next clock cycle read pointer 115 of tracking device 120 is updated to the branch target BN1 of the branch instruction, and the branch target instruction is read from level-one cache 104 for execution by processor core 102. Otherwise, selector 118 selects the output originating from incrementer 114, so that in the next clock cycle the BN1X in register 112 remains unchanged and the BN1Y is incremented by one, and the next instruction is read from level-one cache 104 for execution by processor core 102.
If the basic type information on bus 421 indicates a branch instruction and the branch type information on bus 423 indicates an unconditional branch instruction, controller 754 controls selector 118 to select the branch target BN1 on bus 423, so that in the next clock cycle read pointer 115 of tracking device 120 is updated to the branch target BN1 of the branch instruction, and the branch target instruction is read from level-one cache 104 for execution by processor core 102.
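The selection cases described above for selectors 116 and 118 can be summarized in a short sketch. It models only the next-pointer decision, treats every target as a BN1, and ignores block boundaries and the valid signal. The `Entry` type and the function name are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple, Optional, Tuple

BN1 = Tuple[int, int]  # (BN1X, BN1Y)


class Entry(NamedTuple):
    """One track point as seen on buses 421/423 (current) or 429/427 (next)."""
    is_branch: bool
    kind: Optional[str] = None     # 'C' conditional, 'U' unconditional
    target: Optional[BN1] = None   # branch target BN1


def next_read_pointer(bn1: BN1, cur: Entry, nxt: Entry, taken: bool) -> BN1:
    """Value loaded into register 112 for the next clock cycle."""
    x, y = bn1
    incremented = (x, y + 1)                       # output of incrementer 114

    # Selector 116 (type decoder 752): a conditional branch directly
    # followed by an unconditional branch uses the latter's target as the
    # not-taken path; otherwise it passes the incremented address.
    merged = (cur.is_branch and cur.kind == "C"
              and nxt.is_branch and nxt.kind == "U")
    sel116 = nxt.target if merged else incremented

    # Selector 118 (controller 754).
    if not cur.is_branch:
        return sel116
    if cur.kind == "U" or taken:
        return cur.target
    return sel116


cond = Entry(True, "C", (3, 5))
uncond = Entry(True, "U", (0, 0))
print(next_read_pointer((1, 6), cond, uncond, taken=False))  # prints (0, 0)
```

In the merged case a not-taken conditional branch proceeds directly to the target of the following unconditional branch, so the read pointer does not spend a separate cycle on the unconditional branch.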
In this embodiment, the functions and operation of the other modules are the same as in the previous embodiments and are not repeated here.
The cache system using the compressed table shown in Fig. 7 can also support speculative execution; only its tracking device 120 needs the improvement shown in Fig. 1C. Please refer to Fig. 8, which is another embodiment of the tracking device of the present invention. Like the tracking device in Fig. 1C, tracking device 120 of this embodiment adds selector 162 and register 164, which select and hold the address of the branch path not chosen by the speculation, for use when the speculation proves wrong; two-input selector 118 is also replaced by three-input selector 818. It operates as in Fig. 1C: when controller 854 decodes the branch type on bus 423 as a conditional branch instruction, it selects one path of the branch according to the predicted direction of the instruction and places it on bus 115, controlling level-one cache 104 to supply an uninterrupted stream of instructions to the processor core for execution, and marks these instructions as speculatively executed. At the same time, the track table address of the first instruction of the unselected path is stored in register 164 for later use. If the branch decision provided on bus 113 shows the speculation to be correct, execution continues along the speculated direction. If it shows the speculation to be wrong, then, as in the Fig. 1C embodiment, controller 854 clears the speculatively executed instructions and their intermediate results, and controls selector 166 to place the track table address held in register 164 onto pointer 115, so that level-one cache 104 provides the first instruction of the other branch path to processor core 102 for execution.
The BN1X in pointer 115 also points, through bus 411, to the track in instruction type table 410 that corresponds to the other branch path. That track is read out and placed into offset address mapping block 416 where, together with the BN1Y value 413 in pointer 115, it produces the column address MBNY in destination address table 412 of the first branch instruction at or after the first instruction of the other branch path; this MBNY is sent through bus 419 to secondary tracking device 420. At the same time, controller 854 sends a branch-select control signal to the selector in secondary tracking device 420; after a one-cycle register delay, this signal controls selector 418 to select bus 419. Controller 854 also sends a clock enable signal to OR logic 824, which, after a one-cycle register delay, causes the MBNY value on bus 419 to be stored into register 432. Bus 411 together with pointer 425 points to the corresponding entry in destination address table 412, whose branch type and track table address are sent through bus 423 to controller 854 and selector 818 for later use. The next entry is sent through bus 427 to selector 116. Thereafter, controller 854 controls the direction of the tracking device according to the instruction type read from memory 426 and the branch type on bus 423, so that the cache system supplies the proper instructions to processor core 102 for execution.
Another implementation adds a selector and a register to secondary tracking device 420, in the same way as in tracking device 120, to store the MBNY value of the first branch target on the other track (the one not in the speculated direction), and replaces two-input selector 418 with a three-input selector. The output of this register is connected to the added input of the three-input selector. In this way, the three-input selector in secondary tracking device 420 can be controlled by the same control signal that controller 854 produces for tracking device 120, delayed by one cycle. However, the clock enable signal produced by controller 854 must still be sent to OR logic 824 so that, when the speculation is wrong, register 432 is enabled to store the column address MBNY held in the newly added register, in place of the value on bus 419 used in the previous example (the MBNY value held in the newly added register is the same as the mapped MBNY value on bus 419). The operation of this embodiment is similar to that of the previous example and is not repeated here.
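The role of register 164 in the speculative scheme described above can be sketched as follows. This is a minimal model under stated assumptions: addresses are (BNX, BNY) pairs, the fall-through address is the branch address plus one, and pipeline flushing and the secondary tracking device are not modeled. Both function names are hypothetical.

```python
def start_speculation(branch_at, target, predict_taken):
    """Return (address placed on read pointer 115, address held in register 164).

    The predicted path drives the read pointer; the first address of the
    other path is saved in case the prediction proves wrong.
    """
    x, y = branch_at
    fall_through = (x, y + 1)
    if predict_taken:
        return target, fall_through
    return fall_through, target


def restart_address(saved, predict_taken, actual_taken):
    """Address to reload into read pointer 115, or None if the guess was right."""
    return None if predict_taken == actual_taken else saved


pointer, saved = start_speculation((1, 6), (3, 5), predict_taken=False)
print(pointer, saved)                        # prints (1, 7) (3, 5)
print(restart_address(saved, False, True))   # prints (3, 5)
```

On a misprediction the saved address is the only state needed to redirect the read pointer, which is why a single extra register and a wider selector suffice in the tracking device.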
In existing branch prediction techniques, when the outcome of a branch instruction has not yet been produced, one branch path of the branch point is first executed speculatively; if the prediction proves incorrect, the pipeline must be flushed and execution restarted from the other branch path. In the present invention, the pipeline in the processor core is divided into a front-end pipeline and a back-end pipeline according to the position of the pipeline stage that produces the branch decision. The back-end pipeline comprises the stages from the one that produces the branch decision to the end of the pipeline, and the front-end pipeline comprises the stages from the first stage up to the stage before the branch decision. Thus, as long as the processor core contains two front-end pipelines and one back-end pipeline fed through a selector, the two front-end pipelines can, while the branch decision is still pending, execute respectively the instructions at the sequential address following the branch instruction and the branch target instruction with its subsequent instructions. When the branch decision is produced, the selector is controlled by this result to pass the intermediate results of one of the two front-end pipelines to the back-end pipeline for continued execution, thereby avoiding the performance loss caused by branch misprediction.
Please refer to Fig. 9, which is an embodiment of the cache and processor system without branch penalty of the present invention. For convenience of description, only some of the modules are shown in Fig. 9; level-one cache 104 and track table 110 are the same as the corresponding modules in the Fig. 2 embodiment. The difference is that processor core 1102 in Fig. 9 comprises two front-end pipelines and one back-end pipeline. The two front-end pipelines are the sequential front-end pipeline, which handles the sequential (fall-through) address after the branch point and its subsequent instructions, and the target front-end pipeline, which handles the branch target and its subsequent instructions. In addition, Fig. 9 adds an instruction read buffer 1104, a corresponding single-row track table 1110, and a selector 1108. Instruction read buffer 1104 is sized to hold one instruction block of level-one cache 104 and stores the current instruction block being executed by processor core 1102; track table 1110 stores the track corresponding to instruction read buffer 1104. In all embodiments of the present invention, for ease of explanation, the delay of the instruction read buffer is assumed to be '0', i.e. data written to the read buffer in a given cycle can be read in the same cycle. Besides tracking device 120, another tracking device 1120 is also added. For convenience of description, in this embodiment tracking device 120 is called the current tracking device and tracking device 1120 is called the target tracking device. Controller 1140 is also added to coordinate the operation of the two tracking devices and to control writes to instruction read buffer 1104 and track table 1110.
Tracking devices 120 and 1120 have the same structure, which is similar to that of the tracking device in the Fig. 7 embodiment; the difference is that the corresponding two-input selector in each is replaced by a three-input selector, and internal details are omitted for ease of understanding. The added input of selector 118 in tracking device 120 is connected to output 1123 of incrementer 1114 in tracking device 1120, and the added input of selector 1118 in tracking device 1120 is connected to output bus 117 of track table 1110. Bus 1117 can either provide a whole track to fill track table 1110 (in which case only BNX addressing is used) or provide one track point (entry) of a track to tracking device 1120. Likewise, bus 103 can either provide a whole instruction block to fill instruction read buffer 1104 (again using only BNX addressing) or provide one instruction of an instruction block to processor core 1102 for execution.
Step signal 111 provided by processor core 1102 controls the stepping of current tracking device 120 and provides the pipeline state to controller 1140. Controller 1140 monitors branch decision 113 and target front-end step signal 1111 provided by processor core 1102, as well as the instruction types and address formats on buses 117 and 1117 from the track tables, and coordinates and controls the operation of the whole system. When processor core 1102 executes an unconditional branch instruction, or executes a conditional branch instruction and branch decision signal 113 indicates that the branch is taken, controller 1140 controls selector 118 in current tracking device 120 to select bus 1123, i.e. the output of incrementer 1114 in target tracking device 1120, which is stored in register 112 to update read pointer 115. When controller 1140 determines, by monitoring read pointer 115, that processor core 1102 is executing the last instruction of an instruction block, it controls selector 118 in current tracking device 120 to select bus 117, i.e. the address of the next sequential instruction block provided by the last entry of track table 1110 (this is actually the end track point of the row; see the Fig. 7 embodiment for the detailed operation, which is not repeated here), which is stored in register 112 to update read pointer 115. In all other cases, controller 1140 controls selector 118 to select the output of incrementer 114, so that read pointer 115 steps under the control of step signal 111.
When controller 1140 detects on bus 117 an entry representing a branch instruction, it controls selector 1118 in target tracking device 1120 to select the branch target on bus 117, which is stored in register 1112 to update read pointer 1115. When controller 1140 finds, by monitoring bus 1115, that the last instruction of an instruction block has been sent to the target front-end pipeline in the processor core for execution, it controls selector 1118 to select the content on bus 1117, which is stored in register 1112 to update pointer 1115 (this is actually the end track point of the row; see the Fig. 7 embodiment for the detailed operation, which is not repeated here). In all other cases, controller 1140 controls selector 1118 to select the output of incrementer 1114, so that read pointer 1115 steps under the control of target front-end step signal 1111. Controller 1140 controls selector 1108 to select BNX 411 of current read pointer 115 when the entry output by track table 1110 on bus 117 is an unconditional branch, or when monitoring of read pointer 115 shows that the end track point is being read; at all other times selector 1108 selects BN1X part 1141 of target read pointer 1115. As it steps, pointer 115 controls instruction read buffer 1104 to provide instructions to the sequential front-end pipeline in processor core 1102 for execution, and pointer 1115 controls level-one instruction cache 104 to provide instructions to the target front-end pipeline in processor core 1102 for execution.
In this embodiment, the BNY in read pointer 115 of tracking device 120 addresses track table 1110 and instruction read buffer 1104 through bus 413, reads the corresponding track point content and instruction, and provides the instruction through bus 1103 to the sequential front-end pipeline in processor core 1102. The BNX in read pointer 1115 of tracking device 1120, after being selected from bus 1141 by selector 1108, addresses track table 110 and level-one cache 104 through bus 1109 to find the corresponding track and instruction block. Meanwhile, the BNY in read pointer 1115 of tracking device 1120 addresses this track and instruction block through bus 1143, reads the corresponding track point content and instruction, and provides the instruction through bus 103 to the target front-end pipeline in processor core 1102.
The following description uses the track table shown in Fig. 10 as an example. Although track table 110 of Fig. 10 consists of a single table and is not divided into an instruction type table and a destination address table, its content is the same as that of instruction type table 410 and destination address table 412 in the Fig. 5 embodiment. In this embodiment, current tracking device 120, like the tracking devices in the previous examples, updates read pointer 115 according to step signal 111 sent by processor core 1102, with controller 1140 acting on the instruction type and on branch decision 113 of the processor core. Because instruction read buffer 1104 and track table 1110 store the current instruction block and the current track respectively, addressing them through bus 413 with the BNY in read pointer 115 is sufficient to find the current instruction supplied to processor core 1102 for execution and the corresponding current track point. If the instruction type in the track point content read from track table 1110 indicates that the current instruction is not a branch instruction, current tracking device 120 updates read pointer 115 as before, stepping it and controlling instruction read buffer 1104 to output the corresponding instruction through bus 1103 to the sequential front-end pipeline of processor core 1102 for execution, until read pointer 115 points to a branch point in track table 1110. During this process, the back-end pipeline of processor core 1102 always selects the output of the sequential front-end pipeline for execution.
Suppose that this branch point is an unconditional branch. Controller 1140 then controls selector 118 in current tracking device 120 to select the track point content on bus 117 from track table 1110 (i.e. the branch target track point BN of the unconditional branch) as its output to update read pointer 115. The BNX in read pointer 115, after being selected from bus 411 by selector 1108, addresses level-one cache 104 and track table 110 through bus 1109 to read the corresponding target instruction block and target track, which are stored through buses 103 and 1117 into instruction read buffer 1104 and track table 1110 respectively. Meanwhile, the BNY in read pointer 115 addresses instruction read buffer 1104 and track table 1110 through bus 413; the target instruction in the stored target instruction block is read and sent through bus 1103 to the sequential front-end pipeline of processor core 1102 for execution, and the target track point content in the target track is read and sent through bus 117 to current tracking device 120 and stored in register 112. This completes the unconditional branch; subsequent operation is as described in the previous examples.
The case in which read pointer 115 points to a conditional branch is described taking as an example row 1 of track table 110 in Fig. 10, held in track table 1110. As before, when BN1Y part 413 of read pointer 115 points to the first track point in track table 1110, the instruction type in the entry read on bus 117 is non-branch; this type controls selector 118 to select the output of incrementer 114, so that read pointer 115 steps to the second track point 1130 in track table 1110. At the same time, BNY bus 413 controls read buffer 1104 to send the instruction corresponding to track point 1130 (a branch instruction) through bus 1103 to the sequential front-end pipeline of processor core 1102 for execution. After the entry content '1C01' of track point 1130 is read on bus 117, controller 1140 decodes it and determines that the branch type is a conditional branch and the address format is BN1. Controller 1140 then controls selector 118 to select the output of incrementer 114, and selector 1118 to select the branch target BN on bus 117. At the rising edge of the next clock, pointer 115 is updated to point to the third track point in track table 1110, the one following track point 1130, and the next sequential instruction after the branch instruction is sent to the sequential front-end pipeline. At the same time, pointer 1115 is updated to point to the branch target track point of branch source 1130 in track table 110 (in this example '01', i.e. row 0, track point 1), and the instruction corresponding to the branch target is read from level-one instruction cache 104 and sent to the target front-end pipeline.
Afterwards, read pointer 115 of current tracker 120 and read pointer 1115 of target tracker 1120 both step as in the preceding example, controlling instruction read buffer 1104 and level-one cache 104, respectively, to output the corresponding instructions to the two front-end pipelines of processor core 1102. This continues until the branch instruction corresponding to track point 1130 reaches the first stage of the back-end pipeline and the branch decision is produced. If the depth of the front-end pipelines is N, processor core 1102 has by then processed N instructions on each path: the N instructions that follow the branch instruction in the current instruction block, and the N instructions that start at the branch target instruction in the branch target instruction block.
In this example, the branch at branch point 1130 is not taken, so the back-end pipeline in processor core 1102 selects the output of the sequential front-end pipeline and continues execution (that is, the intermediate results of the target front-end pipeline are discarded). Controller 1140 controls selector 118 in current tracker 120 to select the output of incrementer 114, so that the tracker continues to step under the control of step signal 111; target tracker 1120 stops stepping. As current tracker 120 continues to step, subsequent instructions continue to be supplied to processor core 1102 from the current instruction block stored in read buffer 1104.
Afterwards, when read pointer 115 of current tracker 120 points to branch point 1132 (the sixth track point in row 1 of track table 110), controller 1140 reads its content '1C35', which indicates that branch point 1132 is a conditional direct branch whose branch target track point BN is '35', i.e., the fifth track point in row 3. Track table 1110 outputs this target track point BN according to the BN1Y on bus 413. As before, controller 1140 has this BN '35' sent through bus 117 to target tracker 1120 and stored in register 1112 to update read pointer 1115, while tracker 120, as before, selects the output of incrementer 114 so that read pointer 115 steps. As in the preceding example, by the time the branch instruction corresponding to track point 1132 reaches the pipeline stage that produces the branch result, read pointer 115 of tracker 120 and read pointer 1115 of tracker 1120 have controlled instruction read buffer 1104 and level-one cache 104, respectively, to output N instructions each to the sequential front-end pipeline and the target front-end pipeline of processor core 1102.
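The entry contents quoted in this description ('1C01', '1C35', '1U30') can be read as an address format digit, a branch type letter, and a target BN. The following is a minimal sketch of that decoding, assuming a four-character textual form with single-digit row and column numbers; the `TrackEntry` type and `decode_entry` function are hypothetical names introduced for illustration and are not part of the disclosed hardware.

```python
from dataclasses import dataclass

# Branch type letters as used in the example entries ('C' and 'U').
_BRANCH_TYPES = {"C": "conditional", "U": "unconditional"}


@dataclass(frozen=True)
class TrackEntry:
    address_format: str  # e.g. "BN1"
    branch_type: str     # "conditional" or "unconditional"
    target_row: int      # BNX of the branch target
    target_col: int      # BNY of the branch target


def decode_entry(text: str) -> TrackEntry:
    """Decode an entry such as '1C35' (assumed layout: format, type, row, column)."""
    if len(text) != 4 or text[1] not in _BRANCH_TYPES:
        raise ValueError(f"unsupported entry content: {text!r}")
    return TrackEntry(
        address_format="BN" + text[0],
        branch_type=_BRANCH_TYPES[text[1]],
        target_row=int(text[2]),
        target_col=int(text[3]),
    )
```

For instance, `decode_entry("1C35")` yields a BN1-format conditional branch whose target is row 3, column 5, matching the reading of branch point 1132 given in the text.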
In this example, the branch at branch point 1132 is taken, so the back-end pipeline in processor core 1102 selects the output of the target front-end pipeline and continues execution (that is, the results of the sequential front-end pipeline are discarded). Selector 118 in current tracker 120 selects bus 1123, which carries the BNX of read pointer 1115 of target tracker 1120 together with the BNY incremented by one by incrementer 1114; this BN is stored in register 112 to update read pointer 115. It points to the (N+1)th instruction counted from the branch target, which is the instruction processor core 1102 should execute next. Meanwhile, the BNX of read pointer 1115 of target tracker 1120, after being selected by selector 1108, reads through bus 1109 the corresponding instruction block (the block containing the (N+1)th instruction from the branch target) and the corresponding track from level-one cache 104 and track table 110, and these are stored through buses 103 and 1117 into instruction read buffer 1104 and track table 1110. At this point the read pointer of current tracker 120 has been updated to point to the (N+1)th instruction from the branch target, the instruction block containing that instruction has been filled into instruction read buffer 1104, and the corresponding track has been filled into track table 1110. As current tracker 120 continues to step, read buffer 1104 continues to supply subsequent instructions from that block through bus 1103 to the sequential front-end pipeline of processor core 1102 for execution. The back-end pipeline in processor core 1102 selects the target front-end pipeline for N clock cycles and then selects the sequential front-end pipeline, so that the Nth instruction from the target front-end pipeline and the (N+1)th instruction from the sequential front-end pipeline are executed without interruption. A lossless branch is thereby achieved.
In the above process, when read pointer 1115 of target tracker 1120 steps to the end track point or an unconditional branch point of the target track in track table 110, it can be updated to the next track of the target track or to the unconditional branch target track, as described in the preceding example. However, if read pointer 115 of current tracker 120 steps to the end track point or an unconditional branch point of the current track in track table 1110 while track table 110 is being used by target tracker 1120 to supply instructions to the target front-end pipeline, the corresponding next track or unconditional branch target track cannot be read from track table 110.
According to the technical solution of the present invention, the first solution is to stop the stepping of read pointer 115 of current tracker 120 and wait for processor core 1102 to produce the branch result. If the branch is not taken, target tracker 1120 no longer steps; selector 1108 selects the BNX on bus 411 (i.e., the BNX of the next track of the current track or of the unconditional branch target track) to control track table 110 to output that next track or unconditional branch target track through bus 11125 and store it in track table 1110, so that current tracker 120 can continue to step. If the branch is taken, read pointer 115 of current tracker 120 is updated, as in the preceding example, to the BN corresponding to the (N+1)th instruction from the branch target, and the instruction block containing that instruction and the corresponding track are filled into instruction read buffer 1104 and track table 1110, respectively, so that current tracker 120 can continue to step.
The second solution is to suspend the stepping of read pointer 1115 of target tracker 1120. During this pause, selector 1108 selects the BNX on bus 411 (i.e., the BNX of the next track of the current track or of the unconditional branch target track) to control track table 110 to output that next track or unconditional branch target track through bus 11125 and store it in track table 1110, so that current tracker 120 can continue to step. In the following clock cycle, read pointer 1115 of target tracker 1120 can resume stepping. Subsequent operation is as described in the preceding example.
The third solution improves the structure of the Fig. 9 embodiment by adding an instruction read buffer and a corresponding track table for storing the target instruction block and its track. In this way, current tracker 120 and target tracker 1120 each address their own instruction read buffer and track, without having to access level-one cache 104 and track table 110. Whenever either tracker needs to move to the next track or to an unconditional branch target track, level-one cache 104 and track table 110 can output the corresponding instruction block and track and fill them into the corresponding instruction read buffer and track table. The Figure 11 embodiment is this solution applied to a split track table such as that of the Fig. 8 example. The solution for a single track table follows by analogy and is not repeated here.
Please refer to Figure 11, it is another embodiment of the caching system of support free of losses branch of the present invention.In fig. 11, corresponding component in level cache 104, track table 110, offset address mapping block 416, storer 426, Fig. 8 embodiment is identical, tracking device 120 and secondary tracking device 910 also with Fig. 8 embodiment in tracking device 120 and secondary tracking device 420 function identical.Wherein tracking device 120 is current tracking device, secondary tracking device 920 is current secondary tracking device, and for clarity, illustrate only the part of module in target tracking device 120 in Figure 11, and eliminate register 164, selector switch 162, controller 752 and 854, and eliminate or logic 824; In addition, two input selectors 162 in Fig. 8 are substituted to accept when branch transition occurs from the BN address that target tracking device is sent here by three input selectors 118.Register 422,424 in secondary tracking device 910 is moved to outside 920, and is included in controller 1260.Controller 1260 accepts from the TAKEN signal 113 of processor core 1102, read pointer 115 and 1115 and all 4 the tracking devices of input control of bus 421,921,423 and the operation of secondary tracking device.It is identical with the corresponding component in Fig. 9 embodiment that buffering 1104 is read in processor core 1102 and instruction, and wherein instruction reads buffering 1104 for storing present instruction block, corresponding with the instruction type in storer 426.
Add an instruction in Figure 11 and read buffering 1204 for storing target instruction target word block.Correspondingly, add a storer 1226 for storing instruction type corresponding to this target instruction target word block, and add a target tracking device 1120 and a current secondary tracking device 1220.Wherein, the structure of target tracking device 1120 is identical with the tracking device 120 in Fig. 7 example, and the secondary tracking device 1220 of target is identical with the structure of current secondary tracking device 910.In addition, selector switch 1108,1210 and selector switch 1208 is also had in fig. 11.Wherein, selector switch 1108 is selected the BNX in the read pointer of two tracking devices, it exports and controls first-level instruction buffer 104 to reading buffering 1104 or read buffering 1204 to fill in instruction block, also point to a line in track table 110, the full line from this row in sense order type list 410 puts into offset address mapping block 416.Selector switch 1210 is selected for the BNY in the read pointer to two tracking devices, and its output is sent to offset address mapping block 416 to the table 410 content acting in conjunction read by corresponding BNX wherein to produce next take-off point address MBN1Y signal 419.Selector switch 1208 is for selecting the read pointer of two secondary tracking devices, it exports and is used for reading a list item in above-mentioned BNX points to from destination address table 412 a line, the branch pattern in its list item and address format by controller for determine tracking device trend and when address format is BN2 for instruction block is inserted level cache device 104.The same signal that selector switch 1108 and 1210 is produced by controller 1260 controls, and another signal that selector switch 1208 is produced by controller 1260 is controlled (for clarity, these signals are not shown in the diagram).Like this, selector switch 1108,1210 and 1208 cooperation 
is selected the read pointer of tracking device, secondary tracking device.In all embodiments of the invention, for ease of illustrating, all presumptive instruction reads the time delay of buffering is ' 0 ', namely reads buffering and can work as week write when week reading.
In the present embodiment, under the control of the stairstep signal 111 that current tracking device 120 is sent here at processor core 1102, upgrade read pointer 115, steering order is read buffering 1104 and is exported command adapted thereto is sent to processor core 1102 order front end streamline confession execution through bus 1103; The BN1Y part 413 of read pointer 115 is then mapped as MBN1Y stored in current secondary tracking device 910 in offset address mapper 416.Secondary tracking device 910 is to read the branch target of the Article 1 branch instruction (can be the instruction ability pointed by 413) from the instruction pointed by 413 in a line in destination address table 412 pointed by read pointer 115 of this MBN1Y value.This branch target, hereinafter referred to as branch target 1, be stored in target tracking device 1120, putting read pointer 1115, controlling to read from first-level instruction buffer 104 to read buffering 1204 and therefrom read command adapted thereto with read pointer 1115 BN1Y part 1143 to be wherein sent to the target front end streamline of processor core 1102 for performing through bus 1255 stored in target instruction target word with BN1X1411 wherein; The BN1Y part 1143 of read pointer 1115 is then mapped as MBN1Y stored in the secondary tracking device 1220 of target in offset address mapper 416.In process, read pointer 115 stepping always provides instruction to order front end streamline, read pointer 1115 also stepping always until fill up target front end streamline.
Controller 1260 accepts the feedback of processor core and the instruction read in advance from track table, branch pattern and address format, coordinates, controls total system operation.The stairstep signal 111 that processor core 1102 provides controls the stepping of current tracking device 120 and provides pipeline state to controller 1260.Branch pattern in instruction type and 423 on controller 1260 controlling bus 421,921, address format control the operation of each tracking device and control to cushion and the corresponding storage of instruction type storer and the offset operation of offset address mapping block reading.Controller 1260 is when processor core 1102 performs a unconditional branch instructions or conditional branch instructions and the branch of processor core judges signal 113 as branch, bus 1123 selected by the selector switch 118 controlled in current tracking device 120, the i.e. output of incrementer 1114 in target tracking device 1120, to upgrade register 112.Controller is when processor core 1102 performs the last item instruction in an instruction block, bus 423 selected by the selector switch 118 controlled in current tracking device 120, the address of next the sequential instructions block namely now provided by destination address storer 412, to upgrade register 112.Selector switch 918 in the current secondary tracking device 910 of following clock cycle controller control that above-mentioned two situations occur is selected to upgrade register 912 from the MBN1Y (column address of destination address storer 412) in the bus 419 of offset address mapping block 416.In addition to the above, controller controlled selector 118 selects the output of incrementer 114, makes read pointer 115 stepping under the control of stairstep signal 111; And controlled selector 918 selects the output of incrementer 914, (represent that being now sent to the instruction that processor core performs is a branch instruction) when the numerical 
value in bus 421 is ' 1 ', stepping makes current secondary tracking device read pointer 425 point to order next column in destination address storer.
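The selection rule that the controller applies to selector 118 in the paragraph above has three outcomes. The sketch below restates it as a function; the enum and function names are hypothetical, and the priority given to the branch case over the end-of-block case is an assumption, since the text does not say which wins if both hold.

```python
from enum import Enum
from typing import Optional


class Source118(Enum):
    """Inputs of the three-input selector 118, named after the text."""
    INCREMENTER_114 = "output of incrementer 114"
    BUS_1123 = "bus 1123 (output of incrementer 1114 in target tracker 1120)"
    BUS_423 = "bus 423 (next sequential block address from memory 412)"


def select_118(branch_type: Optional[str], taken: bool, last_in_block: bool) -> Source118:
    """Pick the source used to update register 112.

    branch_type is None for a non-branch instruction, otherwise
    "conditional" or "unconditional". Priority order is assumed.
    """
    if branch_type == "unconditional" or (branch_type == "conditional" and taken):
        return Source118.BUS_1123
    if last_in_block:
        return Source118.BUS_423
    return Source118.INCREMENTER_114
```

In every other case the function falls through to the incrementer, which corresponds to read pointer 115 stepping under the control of step signal 111.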
When controller 1260 monitors a new branch target in bus 423, (its monitoring mode can be the read request of change on monitoring bus 423 or monitoring objective addressed memory 412 or read address, change etc. as 1223), selector switch 1118 in control objectives tracking device 1120 selects the branch target in bus 423 to upgrade register 1112, and in the secondary tracking device 1220 of control objectives, selector switch 1218 is selected to upgrade register 1212 from the-MBN1Y (column address of destination address storer 412) in the bus 419 of offset address mapping block 416 after one clock cycle.When controller 1260 monitoring bus 1115 finds that the last item instruction of instruction block target front end streamline be sent in processor core performs, controller controlled selector 1108 and 1208 is selected to read last list item of this row (address of next sequence address data block) in destination address table 412 with BN1Y part in the BN1X part 1411 in target read pointer 1115 and the read pointer in bus 1215 for address and is put bus 423, and the selector switch 1118 in control objectives tracking device 1120 is selected to upgrade register 1112 with the branch target address in bus 423, and the selector switch 1218 in the secondary tracking device 1220 of following clock cycle control objectives upgrades register 1212.In addition to the above, controller controlled selector 1218 selects the output of incrementer 1114, makes read pointer 1115 stepping under the control of target front end stairstep signal (for the purpose of understanding, this signal does not show on Figure 11); And controlled selector 1218 selects the output of incrementer 1214, (represent that being now sent to the instruction that processor core performs is a branch instruction) when the numerical value in bus 423 is ' 1 ', stepping makes target secondary tracking device read pointer 1215 point to order next column in destination address storer.
If a '1' is read from memory 426 while read pointer 115 is stepping, the corresponding instruction is a branch instruction, hereinafter referred to as branch instruction 1. As in the Figure 11 example, the current tracker pointer still steps along the current track and controls instruction read buffer 1104 to output the corresponding instruction, which is sent through bus 1103 to the sequential front-end pipeline of processor core 1102 for execution. The output of memory 426 is placed on bus 421 and controls the stepping of current secondary tracker 910, whose read pointer 425 controls reading the branch target address of the next branch instruction from target address table 412, hereinafter referred to as branch target 2.
If branch instruction 1 is determined to be not taken, pointer 115 continues to step along the current track and continues to supply instructions to the sequential front-end pipeline. The target front-end pipeline is cleared and branch target 2 is stored into target tracker 1120; read pointer 1115 then controls, as before, the supply of instructions starting from branch target 2 to the target front-end pipeline, in preparation for the next branch point that pointer 115 will reach. If a '1' is read from memory 1226 while read pointer 1115 is stepping, the corresponding instruction is a branch instruction, and bus 921 accordingly controls target secondary tracker 1220 to step, recording for later use the MBN1Y of the next branch instruction in the target address table. If branch instruction 1 is determined to be taken, the next sequential step value of read pointer 1115 is stored into pointer 115 of current tracker 120, and the content of target read buffer 1204 is stored into current read buffer 1104, which from then on supplies instructions to the sequential front-end pipeline. Processor core 1102 also switches to executing instructions from the target front-end pipeline for N clock cycles, and afterwards switches back to accepting instructions from the sequential front-end pipeline.
In addition, once the read pointer of tracking device 120 or 1120 arrives terminate tracing point or unconditional branch point, then controller 1260 controls this tracking device and corresponding secondary tracking device operates jointly as described in Fig. 8 embodiment.Illustrate for current tracking device 120 below, target tracking device 1120 performs at the mode of operation terminating tracing point or unconditional branch point but its instruction provided is sent in processor core 1102 target front end streamline similar to current tracking device.Before the read pointer 115 of current tracking device 120 arrives a take-off point, the read pointer 425 of current secondary tracking device 910 by the corresponding list item of pointer 1223 as destination address table 412 in precedent sensing track table 110, and reads this contents in table and puts bus 423.According to the branch pattern in contents in table, controller 1260 judges that this take-off point is unconditional branch, wait for that in bus 421, the notice of processor core 1102 execution has been sent in the corresponding instruction of take-off point therewith.When in bus 421, value is ' 1 ', controller 1260 selector switch 118 controlled in current tracking device 120 selects the branch target tracing point BN of this unconditional branch in bus 423 to upgrade read pointer 115 as output.BNX in read pointer 115 is after bus 411 is selected by selector switch 1108, through bus 1109 pairs of level caches 104 and the addressing of track table 110, from level cache 104, read corresponding target instruction target word block be stored into instruction by bus 103 and selector switch 1250 and read in buffering 1104, and corresponding line is read through bus 417 from the instruction type table 410 of track table 110, selector switch 1252 to be stored in storer 426 and to deliver to offset address mapping block 416.Simultaneously, BNY in read pointer 115 reads buffering 1104 and storer 426 
addressing through bus 413 pairs of instructions, the target instruction target word of reading in the instruction block be stored described in reading buffering 1104 from instruction is sent to the order front end streamline of processor core 1102 for performing through bus 1103, and the instruction type reading the target trajectory point in target track from storer 426 is sent to current secondary tracking device 910 through bus 421, so complete unconditional branch transfer.
Afterwards current tracker 120 continues to step, and current secondary tracker 910 updates read pointer 425 according to signal 419 when a branch occurs. Read pointer 425, after being selected by selector 1208, addresses target address table 412 through bus 1223 and points in advance to the first branch point counted from the unconditional branch target. The corresponding entry content (i.e., the branch target BN) is read out and sent over bus 423 to current tracker 120, target tracker 1120 and the controller, which operate as described in the preceding example. Further, target tracker 1120 updates read pointer 1115 according to this branch target BN. The BNX 1251 of read pointer 1115 is then selected by selector 1108 and sent to level-one cache 104, and the target instruction block is read out and filled through bus 103 into target read buffer 1204. The BNY 1253 of read pointer 1115 selects the target instruction in target read buffer 1204, which is sent through bus 1255 to the target front-end pipeline in processor core 1102. After this, target tracker 1120 steps until a total of N instructions, starting from the branch target, have been sent into that front-end pipeline, at which point processor core 1102 provides a feedback signal telling tracker 1120 to stop stepping. Because the depth N of the front-end pipeline is fixed, the stop signal can also be provided by a counter; the step signal from processor core 1102 is still needed, however, to inform target tracker 1120 of the state of the processor core, such as a pipeline stall.
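The counter-based stop signal mentioned at the end of the paragraph above can be sketched as follows. This is an illustrative model only: the `FillCounter` class and its method names are hypothetical, and it assumes one instruction enters the target front-end pipeline per cycle in which the step signal from the processor core is asserted.

```python
class FillCounter:
    """Counts instructions sent to the target front-end pipeline of depth N."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("pipeline depth must be at least 1")
        self.depth = depth
        self.count = 0

    def reload(self) -> None:
        """Restart counting when a new branch target is loaded."""
        self.count = 0

    @property
    def full(self) -> bool:
        return self.count >= self.depth

    def tick(self, step_signal: bool) -> bool:
        """Return True if the target tracker should step in this cycle.

        A deasserted step signal (for example during a pipeline stall)
        holds the count, which is why the counter alone is not sufficient.
        """
        if self.full or not step_signal:
            return False
        self.count += 1
        return True
```

A stalled cycle leaves the count unchanged, so the tracker resumes where it stopped and still delivers exactly N instructions before halting.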
When read pointer 115 points to a branch point, selector 118 in current tracker 120 selects the output of incrementer 114 to update pointer 115, so that instructions continue to be read from current read buffer 1104 and sent to the sequential front-end pipeline in processor core 1102. Afterwards, read pointer 115 of current tracker 120 steps as in the preceding example; if the target front-end pipeline is not yet full, target tracker 1120 also steps at the same time, supplying instructions to the target front-end pipeline as described above. This continues until the branch instruction corresponding to this branch point passes through the sequential front-end pipeline and reaches the branch decision stage (the first stage of the back-end pipeline), where the branch decision is produced.
The following description uses the track table shown in Fig. 5 (which is essentially the same as the track table shown in Figure 10). Suppose a branch whose target is row 1, column 0 of instruction type table 410 is taken. BN1 '10' is then stored into register 112 in tracker 120 as in the preceding example, making read pointer 115 equal to '10'. As described above, memory block 1 in level-one instruction cache 104 is read out and stored through bus 103 and selector 1250 into current read buffer 1104; BN1Y 413 in read pointer 115 points to entry 0 in 1104, from which the instruction is read and sent through bus 1103 to the sequential front-end pipeline of processor core 1102 for execution. Meanwhile, row 1 of instruction type table 410 is read out, placed through bus 417 into offset address mapping module 416, and stored through selector 1252 in memory 426. The BN1Y part 413 of read pointer 115 is selected by selector 1210 and, as in the preceding example, mapped in offset address mapping module 416 to the MBN1Y value '0' on signal 419.
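The mapping from BN1Y to MBN1Y used in this walkthrough can be illustrated with a short sketch. It assumes that the target address table stores one entry per branch instruction in a row, so that MBN1Y is the number of branch instructions located before column BN1Y. That interpretation is inferred from the example values and is not stated as the actual circuit; the row of instruction type bits below is likewise reconstructed from the walkthrough (branches at columns 2 and 6, column 7 assumed non-branch).

```python
from typing import Sequence


def map_offset(type_row: Sequence[int], bn1y: int) -> int:
    """Return the assumed MBN1Y: index of the first branch entry at or after bn1y.

    type_row holds one bit per instruction: 1 for a branch, 0 otherwise.
    """
    if not 0 <= bn1y <= len(type_row):
        raise ValueError("BN1Y outside the instruction row")
    return sum(type_row[:bn1y])


# Row 1 of the instruction type table as implied by the walkthrough (assumed).
ROW_1_TYPES = [0, 0, 1, 0, 0, 0, 1, 0]
# Matching row of the target address table: two branches, then the end entry.
ROW_1_TARGETS = ["1C01", "1C35", "1U30"]
```

Under these assumptions, entering row 1 at column 0 selects entry 0 ('1C01'), passing the branch at column 2 moves to entry 1 ('1C35'), and passing the branch at column 6 moves to the end entry '1U30'.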
Following clock cycle pointer 115 stepping, its value is ' 11 ', and control sequence is read buffering 1104 and read instruction in its No. 1 list item, performs for order front end streamline.Meanwhile, in storer 426, No. 1 list item is read out and puts bus 421, and its value is ' 0 '.Simultaneously, controller 1260 is latched into the register in secondary tracking device 910 according to a point value ' 0 ' of sending away on signal 419 for above-mentioned generation, make branch target table read pointer 425 for ' 0 ', select to point to No. 0 list item in the 1st row pointed out by BNX in branch target table 412 through bus 1223 through selector switch 1208, this contents in table 1C01 is read out.Controller 1260 judges that this entry format is BN1, does not therefore need to carry out the padding of first-level instruction buffer memory.Controller also judges that this branch pattern is conditional branching, and the selector switch 1218 therefore in control objectives tracking device 1120 selects this list item BN1 address ' 01 ' in bus 423 to latch in order to register 1212 in target tracking device.
Following clock cycle pointer 115 stepping, its value is ' 12 ', and control sequence is read buffering 1104 and read instruction in its No. 2 list items, performs for order front end streamline.Now, the value read from storer 426 is ' 1 ', and put bus 421, controller 1260 judges that it is branch instruction, and be judged as conditional branching type by the value 1C01 of its command adapted thereto in bus 423, therefore wait for that the TAKEN branch of processor core 1102 judges that 113 decide tracking device trend.Meanwhile, register 1212 latches, and therefore pointer 1115 is ' 01 ' (branch target), selects through selector switch 1108, controls to read r go out the 0th row in first-level instruction buffer 104 through bus 1109, through bus 103 stored in target instruction target word buffering 1204; Bus 1109 also controls the 0th row in sense order type list 410 stored in offset address mapping block 416 and storer 1226.Simultaneously, the instruction read from target instruction target word buffering 1204 in No. 1 list item of BN1Y part 1143 on pointer 1115 is sent to target front end streamline through bus 1255 and is performed, bus 1143 also selects the Article 1 branch instruction mapped out in offset address mapping block 416 from this branch target MBN1Y through selector switch 416 is ' 0 ', be sent to secondary tracking device 1220 through bus 419, this Time Controller controls selector switch 1218 in 1210 and selects this MBN1Y to latch in order to register 1212.Secondary tracking device 910 is also because bus 421 is the ' 1 ' stepping under controller 1260 controls simultaneously, because now there is no branch, so controller 1260 controlled selector 918 selects the output of incrementer 914, the output of above-mentioned selector switch 918 makes read pointer 425 for ' 1 ' stored in register 912, select through selector switch 1208, point to No. 
1 list item in the 1st row pointed out by BNX in branch target table 412 through bus 1223, this contents in table 1C35 is read out and puts bus 423.Controller 1260 judges that it is conditional branching type, to wait until when its command adapted thereto is performed (being the instruction during 1 row 6 arranges) in order to control the action of tracking device by temporary for its type.
In the next clock cycle pointer 115 steps to '13' and controls sequential read buffer 1104 to read the instruction in its entry 3 for execution by the sequential front-end pipeline. Pointer 1115 also steps, to '02', and controls target read buffer 1204 to read the instruction in its entry 2 for execution by the target front-end pipeline. In secondary tracker 1220, register 1212 latches the output of selector 1218, the MBN1Y value '0'. At this time, however, selector 1208 does not select register 1212 as the source of its output bus 1215.
In the next clock cycle pointer 115 steps to '14' and controls sequential read buffer 1104 to read the instruction in its entry 4 for execution by the sequential front-end pipeline. Pointer 1115 also steps, to '03', and controls target read buffer 1204 to read the instruction in its entry 3 for execution by the target front-end pipeline.
In the next clock cycle pointer 115 steps to '15' and controls current read buffer 1104 to read the instruction in its entry 5 for execution by the sequential front-end pipeline. Pointer 1115 also steps, to '04', and controls target read buffer 1204 to read the instruction in its entry 4 for execution by the target front-end pipeline. In this cycle processor core 1102 has produced the branch decision, which is not taken, and sends it to the controller on TAKEN signal 113. Controller 1260 controls selector 118 to select the output of incrementer 114. In accordance with the not-taken decision, the target front-end pipeline is cleared.
In the next clock cycle, pointer 115 steps to '16' and controls sequential read buffer 1104 to read out the instruction in its entry 6 for execution by the sequential front-end pipeline. As before, entry 6 of instruction type memory 426 is read out and placed on bus 421; its value is '1'. Controller 1260 causes the BN value on pointer 1115 to be updated to the BN value '35' on bus 423. The BNX part 411 of pointer 1115 is selected by selector 1108 and, via bus 1109, controls the reading of instruction block 3 from level-one instruction cache 104 into target instruction buffer 1204, while the BN1Y part 1143 of pointer 1115 controls the reading of entry 5 of buffer 1204, which is sent to the target front-end pipeline in processor core 1102 for execution. At the same time, bus 1109 also reads row 3 of instruction type table 410 into offset address mapping module 416 and memory 1226. The BN1Y part 1143 of pointer 1115 is selected by selector 1210, and offset address mapping module 416 maps it to the MBN1Y of the first branch instruction starting from this branch target, which is '1'; this value is sent to secondary tracker 1120 via bus 419, and controller 1260 controls selector 1218 to select this MBN1Y. Meanwhile, secondary tracker 910 also steps under the control of controller 1260 because bus 421 is '1'. Since no branch has been taken at this point, controller 1260 controls selector 918 to select the output of incrementer 914; the output of selector 918 is stored in register 912, making read pointer 425 equal to '2'. This value is selected by selector 1208 and, via bus 1223, points to entry 2 in row 1 of branch target table 412 (the row pointed to by BNX); the content of this entry, '1U30', is read out and placed on bus 423. Controller 1260 determines that the entry is of the unconditional branch type and waits until its corresponding instruction is executed before using it to control the tracker. (In fact, this entry is the address of the next sequential instruction row, recorded in the form of an unconditional branch instruction at the 8th column of row 1 of instruction type table 410 in Fig. 5.)
In the next clock cycle, pointer 115 steps to '17' and controls sequential read buffer 1104 to read out the instruction in its entry 7 for execution by the sequential front-end pipeline. Pointer 1115 also steps, to '36', and controls target read buffer 1204 to read out the instruction in its entry 6 for execution by the target front-end pipeline. Because pointer 115 has reached the last instruction of this instruction block, controller 1260 prepares to read level-one instruction cache 104 according to the BN '30' in the current value '1U30' on bus 423.
In the next clock cycle, the value of pointer 115 is '30'. As before, it controls the reading of instruction block 3 from level-one instruction cache 104 into sequential read buffer 1104 and the reading of the instruction in entry 0 of that buffer for execution by the sequential front-end pipeline. Pointer 1215 also steps, to '37', and controls target read buffer 1204 to read out the instruction in its entry 7 for execution by the target front-end pipeline. Because pointer 1115 has reached the last instruction of this instruction block, controller 1260 prepares to read level-one instruction cache 104 according to the BN '00' in the current value '1U00' on bus 423. Controller 1260 controls selector 1108 to select bus 1115 onto bus 1109, pointing to row 3 of track table 110, and at the same time controls selector 1208 to select read pointer 1215, which is sent via bus 1223 to target address table 412 to read out its entry '1U00' onto bus 423. Because the instruction being executed is the last instruction of its instruction block, and the instruction type of the corresponding entry 7 read from memory 1226 via bus 921 is non-branch, controller 1260 concludes that '1U00' does not correspond to a real instruction in the program but is an end-of-track unconditional branch. It therefore decides that the branch is taken without waiting for an instruction type of '1' corresponding to '1U00' to arrive on bus 921.
In the next clock cycle, the value of pointer 115 is '31', and it controls current read buffer 1204 to read out the instruction in its entry 1 for execution by the sequential front-end pipeline. Pointer 1115 is '00'; it controls the reading of instruction block 0 from level-one instruction cache 104 into target read buffer 1204 and the reading of the instruction in entry 0 of that buffer for execution by the target front-end pipeline. In this cycle, processor core 1102 produces the branch decision, which is "taken", and sends it to controller 1260 via signal 113. According to this decision, controller 1260 controls selector 118 to select the value '01' on bus 1123 (that is, the BN1Y of target read pointer 1115 after being incremented by the incrementer). Controller 1260 also controls selector 1252 to select the output of memory 1226. Because the branch is taken, the sequential front-end pipeline is flushed.
In the next clock cycle, acting on the "taken" decision, controller 1260 writes the content of target instruction buffer 1204 into current instruction buffer 1104 through selector 1250 and writes the output of selector 1252 into memory 426. Register 112 latches the output of selector 118, making read pointer 115 equal to '01', which controls current read buffer 1104 to read out the instruction in its entry 1 for execution by the sequential front-end pipeline. The instructions already in the target pipeline continue to execute, but the controller holds target tracker 1120 so that target buffer 1204 supplies no further instructions to the target front-end pipeline in the processor core. As before, the value '1' of the BN1Y part 413 of pointer 115 is mapped to '0' by mapping module 416 and sent out on bus 419. Starting from this cycle, processor core 1102 selects instructions from the target front-end pipeline for execution for N clock cycles.
In the next clock cycle, the value of pointer 115 is '02', and it controls target read buffer 1204 to read out the instruction in its entry 2 for execution by the sequential front-end pipeline. Register 912 in secondary tracker 910 latches the value '0' on bus 419 as read pointer 425, which is selected by selector 1208 and, via bus 1223, reads the entry content '2C83' from column 0 of row 0 (the row pointed to by pointer 115) of target address table 412. The controller determines that this entry is in BN2 format and sends its BN2 value '83' to the active table via bus 423, so that the corresponding instruction block is filled into the level-one cache and the BN1 of that level-one cache block is written into target address table 412. This BN1 is then read out on bus 423 for use by the tracker. Repeating this process cycle after cycle achieves branch operations without performance loss.
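The handling of a BN2-format entry in the last step can be sketched as follows. This is only an illustration: it assumes, from the examples '1U30' and '2C83' in the text, that an entry consists of a format digit, a branch type letter and a block number. The callback `fill_level_one` and the BN1 value it returns are hypothetical stand-ins for the active table lookup and the level-one cache fill.

```python
def resolve_entry(entry, fill_level_one):
    """Return a target address table entry whose block number is in BN1 format.

    entry: string such as '1U30' or '2C83' (format digit, branch type
           letter, block number), modelled on the examples in the text.
    fill_level_one: hypothetical callback that takes a BN2 value, fills the
           block into the level-one cache and returns the resulting BN1.
    """
    fmt, kind, bn = entry[0], entry[1], entry[2:]
    if fmt == "2":
        # BN2 format: the block must first be brought into the level-one
        # cache, then the entry is rewritten with the BN1 of that block.
        bn1 = fill_level_one(bn)
        return "1" + kind + bn1
    # Already in BN1 format: the tracker can use the entry directly.
    return entry


if __name__ == "__main__":
    # '40' is a made-up BN1 used only for demonstration.
    print(resolve_entry("2C83", lambda bn2: "40"))
    print(resolve_entry("1U30", lambda bn2: "40"))
```

An entry that is already in BN1 format passes through unchanged, so the tracker only pays for a fill when the target block is not yet in the level-one cache.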
Similar to the embodiment of Fig. 9, if the branch is not taken, the back-end pipeline in processor core 1102 selects the output of the sequential front-end pipeline and continues execution (that is, the results of the target front-end pipeline are discarded); current tracker 120 continues stepping and target tracker 1120 stops stepping. Thus, as current tracker 120 continues to step, subsequent instructions along the current instruction block continue to be supplied to processor core 1102 for execution. If the branch is taken, the back-end pipeline in processor core 1102 selects the output of the target front-end pipeline and continues execution (that is, the results of the sequential front-end pipeline are discarded). Selector 118 in current tracker 120 selects bus 1123, whose value consists of the BNX of read pointer 1115 of target tracker 1120 and the BNY incremented by one by incrementer 1114; this value is stored in register 112 as the BN that updates read pointer 115. This BN points to the (N+1)th instruction counted from the branch target, i.e. the instruction that processor core 1102 should execute next. At the same time, the instruction block in instruction read buffer 1204 is stored into instruction read buffer 1104, and the entire content of memory 1226 is stored into memory 426. As a result, the read pointer of current tracker 120 points to the (N+1)th instruction counted from the branch target, the instruction block containing that instruction has been filled into instruction read buffer 1104, and the corresponding instruction types have been filled into memory 426. Thus, as current tracker 120 continues to step, subsequent instructions are supplied to processor core 1102 along the instruction block containing the (N+1)th instruction counted from the branch target.
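The hand-over between the two trackers when the branch decision arrives can be sketched as below. This is a simplified model under stated assumptions, not the circuit itself: the class `TrackerState` and the function `resolve_branch` are illustrative names, buffers and type memories are plain lists, and pipeline timing is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class TrackerState:
    bnx: int                                    # block (row) part of the read pointer
    bny: int                                    # offset part of the read pointer
    block: list = field(default_factory=list)   # instruction read buffer content
    types: list = field(default_factory=list)   # instruction type memory content


def resolve_branch(current, target, taken):
    """Apply the branch decision; return which front-end pipeline is kept."""
    if not taken:
        # Current tracker keeps stepping on its own track; the target
        # front-end pipeline results are discarded.
        return "sequential"
    # Branch taken: the current read pointer takes the target BNX and the
    # target BNY plus one, and the target buffer and type memory are copied.
    current.bnx = target.bnx
    current.bny = target.bny + 1
    current.block = list(target.block)
    current.types = list(target.types)
    return "target"


if __name__ == "__main__":
    cur = TrackerState(3, 1, ["c0", "c1"], [0, 0])
    tgt = TrackerState(0, 0, ["t0", "t1"], [0, 1])
    kept = resolve_branch(cur, tgt, taken=True)
    print(kept, cur.bnx, cur.bny)
```

With the values from the walkthrough (current pointer '31', target pointer '00', branch taken), the current pointer becomes '01', matching the value latched by register 112.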
In this embodiment, two instruction read buffers 1104 and 1204 store the current instruction block and the target instruction block respectively, and two memories 426 and 1226 store the instruction types corresponding to these two instruction blocks. Therefore, whether read pointer 115 of current tracker 120 or read pointer 1115 of target tracker 1120 reaches the end track point or an unconditional branch point of its track, the corresponding instruction block and instruction types can be read from level-one cache 104 and track table 110 and filled into the corresponding instruction read buffer and memory, without stalling that tracker itself or suspending the stepping of the other tracker.
According to the technical solution of the present invention, the capacity of the instruction read buffers and of the corresponding instruction type memories can be further increased on the basis of Fig. 11, so that the instruction read buffers can hold a certain number of instruction blocks and the memories can hold the instruction types of a corresponding number of blocks. When the read pointer of tracker 120 or 1120 is updated to a track other than the current track and the target track, it can be checked whether that track and the corresponding instruction block are already stored in the memories and the instruction read buffers. One implementation stores the BN1X address of each stored instruction block in the instruction read buffer or in the corresponding instruction type memory. When the level-one instruction cache needs to be read, the BN1X value in read pointer 115 or 1115 is first sent to the read buffer and matched against each BN1X stored there. If the block is already stored in the memories and instruction read buffers (that is, a BN1X matches), the access to level-one cache 104 and track table 110 can be skipped, which reduces the number of accesses and the power consumption.
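A minimal sketch of such a BN1X-tagged read buffer is given below. It assumes a small fixed capacity and first-in-first-out replacement; the replacement policy, the class name `MultiBlockReadBuffer` and the dictionary standing in for level-one cache 104 are assumptions made for illustration only.

```python
from collections import OrderedDict


class MultiBlockReadBuffer:
    """Instruction read buffer that keeps several blocks, tagged by BN1X."""

    def __init__(self, capacity, level_one_cache):
        self.capacity = capacity
        self.level_one_cache = level_one_cache   # dict: BN1X -> instruction block
        self.blocks = OrderedDict()              # BN1X tag -> stored block
        self.level_one_reads = 0                 # counts accesses to the cache

    def get_block(self, bn1x):
        # Match the BN1X of the read pointer against the stored tags first.
        if bn1x in self.blocks:
            return self.blocks[bn1x]
        # No match: read the level-one cache and store the block with its tag.
        self.level_one_reads += 1
        block = self.level_one_cache[bn1x]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # evict the oldest block (assumed policy)
        self.blocks[bn1x] = block
        return block


if __name__ == "__main__":
    cache = {0: ["i0"], 1: ["i1"], 3: ["i3"]}
    buf = MultiBlockReadBuffer(capacity=2, level_one_cache=cache)
    for bn1x in (1, 3, 1, 3, 1):
        buf.get_block(bn1x)
    print(buf.level_one_reads)
```

In the usage example a loop alternating between blocks 1 and 3 touches the level-one cache only twice, which is the power saving the paragraph above describes.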
In addition, in the embodiments of Figs. 9 and 11, when the previous conditional branch instruction has not yet reached the pipeline stage that produces the branch result, if the read pointer of current tracker 120 or target tracker 1120 points to a conditional branch point (that is, a subsequent conditional branch point), the instructions on both paths of this subsequent conditional branch cannot be supplied to processor core 1102 at the same time. There are two solutions to this. The description below takes the case in which read pointer 115 of current tracker 120 in the Fig. 9 embodiment points to a subsequent conditional branch point; the same method applies to target tracker 1120, and the Fig. 11 embodiment is similar, so these cases are not described again.
The first solution adds a register to current tracker 120 for storing the branch target BN, predicts that the subsequent conditional branch is not taken, and lets current tracker 120 continue stepping along its track. If the previous conditional branch is not taken and the prediction for the subsequent conditional branch is correct, current tracker 120 continues stepping along its track. If the previous conditional branch is not taken but the prediction for the subsequent conditional branch is wrong, the value of read pointer 115 of current tracker 120 is restored to the branch target BN previously stored in the register and the sequential front-end pipeline is flushed, so that current tracker 120 restarts stepping from the branch target track point of the subsequent conditional branch. If the previous conditional branch is taken, read pointer 115 of current tracker 120 is updated as before, and stepping starts from the (N+1)th track point counted from the branch target of the previous conditional branch. Here, instruction read buffer 1104 and the level-one cache supply instructions to processor core 1102 under the control of current tracker 120 and target tracker 1120 in the same way as before.
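The three outcomes of this first solution can be summarized in a short sketch. It is a behavioural model only: pointers are `(BNX, BNY)` tuples, the function name and the action labels are illustrative, and the cycle at which each decision becomes available is not modelled.

```python
def resolve_second_branch(read_ptr, saved_target, first_taken,
                          first_resume, second_taken):
    """Return (new_read_pointer, action) for the current tracker.

    read_ptr:     pointer stepping along the current track (predicted path)
    saved_target: branch target BN of the subsequent branch, held in the
                  added register
    first_resume: (N+1)th track point from the previous branch's target
    """
    if first_taken:
        # Previous branch taken: the subsequent branch was on the wrong
        # path, so stepping resumes after the previous branch's target.
        return first_resume, "retarget"
    if second_taken:
        # "Not taken" prediction was wrong: restore the saved target BN
        # and flush the sequential front-end pipeline.
        return saved_target, "restore"
    # Prediction correct: keep stepping along the current track.
    bnx, bny = read_ptr
    return (bnx, bny + 1), "continue"


if __name__ == "__main__":
    print(resolve_second_branch((1, 5), (3, 5), False, (0, 1), False))
    print(resolve_second_branch((1, 5), (3, 5), False, (0, 1), True))
    print(resolve_second_branch((1, 5), (3, 5), True, (0, 1), False))
```

Only the "restore" case costs a flush of the sequential front-end pipeline; the other two cases proceed as in the earlier embodiments.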
The second solution adds more front-end pipelines to the processor core. For example, with four front-end pipelines in the processor core and four corresponding sets of trackers, read buffers and instruction type memories, the instructions on the four paths corresponding to two levels of conditional branches can be executed simultaneously. The specific method can be derived by analogy from the preceding embodiments and is not repeated here. Other suitable modifications may also be made in accordance with the technical solution and concept of the present invention. For those skilled in the art, all such substitutions, adjustments and improvements shall fall within the scope of protection of the appended claims.
The current row in 410 pointed to by read pointer 115 of tracker 120. In the present invention, memory 426 is optional. When memory 426 is present, however, as soon as read pointer 115 of tracker 120 points to a new track (that is, the BN1X value of read pointer 115 changes), the content of the row of instruction type table 410 pointed to by the new BN1X is read out and stored in memory 426. Afterwards, read pointer 115 only needs to access memory 426 to read the corresponding instruction type, which reduces the number of accesses to instruction type table 410 and thus the power consumption.
Thus, as described in the preceding embodiments, step signal 111 controls the update of register 112 in tracker 120, and the BN1X and BN1Y of read pointer 115 are sent to level-one cache 104 via buses 411 and 413 respectively to fetch the corresponding instruction, which is sent via bus 103 to processor core 102 for execution. The BN1X of read pointer 115 is also sent via bus 411 to target address table 412, and the BN1Y is also sent via bus 413 to offset address mapping module 416 and memory 426. In offset address mapping module 416, this BN1Y is mapped to a column number of target address table 412, which is sent via bus 419 to selector 418 in secondary tracker 420 as one of its inputs.
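The mapping performed by offset address mapping module 416 can be illustrated with a small sketch. It assumes, following claims 15 and 30, that the mapped column number equals the number of branch instructions before the instruction that BN1Y points to. The function name and the list representation of the instruction type row are illustrative, and the example row (branches at offsets 2 and 6) is made up.

```python
def map_bn1y_to_column(type_bits, bn1y):
    """Map an instruction offset (BN1Y) to a target address table column.

    type_bits[i] is 1 if instruction i of the block is a branch, else 0.
    """
    # Decoder: mask is '0' for the pointed instruction and all later ones,
    # '1' for the instructions before it.
    mask = [1 if i < bn1y else 0 for i in range(len(type_bits))]
    # Masker: AND the mask with the basic type information.
    control = [m & t for m, t in zip(mask, type_bits)]
    # Selector array: a single '1' is shifted by one position for every
    # '1' in the control word, so its final position is the column number.
    position = 0
    for bit in control:
        if bit:
            position += 1
    return position


if __name__ == "__main__":
    row = [0, 0, 1, 0, 0, 0, 1, 0]   # hypothetical row: branches at 2 and 6
    print([map_bn1y_to_column(row, y) for y in range(len(row))])
```

For this row the offsets 0 to 2 map to column 0, offsets 3 to 6 map to column 1, and offset 7 maps to column 2, so every offset selects the first branch point at or after it.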
In secondary tracker 420, register 432 stores the column number in target address table 412 of the first branch point after the instruction currently being executed by processor core 102. This column number is output as column pointer 425 of secondary tracker 420 to target address table 412, where it selects the corresponding entry in the row pointed to by the BN1X on bus 411; the content of that entry is read out and sent via bus 423 to selector 118. Incrementer 414 in secondary tracker 420 adds one to the column number in register 432 to obtain the column number of the next branch point in sequential execution, and sends it to selector 418 as its other input. Register 422 receives TAKEN signal 113 from processor core 102, holds it for one clock cycle, and sends it to selector 418 as the control signal. Thus, once processor core 102 executes a branch instruction and the branch is taken, in the next clock cycle selector 418 selects the mapped column number on bus 419 and sends it to register 432; otherwise it selects the incremented column number from incrementer 414 and sends that to register 432.
The BN1Y on bus 413 of read pointer 115 reads the corresponding instruction type from memory 426; this value is sent via bus 421, after being held for one clock cycle in register 424 of secondary tracker 420, as the write enable signal of register 432. If the write enable signal is '0', the instruction being executed by the processor core is not a branch instruction and the value of register 432 (the column number) remains unchanged. If the write enable signal is '1', the instruction being executed by the processor core is a branch instruction and the value of register 432 is updated to the output of selector 418. Thus, when the processor core executes a branch instruction, the value of register 432 is updated either to the column number corresponding to the BN1Y of the branch target instruction or to the next column number on the current track.
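The update rule for register 432 can be condensed into a few lines. This is a sketch under stated assumptions: the one-cycle delays of registers 422 and 424 are ignored, and the function and parameter names are illustrative rather than taken from the figures.

```python
def next_column(column, is_branch, taken, mapped_column):
    """Next value of the column register in the secondary tracker.

    column:        current column number (register 432)
    is_branch:     instruction type bit, used as write enable
    taken:         TAKEN signal from the processor core
    mapped_column: column number from the offset address mapping module
    """
    if not is_branch:
        # Write enable is '0': the column number is held.
        return column
    # Write enable is '1': the selector picks the mapped column of the
    # branch target if the branch is taken, otherwise the incremented one.
    return mapped_column if taken else column + 1


if __name__ == "__main__":
    print(next_column(2, False, True, 0))
    print(next_column(2, True, False, 0))
    print(next_column(2, True, True, 0))
```

The three calls cover the hold, increment and retarget cases respectively.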

Claims (40)

1. A caching method, characterized in that instructions being filled into a level-one cache are examined and corresponding instruction information is extracted;
According to said instruction information, the branch target instructions of all branch instructions in the level-one cache are stored in advance in a level-two cache;
A first read pointer is provided to address the level-one cache and read out the corresponding instruction for execution by a processor core;
The value of the first read pointer is updated according to feedback information generated by the processor core executing instructions;
When the processor core executes a branch instruction, whether or not the branch is taken, the instruction to be executed next has already been stored in the level-two cache; and
If the branch is taken, the first read pointer is updated to the branch target addressing address value of said branch instruction; if the branch is not taken, the first read pointer is updated to the addressing address value of the instruction that sequentially follows the branch instruction.
2. The caching method as claimed in claim 1, characterized in that, according to said instruction information, the branch target instruction of a branch instruction about to be executed by the processor core is filled in advance from the level-two cache into the level-one cache, so that when the processor core executes the branch instruction, whether or not the branch is taken, the instruction to be executed next has already been stored in the level-one cache.
3. The caching method as claimed in claim 2, characterized in that instructions being filled into the level-one cache are examined and corresponding instruction information is extracted; how the first read pointer is updated is determined by said instruction information rather than by the function of the instruction itself.
4. The caching method as claimed in claim 2, characterized in that, when the first read pointer points to a conditional branch instruction and the instruction following it is an unconditional branch instruction, then according to the execution result of the processor core for the conditional branch instruction:
If the branch is taken, the first read pointer is updated to the branch target addressing address value of said conditional branch instruction; if the branch is not taken, the first read pointer is updated to the branch target addressing address value of said unconditional branch instruction;
So that the processor core does not need a separate clock cycle to execute said unconditional branch instruction.
5. The caching method as claimed in claim 2, characterized in that the value of the first read pointer is buffered, and the buffered first read pointer value addresses the level-one cache to read out the corresponding instruction for execution by the processor core;
The first read pointer points ahead to a branch instruction; if the branch target instruction of this branch instruction is not yet stored in the level-one cache, the branch target instruction is filled from the level-two cache into the level-one cache, so that when the processor core executes this branch instruction, whether or not the branch is taken, the instruction to be executed next has already been stored in the level-one cache.
6. The caching method as claimed in claim 2, characterized in that a second read pointer is provided; said second read pointer points ahead to the branch instruction after the first read pointer; if the branch target instruction of this branch instruction is not yet stored in the level-one cache, the branch target instruction is filled from the level-two cache into the level-one cache, so that when the processor core executes this branch instruction, whether or not the branch is taken, the instruction to be executed next has already been stored in the level-one cache.
7. The caching method as claimed in claim 6, characterized in that, when the processor core executes a branch instruction, one of the sequentially next instruction and the branch target instruction is selected according to branch prediction and executed as the subsequent instruction, and the addressing address of the other is saved;
If the branch result agrees with the branch prediction, execution of the subsequent instructions continues;
If the branch result disagrees with the branch prediction, the pipeline is flushed and execution restarts from the instruction corresponding to said saved addressing address.
8. The caching method as claimed in claim 2, characterized in that said instruction information comprises the instruction type and the branch target addressing address of a branch instruction.
9. The caching method as claimed in claim 8, characterized in that each instruction corresponds to one instruction type, and said instruction type is one or more bits.
10. The caching method as claimed in claim 8, characterized in that each branch instruction corresponds to one branch target addressing address.
11. The caching method as claimed in claim 10, characterized in that the instruction type can be further divided into basic type information and a branch instruction type; wherein:
The basic type information distinguishes branch instructions from non-branch instructions, and each instruction corresponds to one item of basic type information;
The branch instruction type further distinguishes among branch instructions, and each branch instruction corresponds to one item of branch type information.
12. The caching method as claimed in claim 11, characterized in that said branch instruction type comprises: conditional branch instruction and unconditional branch instruction.
13. The caching method as claimed in claim 8, characterized in that the instruction type of the corresponding instruction is found according to the first read pointer; the branch target addressing address of the first branch instruction after said instruction is found according to the second read pointer.
14. The caching method as claimed in claim 13, characterized in that the corresponding second read pointer value is obtained from the first read pointer value by mapping.
15. The caching method as claimed in claim 14, characterized in that the second read pointer value equals the number of branch instructions before the instruction pointed to by the first read pointer.
16. The caching method as claimed in claim 14, characterized in that, when the processor core executes a branch instruction, if the branch is taken, the first read pointer is updated to the branch target addressing address value pointed to by the second read pointer; if the branch is not taken, the first read pointer is updated to the addressing address value of the instruction that sequentially follows the branch instruction.
17. A caching system, characterized by comprising:
A processor core, for executing instructions;
A level-one cache, for storing the instructions to be executed by the processor core;
A level-two cache, for storing all instructions in the level-one cache and the branch target instructions of all branch instructions in the level-one cache;
An active table, whose entries correspond to level-two cache instruction blocks, for storing the address information of the instruction blocks in the level-two cache;
A block address mapping module, for storing the correspondence between level-one cache and level-two cache instruction addresses;
A track table, whose track points correspond to level-one cache instructions, for storing the instruction information of the instructions in the level-one cache; said instruction information comprises the instruction type and the branch target instruction position information of branch instructions;
A scanner, for examining the instructions being filled into the level-one cache, extracting the corresponding instruction information, calculating the branch target instruction address of each branch instruction, and matching the calculated branch target instruction address in the active table and the block address mapping module; if the match fails, at least one instruction including said branch target instruction is filled from lower-level memory into the level-two cache, and the corresponding branch target instruction position information is stored in the track table; if the match succeeds, the branch target instruction position information is stored directly in the track table;
A tracker, which outputs a first read pointer that addresses the level-one cache to read out the corresponding instruction for execution by the processor core, and reads said instruction information from the track table; said tracker updates the value of said first read pointer according to the feedback information generated by the processor core executing instructions and according to said instruction information; if the branch is taken, the first read pointer is updated to the branch target addressing address value of said branch instruction; if the branch is not taken, the first read pointer is updated to the addressing address value of the instruction that sequentially follows the branch instruction.
18. The caching system as claimed in claim 17, characterized in that how the first read pointer is updated is determined by said instruction information rather than by the function of the instruction itself.
19. The caching system as claimed in claim 17, characterized in that said instruction information stored in the track point pointed to by the first read pointer and in the track point following it is read from the track table at the same time.
20. The caching system as claimed in claim 19, characterized in that, when the first read pointer points to a conditional branch instruction and the instruction following it is an unconditional branch instruction, then according to the execution result of the processor core for the conditional branch instruction:
If the branch is taken, the first read pointer is updated to the branch target addressing address value of said conditional branch instruction; if the branch is not taken, the first read pointer is updated to the branch target addressing address value of said unconditional branch instruction;
So that the processor core does not need a separate clock cycle to execute said unconditional branch instruction.
21. The caching system as claimed in claim 17, characterized by further comprising a buffer, said buffer being used to store the value of the first read pointer;
The first read pointer value output by said buffer addresses the level-one cache to read out the corresponding instruction for execution by the processor core;
The first read pointer of the tracker points ahead to a branch instruction; if the branch target instruction of this branch instruction is not yet stored in the level-one cache, the branch target instruction is filled from the level-two cache into the level-one cache, so that when the processor core executes this branch instruction, whether or not the branch is taken, the instruction to be executed next has already been stored in the level-one cache.
22. The caching system as claimed in claim 17, characterized by further comprising a secondary tracker (slave tracker);
Said secondary tracker outputs a second read pointer, which points ahead to the branch instruction after the first read pointer; if the branch target instruction of this branch instruction is not yet stored in the level-one cache, the branch target instruction is filled from the level-two cache into the level-one cache, so that when the processor core executes this branch instruction, whether or not the branch is taken, the instruction to be executed next has already been stored in the level-one cache.
23. The caching system as claimed in claim 22, characterized in that said tracker further comprises a register for storing the addressing address of one of the sequentially next instruction and the branch target instruction;
When the processor core executes a branch instruction, one of the sequentially next instruction and the branch target instruction is selected according to branch prediction and executed as the subsequent instruction, and the addressing address of the other is stored in said register;
If the branch result agrees with the branch prediction, execution of the subsequent instructions continues;
If the branch result disagrees with the branch prediction, the pipeline is flushed and execution restarts from the instruction corresponding to the addressing address saved in said register.
24. The caching system as claimed in claim 17, characterized in that an end track point is added after the last track point of each track in said track table; the instruction type of said end track point is unconditional branch instruction, and its branch target addressing address is the addressing address of the first track point of the sequentially next track; when the first read pointer points to the end track point, the level-one cache outputs a dummy instruction.
25. The caching system as claimed in claim 24, characterized in that an end track point is added after the last track point of each track in said track table; the instruction type of said end track point is unconditional branch instruction, and its branch target addressing address is the addressing address of the first track point of the sequentially next track; and
When the track point before the end track point is not a branch point, the instruction type and branch target addressing address of the end track point can be used as the instruction type and branch target addressing address of that track point.
26. The caching system as claimed in claim 17, characterized in that said track table further comprises:
An instruction type table, for storing the basic type information corresponding to instructions; said basic type information distinguishes branch instructions from non-branch instructions; each instruction corresponds to one entry of the instruction type table, and the entry content is one or more bits; and
A target address table, for storing the branch instruction type and branch target addressing address corresponding to branch instructions; each branch instruction corresponds to one entry of the target address table.
27. The caching system as claimed in claim 26, characterized in that said branch instruction type comprises: conditional branch instruction and unconditional branch instruction.
28. The caching system as claimed in claim 22, characterized in that the basic type information of the corresponding instruction is found in the instruction type table according to the first read pointer; said basic type information is sent to the tracker; and
The branch instruction type and branch target addressing address of the first branch instruction after said instruction are found in the target address table according to the second read pointer; said branch instruction type is sent to the tracker, and said branch target addressing address is sent to the secondary tracker.
29. The caching system as claimed in claim 28, characterized by further comprising an offset address mapping module, for mapping the first read pointer value to the corresponding target address table column number; said target address table column number is sent to the secondary tracker.
30. The caching system as claimed in claim 29, characterized in that said offset address mapping module comprises:
A decoder, for generating a mask value according to the first read pointer value, such that the mask bits corresponding to the instruction pointed to by said first read pointer and to the instructions after it are '0', and the other mask bits are '1';
A masker, for performing an AND operation between the mask value generated by the decoder and the basic type information in the instruction type table to obtain a control word;
A selector array; in said selector array, each column of selectors selects according to the value of the corresponding bit of the control word; if the bit is '0', the input from the same row of the previous column is selected; if the bit is '1', the input from the row below in the previous column is selected; the number of rows by which the single '1' input to the selector array is shifted upward equals the number of '1' bits in the control word, thereby mapping the first read pointer value to the corresponding target address table column number.
31. The caching system as claimed in claim 29, characterized in that, when the processor core executes a branch instruction, if the branch is taken, the first read pointer of the tracker is updated to the branch target addressing address value; if the branch is not taken, the first read pointer of the tracker is updated to the addressing address value of the instruction that sequentially follows the branch instruction.
32. The caching system as claimed in claim 29, characterized in that, if the processor core executes a branch instruction and the branch is taken, the second read pointer of the secondary tracker is updated to the column number sent by the offset address mapping module; if the processor core executes a branch instruction and the branch is not taken, the second read pointer value of the secondary tracker is incremented by one so that it points to the next entry in the target address table.
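The pointer updates of claims 31 and 32 can be summarised in a short Python sketch. It is only an illustration under stated assumptions: addressing addresses are modelled as plain integers, the instruction that sequentially follows the branch is taken to be at `branch_addr + 1`, and `mapped_column` stands for the column number delivered by the offset address mapping module. The names `Trackers` and `resolve_branch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trackers:
    first_read_ptr: int   # tracker output: addressing address of an instruction
    second_read_ptr: int  # secondary tracker output: column in the target address table


def resolve_branch(state, taken, branch_addr, target_addr, mapped_column):
    """Update both read pointers once the processor core has resolved a branch."""
    if taken:
        # Claim 31: jump to the branch target addressing address.
        state.first_read_ptr = target_addr
        # Claim 32: take the column number sent by the offset address mapping module.
        state.second_read_ptr = mapped_column
    else:
        # Claim 31: continue with the next sequential instruction (assumed +1).
        state.first_read_ptr = branch_addr + 1
        # Claim 32: advance to the next entry of the target address table.
        state.second_read_ptr += 1
    return state


state = Trackers(first_read_ptr=4, second_read_ptr=1)
resolve_branch(state, taken=False, branch_addr=4, target_addr=20, mapped_column=3)
print(state)
```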
33. The caching method as claimed in claim 1, characterized in that the processor core comprises two front-end pipelines and one back-end pipeline; the method further provides:
A first instruction read buffer for storing the current instruction block;
A third read pointer, which addresses the first instruction read buffer to read the corresponding instructions for execution by the sequential front-end pipeline of the processor core; and
The first read pointer addresses the L1 cache to read the corresponding instructions for execution by the target front-end pipeline of the processor core.
34. The caching method as claimed in claim 33, characterized in that, if the branch of a branch instruction is not taken:
The first read pointer value is updated to the branch target addressing address of the next branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache, and the corresponding instructions are read for execution by the target front-end pipeline of the processor core; the third read pointer continues to be updated, and the corresponding instructions are read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core; and
If the branch of the branch instruction is taken:
The instruction block pointed to by the first read pointer is filled from the L1 cache into the first instruction read buffer, and the third read pointer value is updated to the first read pointer value incremented by one; the third read pointer is then updated starting from that value, and the corresponding instructions are read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core.
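The two cases of claim 34 can be sketched in Python as follows. This is a simplified model, not the claimed hardware: L1 addresses are assumed to be `(block, offset)` pairs, the dictionary `branch_targets` is a stand-in for the table that supplies branch target addresses, and the third read pointer is assumed to point at the branch instruction at the moment it is resolved. The class name `FrontEnd` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrontEnd:
    l1: dict              # block number -> list of instructions (assumed L1 model)
    branch_targets: dict  # (block, offset) of a branch -> (block, offset) of its target
    irb_block: int        # block held in the first instruction read buffer
    third_read_ptr: int   # offset into the first instruction read buffer
    first_read_ptr: tuple # (block, offset) in L1, feeds the target front-end pipeline

    def next_branch_target(self):
        """Target of the next branch at or after the third read pointer, if any."""
        size = len(self.l1[self.irb_block])
        for offset in range(self.third_read_ptr, size):
            target = self.branch_targets.get((self.irb_block, offset))
            if target is not None:
                return target
        return None

    def on_branch_resolved(self, taken):
        if taken:
            # Fill the read buffer with the block the first read pointer points to,
            # then continue sequentially from the instruction after the target.
            block, offset = self.first_read_ptr
            self.irb_block = block
            self.third_read_ptr = offset + 1
        else:
            # Keep advancing sequentially and aim the first read pointer at the
            # target of the next branch instruction.
            self.third_read_ptr += 1
            target = self.next_branch_target()
            if target is not None:
                self.first_read_ptr = target


l1 = {0: ["i0", "br", "i2", "br", "i4"], 1: ["t0", "t1", "t2"], 2: ["u0", "u1", "u2"]}
fe = FrontEnd(l1, {(0, 1): (1, 0), (0, 3): (2, 1)},
              irb_block=0, third_read_ptr=1, first_read_ptr=(1, 0))
fe.on_branch_resolved(taken=False)
print(fe.third_read_ptr, fe.first_read_ptr)
```

The sketch leaves out how the first read pointer is re-aimed after a taken branch, since claim 34 does not describe that step.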
35. The caching method as claimed in claim 33, characterized in that it provides:
A second instruction read buffer for storing the target instruction block; the first read pointer addresses the second instruction read buffer to read the corresponding instructions for execution by the target front-end pipeline of the processor core;
A second read pointer; the second read pointer points in advance to a branch instruction after the first read pointer; if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the L1 cache whether or not the branch is taken; and
A fourth read pointer; the fourth read pointer points in advance to a branch instruction after the third read pointer; if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the L1 cache whether or not the branch is taken.
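The look-ahead fill performed on behalf of the second and fourth read pointers in claim 35 can be illustrated with a small Python sketch. It assumes that both caches can be modelled as dictionaries keyed by block number, that the L2 cache always holds the requested block, and that no replacement policy is needed; the function names `prefetch_branch_target` and `run_lookahead` are hypothetical.

```python
def prefetch_branch_target(target_block, l1, l2):
    """Fill the target block from L2 into L1 if it is missing.

    Returns True when a fill took place, False when L1 already held the block.
    """
    if target_block in l1:
        return False
    l1[target_block] = l2[target_block]
    return True


def run_lookahead(upcoming_target_blocks, l1, l2):
    """Walk the target blocks of the branches a look-ahead pointer reaches, in order."""
    return [blk for blk in upcoming_target_blocks
            if prefetch_branch_target(blk, l1, l2)]


l2 = {0: ["a"], 1: ["b"], 2: ["c"], 3: ["d"]}
l1 = {0: ["a"]}
filled = run_lookahead([1, 0, 3, 1], l1, l2)
print(filled, sorted(l1))
```

Because the fill happens before the core reaches the branch, both the fall-through path and the target path are served from the L1 cache in this model.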
36. The caching method as claimed in claim 37, characterized in that, when the fourth read pointer points in advance to the first branch instruction after the third read pointer, the first read pointer value is updated to the branch target addressing address of the branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache, and the corresponding instructions are read for execution by the target front-end pipeline of the processor core.
37. The caching system as claimed in claim 17, characterized in that the processor core comprises two front-end pipelines and one back-end pipeline; the system further comprises:
A first instruction read buffer for storing the current instruction block;
A second tracker, which outputs a third read pointer that addresses the first instruction read buffer to read the corresponding instructions for execution by the sequential front-end pipeline of the processor core; and
The first read pointer addresses the L1 cache to read the corresponding instructions for execution by the target front-end pipeline of the processor core.
38. The caching system as claimed in claim 37, characterized in that, if the branch of a branch instruction is not taken:
The first read pointer value is updated to the branch target addressing address of the next branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache, and the corresponding instructions are read for execution by the target front-end pipeline of the processor core; the third read pointer continues to be updated, and the corresponding instructions are read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core; and
If the branch of the branch instruction is taken:
The instruction block pointed to by the first read pointer is filled from the L1 cache into the first instruction read buffer, and the third read pointer value is updated to the first read pointer value incremented by one; the third read pointer is then updated starting from that value, and the corresponding instructions are read from the first instruction read buffer for execution by the sequential front-end pipeline of the processor core.
39. The caching system as claimed in claim 37, characterized in that it further comprises:
A second instruction read buffer for storing the target instruction block; the first read pointer addresses the second instruction read buffer to read the corresponding instructions for execution by the target front-end pipeline of the processor core;
A secondary tracker, which outputs the second read pointer; the second read pointer points in advance to a branch instruction after the first read pointer; if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the L1 cache whether or not the branch is taken; and
A second secondary tracker, which outputs the fourth read pointer; the fourth read pointer points in advance to a branch instruction after the third read pointer; if the branch target instruction of that branch instruction is not yet stored in the L1 cache, the branch target instruction is filled from the L2 cache into the L1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the L1 cache whether or not the branch is taken.
40. The caching system as claimed in claim 36, characterized in that, when the fourth read pointer points in advance to the first branch instruction after the third read pointer, the first read pointer value is updated to the branch target addressing address of the branch instruction pointed to by the third read pointer, so that the first read pointer points to that branch target instruction block in the L1 cache, and the corresponding instructions are read for execution by the target front-end pipeline of the processor core.
CN201410048036.7A 2013-12-24 2014-01-29 Cache system and method Pending CN104731718A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410048036.7A CN104731718A (en) 2013-12-24 2014-01-29 Cache system and method
PCT/CN2014/094603 WO2015096688A1 (en) 2013-12-24 2014-12-23 Caching system and method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2013107378134 2013-12-24
CN201310737813 2013-12-24
CN201410048036.7A CN104731718A (en) 2013-12-24 2014-01-29 Cache system and method

Publications (1)

Publication Number Publication Date
CN104731718A true CN104731718A (en) 2015-06-24

Family

ID=53455626

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201410048036.7A Pending CN104731718A (en) 2013-12-24 2014-01-29 Cache system and method
CN201410826711.4A Active CN104731719B (en) 2013-12-24 2014-12-23 Cache system and method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201410826711.4A Active CN104731719B (en) 2013-12-24 2014-12-23 Cache system and method

Country Status (2)

Country Link
CN (2) CN104731718A (en)
WO (1) WO2015096688A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9588758B1 (en) 2015-12-18 2017-03-07 International Business Machines Corporation Identifying user managed software modules
CN107291920B (en) * 2017-06-28 2021-02-02 南京途牛科技有限公司 Air ticket query caching method
CN111290698B (en) * 2018-12-07 2022-05-03 上海寒武纪信息科技有限公司 Data access method, data processing method, data access circuit and arithmetic device
CN110187663B (en) * 2019-06-19 2020-11-03 浙江中控技术股份有限公司 Monitoring method and device
CN114780031B (en) * 2022-04-15 2022-11-11 北京志凌海纳科技有限公司 Data processing method and device based on single-machine storage engine
CN117193861B (en) * 2023-11-07 2024-03-15 芯来智融半导体科技(上海)有限公司 Instruction processing method, apparatus, computer device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5913047A (en) * 1997-10-29 1999-06-15 Advanced Micro Devices, Inc. Pairing floating point exchange instruction with another floating point instruction to reduce dispatch latency
JP3983482B2 (en) * 2001-02-02 2007-09-26 株式会社ルネサステクノロジ PC relative branching with high-speed displacement
JP2008299795A (en) * 2007-06-04 2008-12-11 Nec Electronics Corp Branch prediction controller and method thereof
US20090204791A1 (en) * 2008-02-12 2009-08-13 Luick David A Compound Instruction Group Formation and Execution
US8527707B2 (en) * 2009-12-25 2013-09-03 Shanghai Xin Hao Micro Electronics Co. Ltd. High-performance cache system and method
US8516230B2 (en) * 2009-12-29 2013-08-20 International Business Machines Corporation SPE software instruction cache
CN102163143B (en) * 2011-04-28 2013-05-01 北京北大众志微系统科技有限责任公司 A method realizing prediction of value association indirect jump
CN102841865B (en) * 2011-06-24 2016-02-10 上海芯豪微电子有限公司 High-performance cache system and method
CN102968293B (en) * 2012-11-28 2014-12-10 中国人民解放军国防科学技术大学 Dynamic detection and execution method of program loop code based on instruction queue

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107810476B (en) * 2015-06-26 2021-02-23 微软技术许可有限责任公司 Decoupled processor instruction window and operand buffers
CN107810476A (en) * 2015-06-26 2018-03-16 微软技术许可有限责任公司 Uncoupled processor instruction window and operand buffer
US11048517B2 (en) 2015-06-26 2021-06-29 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
CN107851026A (en) * 2015-08-14 2018-03-27 高通股份有限公司 Power efficient, which obtains, to be adapted to
CN108139967A (en) * 2015-10-09 2018-06-08 华为技术有限公司 Stream compression is changed to array
CN109964454A (en) * 2017-04-03 2019-07-02 株式会社东芝 Transfer station
CN107729053B (en) * 2017-10-17 2020-11-27 安徽皖通邮电股份有限公司 Method for realizing high-speed cache table
CN107729053A (en) * 2017-10-17 2018-02-23 安徽皖通邮电股份有限公司 A kind of method for realizing cache tables
CN109684236A (en) * 2018-12-25 2019-04-26 广东浪潮大数据研究有限公司 A kind of data write buffer control method, device, electronic equipment and storage medium
CN111625280A (en) * 2019-02-27 2020-09-04 上海复旦微电子集团股份有限公司 Instruction control method and device and readable storage medium
CN111625280B (en) * 2019-02-27 2023-08-04 上海复旦微电子集团股份有限公司 Instruction control method and device and readable storage medium
CN110851182A (en) * 2019-10-24 2020-02-28 珠海市杰理科技股份有限公司 Instruction acquisition method and device, computer equipment and storage medium
CN110851182B (en) * 2019-10-24 2021-12-03 珠海市杰理科技股份有限公司 Instruction acquisition method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN104731719B (en) 2020-04-28
WO2015096688A1 (en) 2015-07-02
CN104731719A (en) 2015-06-24

Similar Documents

Publication Publication Date Title
CN104731718A (en) Cache system and method
CN104679480A (en) Instruction set transition system and method
CN102841865B (en) High-performance cache system and method
CN102110058B (en) The caching method of a kind of low miss rate, low disappearance punishment and device
CN104978282B (en) A kind of caching system and method
CN104679481A (en) Instruction set transition system and method
CN104424129A (en) Cache system and method based on read buffer of instructions
CN102117198B (en) Branch processing method
CN102306093B (en) Device and method for realizing indirect branch prediction of modern processor
CN104050092B (en) A kind of data buffering system and method
CN103513957A (en) High-performance cache system and method
CN104424128A (en) Variable-length instruction word processor system and method
CN104424158A (en) General unit-based high-performance processor system and method
CN102855121B (en) Branching processing method and system
CN103620547A (en) Guest instruction to native instruction range based mapping using a conversion look aside buffer of a processor
CN103984526B (en) A kind of instruction process system and method
CN103513958A (en) High-performance instruction caching system and method
US10275358B2 (en) High-performance instruction cache system and method
CN104657285A (en) System and method for caching data
CN106066787A (en) A kind of processor system pushed based on instruction and data and method
US11301250B2 (en) Data prefetching auxiliary circuit, data prefetching method, and microprocessor
CN103176914A (en) Low-miss-rate and low-wart-penalty caching method and device
CN109783143A (en) Control method and control equipment for instruction pipeline stream
CN104424132A (en) High-performance instruction cache system and method
TWI636362B (en) High-performance cache system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150624
