CN103176914B - Caching method and device with low miss rate and low miss penalty - Google Patents


Info

Publication number
CN103176914B
Authority
CN
China
Prior art keywords
branch
instruction
track
point
memory
Application number
CN201210450909.8A
Other languages
Chinese (zh)
Other versions
CN103176914A (en)
Inventor
林正浩
Original Assignee
上海芯豪微电子有限公司
Priority to CN201110376514.3
Application filed by 上海芯豪微电子有限公司
Priority to CN201210450909.8A
Publication of CN103176914A
Application granted
Publication of CN103176914B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30058 Conditional branch instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G06F9/3806 Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3808 Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60 Details of cache memory
    • G06F2212/6028 Prefetching based on hints or prefetch instructions

Abstract

A caching method and device with a low miss rate and a low miss penalty. When applied in the processor field, instructions can be filled into a high-speed memory that the processor core accesses directly before the processor core executes them, so that the processor core can almost always obtain the instructions it needs from that high-speed memory, achieving a high cache hit rate.

Description

Caching method and device with low miss rate and low miss penalty

Technical field

The present invention relates to the fields of integrated circuits and computers.

Background art

The purpose of a cache is to replicate part of the contents of main memory in the cache, so that the processor core (CPU core) can access those contents quickly, within a short time, and the pipeline can keep running continuously.

Existing caches are all addressed in the following way. The index field of the address is used to read a tag from the tag memory; at the same time, the index field together with the block offset field is used to read content from the content memory (instruction or data memory). If the tag read out matches the tag field of the address, the access is a cache hit and the content read is valid; otherwise it is a cache miss and the content read is invalid. In a multi-way set-associative cache, this operation is performed on all ways in parallel to detect which way hits, and the content read from the hitting way is the valid content. If all ways miss, the content read is invalid. After a cache miss, the cache control logic starts fetching the content from the lower-level memory (fill after miss).
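The conventional lookup described above can be sketched in a few lines. This is a minimal illustration of tag/index/offset addressing and fill-after-miss in a set-associative cache; the class name and the default field widths (2 ways, 4 sets, 16-byte blocks) are illustrative assumptions, not values from the patent.

```python
class SetAssociativeCache:
    """Toy N-way set-associative tag lookup (address = tag | index | offset)."""

    def __init__(self, ways=2, index_bits=2, offset_bits=4):
        self.ways, self.index_bits, self.offset_bits = ways, index_bits, offset_bits
        # Tag memory: tags[way][set] holds the stored tag, or None for an empty line.
        self.tags = [[None] * (1 << index_bits) for _ in range(ways)]

    def split(self, addr):
        """Return the (tag, index, offset) fields of an address."""
        offset = addr & ((1 << self.offset_bits) - 1)
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def lookup(self, addr):
        """Return the hitting way, or None on a miss.

        Hardware reads and compares every way of the indexed set in parallel;
        this loop is the sequential equivalent.
        """
        tag, index, _ = self.split(addr)
        for way in range(self.ways):
            if self.tags[way][index] == tag:
                return way
        return None

    def fill_after_miss(self, addr, way):
        """Conventional policy: the block is fetched from lower memory only after a miss."""
        tag, index, _ = self.split(addr)
        self.tags[way][index] = tag
```

For example, `lookup(0x1234)` misses on an empty cache; after `fill_after_miss(0x1234, 1)` the same block (e.g. `0x1238`) hits in way 1, while `0x1274`, which maps to the same set with a different tag, still misses.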

Cache misses fall into three categories: compulsory misses, conflict misses, and capacity misses. In existing cache structures, apart from the small fraction of accesses covered by successful prefetching, compulsory misses are unavoidable, and prefetching itself carries a considerable cost. The associativity of a multi-way set-associative cache is limited by power and speed and is difficult to increase beyond a certain number of ways (because a set-associative structure must read out and compare the contents and tags of all ways addressed by the same index simultaneously), and its capacity is also difficult to increase because the cache must keep pace with the CPU core. Multi-level caches are therefore used, in which lower-level caches have larger capacity but lower speed than higher-level caches.
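The three miss categories are commonly measured in simulation by replaying the same block-address trace through a fully associative LRU cache of equal total capacity: a first-ever reference is compulsory, a miss that the fully associative cache would also take is a capacity miss, and the rest are conflict misses. The sketch below is this textbook "3C" classification, not a technique from the patent; the function name and LRU replacement are assumptions.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_sets, ways):
    """Classify the misses of a set-associative LRU cache with the 3C model."""
    capacity = num_sets * ways
    seen = set()
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order
    full = OrderedDict()  # fully associative LRU reference cache, same capacity
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in block_trace:
        s = sets[block % num_sets]
        hit = block in s
        full_hit = block in full
        # Update the fully associative reference model.
        if full_hit:
            full.move_to_end(block)
        else:
            full[block] = True
            if len(full) > capacity:
                full.popitem(last=False)  # evict the least recently used block
        # Update the real set-associative cache and classify its misses.
        if hit:
            s.move_to_end(block)
        else:
            if block not in seen:
                counts["compulsory"] += 1
            elif not full_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            s[block] = True
            if len(s) > ways:
                s.popitem(last=False)
        seen.add(block)
    return counts
```

With a direct-mapped cache of two sets, the trace `[0, 2, 0, 2]` yields two compulsory and two conflict misses, since blocks 0 and 2 keep evicting each other even though a fully associative cache of the same size would hold both.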

Modern caches generally consist of several cache levels, each built as a multi-way set-associative cache. Although inventions such as the victim cache, the trace cache, and prefetching (fetching the next cache block into a cache buffer when a cache block is fetched, or using prefetch instructions) exist, the ever-widening speed gap between processors and memory means that cache misses remain a severe bottleneck limiting the performance of modern processors in current architectures.

The present invention discloses a completely new method and apparatus that fundamentally solves the above difficulties.

Summary of the invention

The present invention proposes a method for assisting the operation of a processor core, where the processor core is connected to a first memory containing executable instructions and to a second memory that is faster than the first memory. The method includes: examining the instructions being filled from the first memory into the second memory, so as to extract instruction information including at least branch information; establishing a plurality of tracks according to the extracted instruction information; and, according to one or more of the plurality of instruction tracks, filling at least one or more instructions from the first memory into the second memory before they are executed by the processor core, so that the processor core can obtain the at least one or more instructions from the second memory. The method further includes prefetching, for the instructions the processor will execute, the instruction segments corresponding to the branch targets of at least two layers of branch instructions.
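To make "prefetching the branch targets of at least two layers of branch instructions" concrete, the following sketch computes which blocks such a prefetcher would request. It works at block granularity and treats the next sequential block as the not-taken path; both simplifications, and the `branch_targets` map (block number to the blocks targeted by branches scanned in it), are hypothetical and not the patent's data structures.

```python
def blocks_to_prefetch(branch_targets, start_block, layers=2):
    """Return the blocks reachable from start_block through `layers` levels of branches.

    branch_targets: hypothetical map from a block number to the block numbers
    targeted by the branch instructions scanned in that block. The next
    sequential block (block + 1) stands in for the not-taken path.
    """
    wanted = set()
    frontier = {start_block}
    for _ in range(layers):
        next_frontier = set()
        for blk in frontier:
            next_frontier.add(blk + 1)                          # not-taken (sequential) path
            next_frontier.update(branch_targets.get(blk, ()))   # taken paths
        next_frontier -= wanted | {start_block}                 # skip blocks already chosen
        wanted |= next_frontier
        frontier = next_frontier
    return wanted
```

With `branch_targets = {0: [5], 1: [9], 5: [7]}`, one layer from block 0 requests `{1, 5}`, and two layers request `{1, 2, 5, 6, 7, 9}`; each additional layer roughly doubles the candidate set, which is why the patent pairs multi-layer prefetching with track table compression and branch point cutting.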

The present invention further proposes a method for assisting the operation of a processor core, where the processor core is connected to a first memory containing executable instructions and to a second memory that is faster than the first memory, and the first memory is connected to a third memory that is slower than the first memory. The method includes: examining the instructions being filled from the first memory into the second memory, so as to extract instruction information including at least branch information; establishing a plurality of first-level tracks according to the extracted instruction information; according to one or more of the first-level tracks, filling at least one or more instructions from the first memory into the second memory before they are executed by the processor core, so that the processor core can obtain the at least one or more instructions from the second memory; examining the instructions being filled from the third memory into the first memory, so as to extract instruction information including at least branch information; establishing a plurality of second-level tracks according to the extracted instruction information; and, according to one or more of the second-level tracks, filling at least one or more instructions from the third memory into the first memory before they are executed by the processor core, so that the first memory can fill the at least one or more instructions into the second memory before the second memory requests them from the first memory.

The present invention further proposes a method for assisting the operation of multiple processor cores in a multi-core system. The multi-core system comprises: a first track table system, in which a processor core is connected to a first memory and a second memory; a second track table system, in which a processor core is connected to a first memory and a second memory; and a third memory connected to the first memory in the first track table system and to the first memory in the second track table system. The method includes: for each of the first and second track table systems, examining the instructions being filled from the first memory into the second memory, so as to extract instruction information including at least branch information; for each of the two track table systems, establishing a plurality of first-level tracks according to the extracted instruction information; for each of the two track table systems, according to one or more of the first-level tracks, filling at least one or more instructions from the first memory into the second memory before they are executed by the processor core, so that the processor core can obtain the at least one or more instructions from the second memory; examining the instructions being filled from the third memory into a first memory, so as to extract instruction information including at least branch information; establishing a plurality of second-level tracks according to the extracted instruction information; and, according to one or more of the second-level tracks, filling at least one or more instructions from the third memory into a first memory before they are executed by the processor core in the first or second track table system, so that the first memory can fill the at least one or more instructions into the second memory before the second memory in the first or second track table system requests them from the first memory.

Beneficial effects:

The present invention discloses a completely new method and apparatus that fundamentally solves the difficulties raised in the background art. Unlike existing cache structures, which start fetching content from slow memory only after a miss (fill after miss), the present invention uses a track-table-based method and apparatus for prefetching branch target instruction segments: before the processor core executes a given instruction segment, that segment has already been fetched from the slower lower-level memory into the higher-level instruction memory, or its filling has at least already begun. The cache system of the present invention integrates the prefetching mechanism and avoids the tag-matching process of a conventional cache.

The multi-layer branch target instruction segment prefetching method and apparatus proposed by the present invention prefetch instructions that may be used even earlier, thereby hiding the latency of filling instruction segments.

In addition, to avoid the excessive track table capacity that multi-layer prefetching could require, the present invention proposes a track table compression method. It reduces the capacity of the track table and at the same time lets the leading pointer move faster, so that instructions that may be used are prefetched further in advance and the latency of filling instruction segments is hidden better. Furthermore, to avoid the instruction cache pollution that prefetching could cause, the present invention also proposes a branch point cutting method: instruction segments that may be used are prefetched, but only instruction segments that are certain to be used are stored in the cache. This reduces the number of cache writes and replacements and further lowers the power consumption of the whole cache system.
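The idea behind branch point cutting (prefetch the segments that may be used, but write into the cache only the segment that will be used) can be illustrated with a staging buffer. This is a hypothetical sketch of that idea only; it is not the pointer-register mechanism of Figs. 6(b) to 6(e), and `PrefetchBuffer`, `prefetch` and `resolve` are illustrative names.

```python
class PrefetchBuffer:
    """Hypothetical staging buffer: prefetched segments wait here and are
    written into the cache only once the branch leading to them resolves."""

    def __init__(self):
        self.staged = {}        # branch point -> {"not_taken": seg, "taken": seg}
        self.cache = set()      # segments actually written into the cache
        self.cache_writes = 0

    def prefetch(self, branch_point, not_taken_seg, taken_seg):
        """Fetch both successor segments of a branch point into the buffer."""
        self.staged[branch_point] = {"not_taken": not_taken_seg, "taken": taken_seg}

    def resolve(self, branch_point, taken):
        """Commit the segment on the resolved path; discard (cut) the other one."""
        paths = self.staged.pop(branch_point)
        seg = paths["taken" if taken else "not_taken"]
        if seg not in self.cache:
            self.cache.add(seg)
            self.cache_writes += 1
        return seg
```

Using the segment numbers of Fig. 1, prefetching both successors of branch point A (1003 and 1005) and resolving A as taken writes only 1005 into the cache; 1003 is cut without costing a cache write.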

Finally, the track-table-based prefetching cache system proposed by the present invention can search for required instructions in several levels of the cache hierarchy simultaneously; compared with a conventional cache, this reduces the accumulated delay caused by misses at multiple cache levels. The proposed track-table-based prefetching cache system can also be applied in multi-core systems.

Brief description of the drawings

Fig. 1 is an exemplary embodiment of program execution.

Fig. 2 shows a processor environment of the present invention that performs instruction prefetching based on a track table.

Fig. 3 is an embodiment of track table operation disclosed by the present invention.

Fig. 4 shows an embodiment of the track table compression structure of the present invention.

Fig. 5 shows a processor environment according to the present invention that performs two-layer branch target instruction segment prefetching based on a track table.

Fig. 6(a) shows an embodiment of the pointer registers corresponding to the branch points after each update.

Figs. 6(b) to 6(e) show the branch point cutting process of the present invention.

Fig. 6(f) shows another embodiment of the pointer registers corresponding to the branch points after each update.

Fig. 7(a) shows another processor environment according to the present invention that performs two-layer branch target instruction segment prefetching based on a track table.

Fig. 7(b) shows a processor environment of the present invention that implements the branch point cutting function.

Fig. 8 shows a processor environment containing a two-level cache based on the track table system.

Fig. 9 is an embodiment of further compressing the level-2 cache track table.

Fig. 10(a) shows a processor environment containing a two-level cache based on a compressed track table.

Fig. 10(b) shows a processor environment that contains only a level-1 cache and shares a compressed track table.

Fig. 11 is an embodiment of the track-table-based instruction prefetching technique in a multi-core processor environment.

Fig. 12 is another embodiment of compressing the contents of the second-level track table.

Detailed description of the invention

Embodiments of the present invention are described below with reference to the accompanying drawings.

Fig. 1 is an exemplary embodiment 1000 of program execution. Because a program contains branch instructions, and execution either continues sequentially or transfers to the branch target instruction depending on the outcome of the branch condition, the possible execution paths of a program can be represented as a binary tree.

As shown in Fig. 1, program 1000 is represented as a binary tree, in which straight lines represent instruction segments of the program and heavy dots represent branch instructions (i.e. branch points). Each branch point has two subsequent paths: the left path represents the sequential instructions after the branch instruction (when the branch is not taken), and the right path represents the branch target instructions (when the branch is taken). For example, when execution of the first instruction segment 1001 reaches branch point A, if the branch is not taken, instruction segment 1003 is executed as the subsequent instruction segment of branch point A; if the branch is taken, instruction segment 1005 is executed as the branch target instructions.

Similarly, when execution of instruction segment 1003 reaches branch point B, there are two branch paths, 1007 and 1009; when execution of instruction segment 1005 reaches branch point C, there are two branch paths, 1011 and 1013. Branch points B and C belong to the same level.

Likewise, when execution of instruction segment 1007 reaches branch point D, there are two branch paths, 1015 and 1017; when execution of instruction segment 1009 reaches branch point E, there are two branch paths, 1019 and 1021; when execution of instruction segment 1011 reaches branch point F, there are two branch paths, 1023 and 1025; and when execution of instruction segment 1013 reaches branch point G, there are two branch paths, 1027 and 1029. Branch points D, E, F and G belong to the same level. Correspondingly, branch points H, I, J, K, L, M, N and P on instruction segments 1015, 1017, 1019, 1021, 1023, 1025, 1027 and 1029 also belong to the same level. Fig. 1 thus shows three layers of branch points. Other levels and structures may also be used.
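The numbering of Fig. 1 follows a regular pattern that can be captured with heap-style indexing, where branch point i has its not-taken child at 2i + 1 and its taken child at 2i + 2. The sketch below is an illustrative reconstruction from the reference numerals in the text (the segment formula 1003 + 4i and the heap layout are inferences, not stated in the patent); it shows how the number of branch points doubles at each level.

```python
LABELS = "ABCDEFGHIJKLMNP"  # the 15 branch points of Fig. 1 (the figure skips "O")


def children(i):
    """Heap-style indexing: left child = not-taken path, right child = taken path."""
    return 2 * i + 1, 2 * i + 2


def out_segments(i):
    """Instruction segments leaving branch point i as (not-taken, taken)."""
    return 1003 + 4 * i, 1005 + 4 * i


def levels(depth):
    """Group branch-point labels by tree level, starting at branch point A."""
    result = []
    frontier = [0]
    for _ in range(depth):
        result.append([LABELS[i] for i in frontier])
        frontier = [c for i in frontier for c in children(i) if c < len(LABELS)]
    return result
```

`levels(4)` groups the labels as A; B, C; D to G; H to P, i.e. 1, 2, 4 and 8 branch points per level, and `out_segments(LABELS.index("G"))` gives `(1027, 1029)`, matching the text. A prefetcher covering every path k levels ahead therefore has to handle a number of segments that grows as 2^k.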

Fig. 2 shows a processor environment 2000 of the present invention that performs instruction prefetching based on a track table. As shown in Fig. 2, processor environment 2000 includes a lower-level memory 122, a higher-level memory 124 and a processor core 125. In addition, processor environment 2000 includes a filler/generator 123, an active table 121, a track table 126, a tracker 170 and branch decision logic 210. It should be understood that these components are listed for ease of description; other components may be included, and some components may be omitted. Furthermore, only read operations are described in detail here; write operations are handled similarly. The various components may be distributed across multiple systems, may be physical or virtual, and may be implemented in hardware (e.g. integrated circuits), in software, or in a combination of hardware and software.

Higher-level memory 124 and lower-level memory 122 may comprise any suitable storage device, such as static memory (SRAM), dynamic memory (DRAM) or flash memory. Here, the level of a memory refers to how close it is to the processor core: the closer to the processor core, the higher the level. A higher-level memory is generally faster but smaller than a lower-level memory. Higher-level memory 124 may act as the system's cache, or as the level-1 cache when other caches are present, and may be divided into a plurality of storage segments called blocks (e.g. memory blocks) for storing data to be accessed by processor core 125 (e.g. instructions in instruction blocks).

Processor core 125 may be any suitable processor that runs in a pipelined fashion together with a cache system. Processor core 125 may use separate instruction and data caches, and may support some cache-related instructions. When processor core 125 executes an instruction, it first needs to read the instruction from memory. Active table 121, track table 126, tracker 170 and filler/generator 123 fill the instructions that processor core 125 is about to execute into higher-level memory 124, so that processor core 125 can read the required instructions from higher-level memory 124 with a very low cache miss rate. In this embodiment, the term "fill" means moving instructions from a lower-level memory to a higher-level memory, and the term "memory access" means processor core 125 reading from or writing to the closest memory (i.e. higher-level memory 124 or the level-1 cache). This filling can be performed independently of the instruction execution history of processor core 125 (i.e. without querying the instruction execution history).

Filler/generator 123 can fetch instructions or instruction blocks at the appropriate addresses, and can examine each instruction fetched from lower-level memory 122 and filled into higher-level memory 124, extracting information such as the instruction type, the instruction address and the branch target information of branch instructions. The instructions and the extracted information, including the branch target information, are used to calculate addresses and are sent to other modules, such as active table 121 and track table 126. In this embodiment, a branch instruction or branch point is any suitable type of instruction that can cause processor core 125 to change its execution flow (e.g. to execute an instruction other than the next one in sequence).
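The examination performed by the filler/generator can be pictured as a scan over each instruction block as it is filled. The sketch below assumes a hypothetical toy encoding in which every instruction is an `(opcode, pc_relative_offset)` pair, each instruction occupies one address unit, and only the opcodes in `CATEGORY` are branches; real decoding is ISA-specific and is not described in this section.

```python
from typing import NamedTuple, Optional


class ScanResult(NamedTuple):
    address: int            # instruction address
    kind: str               # "cond", "uncond" or "other"
    target: Optional[int]   # branch target address, None for non-branches


# Hypothetical toy ISA: opcode name -> instruction category.
CATEGORY = {"beq": "cond", "bgt": "cond", "jmp": "uncond"}


def scan_block(block_addr, instructions):
    """Examine each instruction of a block as it is filled and extract branch info.

    Each instruction is (opcode, pc_relative_offset); offsets are counted in
    instruction units relative to the branch instruction itself.
    """
    results = []
    for i, (opcode, rel) in enumerate(instructions):
        addr = block_addr + i
        kind = CATEGORY.get(opcode, "other")
        target = addr + rel if kind != "other" else None
        results.append(ScanResult(addr, kind, target))
    return results
```

For instance, scanning `[("add", 0), ("beq", 5), ("jmp", -17)]` at address 16 reports a conditional branch at 17 targeting 22 and an unconditional branch at 18 targeting 1.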

If the instruction block corresponding to the branch target information has not yet been filled into higher-level memory 124, the corresponding track is established while that instruction block is being filled into higher-level memory 124. The tracks in track table 126 correspond one-to-one with the memory blocks in higher-level memory 124, and both are indexed by the same pointer 152. In this way, any instruction that processor core 125 will execute can be filled into higher-level memory 124 before it is executed.
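A minimal sketch of this fill rule: when a block is filled, every branch target block it references is filled too, and each filled block receives one instruction block and one track row under the same block number. The dictionary `block_to_bn` only stands in for the block-address-to-block-number lookup (the internals of active table 121 are not described in this section), and the `memory` layout is a hypothetical assumption.

```python
def fill_with_targets(memory, block_size, start_block):
    """Fill start_block, build its track, and fill every block its branches target.

    memory: hypothetical lower-level memory mapping a block index (high address)
    to a list of (opcode, target) pairs, where target is an absolute instruction
    address for branches and None otherwise.
    Returns (block_to_bn, instr_mem, tracks): instr_mem[bn] and tracks[bn]
    share one block number, like the shared pointer 152.
    """
    block_to_bn, instr_mem, tracks = {}, [], []

    def fill(block):
        if block not in block_to_bn:               # not yet in higher-level memory
            block_to_bn[block] = len(instr_mem)
            instr_mem.append(list(memory[block]))  # instruction block
            tracks.append([None] * block_size)     # matching track row
        return block_to_bn[block]

    bn = fill(start_block)
    for offset, (_opcode, target) in enumerate(memory[start_block]):
        if target is None:
            continue                               # not a branch point
        tgt_block, tgt_offset = divmod(target, block_size)
        tgt_bn = fill(tgt_block)                   # filled before it can be executed
        tracks[bn][offset] = (tgt_bn, tgt_offset)  # branch point records its target
    # The target blocks' own branches would be recorded when they are scanned
    # in turn; this sketch scans only start_block.
    return block_to_bn, instr_mem, tracks
```

With 4-instruction blocks, a branch at offset 1 of block 0 targeting address 9 causes block 2 to be filled as block number 1, and the branch's track entry records `(1, 1)`.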

Filler/generator 123 can determine address information from the instructions and the branch target information, such as the instruction type, the branch source address and the branch target address. For example, instruction types may include conditional branch instructions, unconditional branch instructions and other instructions. Instruction categories may also include subcategories of conditional branch instructions, such as branch-if-equal and branch-if-greater-than. In some cases, an unconditional branch instruction can be regarded as a special case of a conditional branch instruction whose condition is always true. The branch source address is the address of the branch instruction itself, and the branch target address is the address to which execution transfers when the branch is taken. Other information may also be included.
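Treating an unconditional branch as a conditional branch whose condition is always true lets one rule cover every category. The sketch below assumes hypothetical opcode names, two compared operands and one address unit per instruction; it only illustrates how the branch source address, the branch target address and the condition together determine the next address.

```python
import operator

# Hypothetical category table: each conditional subcategory names a comparison.
CONDITIONS = {
    "beq": operator.eq,           # branch if equal
    "bgt": operator.gt,           # branch if greater than
    "jmp": lambda a, b: True,     # unconditional = condition always true
}


def next_address(opcode, src_addr, target_addr, a=0, b=0):
    """Return the address executed after the instruction at src_addr."""
    cond = CONDITIONS.get(opcode)
    if cond is not None and cond(a, b):
        return target_addr        # branch taken: go to the branch target address
    return src_addr + 1           # not a branch, or not taken: next instruction
```

For example, `next_address("beq", 10, 40, 3, 3)` returns 40, `next_address("beq", 10, 40, 3, 4)` returns 11, and `next_address("jmp", 10, 40)` always returns 40.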

In addition, a track table can be established to provide addresses, based on precomputed information, for filling higher-level memory 124. Fig. 3 is an embodiment 3000 of track table operation disclosed by the present invention.

As shown in Fig. 3, track table 126 and tracker 170 interact to provide the addresses required for caching and branch handling.

Track table 126 can contain tracks of the instructions executed by processor core 125; tracker 170 provides various addresses based on track table 126 and supplies a read pointer to track table 126. A track, as used here, is a representation of a series of instructions to be executed (e.g. an instruction segment). This representation may use any suitable data type, such as addresses, block numbers or other numbers. A new track may be established when a track contains a branch point whose branch target changes the program flow, when the instruction following a given instruction lies in a different instruction block (e.g. an instruction in the next instruction block), or for an exception, another program thread, and so on. Each series of instructions may contain the same number of instructions or a different number (e.g. in a variable-length instruction set).

Track table 126 may include a plurality of tracks, where each track corresponds to a row of track table 126 identified by a row number or block number (BN), and that block number points to a corresponding memory block. A track may include a plurality of track points, and a track point may correspond to one or more instructions. Since a track corresponds to a row of track table 126, a track point corresponds to an entry (e.g. a storage unit) in that row. The total number of track points in a track may therefore equal the number of entries in a row of track table 126. Other organizations may also be used.

A track point (i.e. an entry of the table) may contain information about one instruction in the track, such as a branch instruction. The content of a track point may therefore include the category of the corresponding instruction, branch target information, and so on. By examining the content of a track point, the branch target point can be determined from the branch target address stored in it.

For example, as shown in Fig. 3, processor core 125 may use an (M+Z)-bit instruction address to read an instruction, where M and Z are integers. The M-bit portion of the address may be called the high address, and the Z-bit portion the offset address. Track table 126 may contain 2^M rows, i.e. 2^M tracks in total, and the M-bit high address can be used to address track table 126. Each row may contain 2^Z track entries, i.e. 2^Z track points in total, and the Z-bit offset address can be used to address within the corresponding row to select a specific track point (entry).
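The split of an (M+Z)-bit address into a track (row) index and a track point (column) index is a pair of shift-and-mask operations. The widths M = 6 and Z = 4 below are example values chosen for illustration; the patent leaves M and Z as arbitrary integers.

```python
M, Z = 6, 4  # example widths (assumption): 2^6 = 64 tracks, 2^4 = 16 track points each


def split_address(addr, m=M, z=Z):
    """Split an (M+Z)-bit instruction address into (high address, offset address)."""
    if not 0 <= addr < 1 << (m + z):
        raise ValueError("address wider than M+Z bits")
    return addr >> z, addr & ((1 << z) - 1)


def join_address(high, offset, z=Z):
    """Recombine a high address (row) and offset address (column)."""
    return (high << z) | offset


rows, entries_per_row = 1 << M, 1 << Z  # 2^M tracks, 2^Z track points per track
```

With these widths, address `0x2A7` selects row 42 (`0x2A`) and track point 7 of track table 126.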

The content of each entry (track point) in a row may include a category part 57, an XADDR part 58 and a YADDR part 59; other parts may also be included. Category part 57 indicates the category of the instruction corresponding to the track point. As mentioned above, instruction categories may include conditional branch instructions, unconditional branch instructions and other instructions, and may further include subcategories of conditional branch instructions, such as branch-if-equal and branch-if-greater-than. XADDR part 58 may contain an M-bit address, also called the first-dimension address or first address. YADDR part 59 may contain a Z-bit address, also called the second-dimension address or second address.
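One way to store such an entry is to pack the three fields into a single word of (category bits + M + Z) bits. The 2-bit category encoding and the field order below are illustrative assumptions; the patent does not specify a bit layout.

```python
M, Z = 6, 4
CATEGORY_CODES = {"other": 0, "cond": 1, "uncond": 2}  # hypothetical 2-bit encoding
CATEGORY_NAMES = {v: k for k, v in CATEGORY_CODES.items()}


def pack_entry(category, xaddr, yaddr, m=M, z=Z):
    """Pack category 57 | XADDR 58 (M bits) | YADDR 59 (Z bits) into one integer."""
    if not (0 <= xaddr < 1 << m and 0 <= yaddr < 1 << z):
        raise ValueError("XADDR/YADDR out of range")
    return (CATEGORY_CODES[category] << (m + z)) | (xaddr << z) | yaddr


def unpack_entry(word, m=M, z=Z):
    """Recover (category, XADDR, YADDR) from a packed track entry."""
    yaddr = word & ((1 << z) - 1)
    xaddr = (word >> z) & ((1 << m) - 1)
    category = CATEGORY_NAMES[word >> (m + z)]
    return category, xaddr, yaddr
```

Under these assumptions each track point needs 2 + M + Z = 12 bits, so a full 2^M x 2^Z table holds 1024 such entries; `pack_entry("cond", 42, 7)` yields 1703.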

When a new track containing a branch point (a branch track point) is established, the new track can be placed in an available row of track table 126, and the branch track point in an available entry of that row. The positions of the row and the entry are determined by the source address of the branch point (i.e. the branch source address). For example, the row number or block number can be determined from the high address of the branch source address, and the entry from its offset address.

The content of the new track point can correspond to the branch target instruction; in other words, the content of a branch track point stores branch target address information. For example, the row number or block number of the row in track table 126 that corresponds to the branch target instruction is stored as first address 58 in the content of the branch track point. In addition, the offset address indicating the position of the branch target instruction within its track is stored as second address 59 in the content of that branch track point. Thus, in the content of a branch point's track point, the first address (XADDR part 58) is used as the row address and the second address (YADDR part 59) as the column address (i.e. offset) to address the branch target track point in that row.
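The two-dimensional addressing described above can be sketched as follows: the branch source address selects where the entry is written, and the entry's contents (XADDR, YADDR) select the branch target track point. For simplicity this sketch assumes a direct mapping in which the row index equals the high address; in the patent the block number may instead be any available row. All names are illustrative.

```python
M, Z = 6, 4  # example widths (assumption)


def new_track_table(m=M, z=Z):
    """2^M rows x 2^Z entries; None marks an empty track point."""
    return [[None] * (1 << z) for _ in range(1 << m)]


def record_branch(table, src_addr, target_bn, target_offset, category="cond", z=Z):
    """Create the branch track point for the branch at src_addr.

    Row = high address of the branch source, column = its offset address.
    Content = (category, XADDR = target block number, YADDR = target offset).
    """
    row, col = src_addr >> z, src_addr & ((1 << z) - 1)
    table[row][col] = (category, target_bn, target_offset)
    return row, col


def branch_target(table, row, col):
    """Use the stored XADDR as row address and YADDR as column address."""
    _category, xaddr, yaddr = table[row][col]
    return xaddr, yaddr
```

Recording a branch at source address `0x2A7` whose target is track point 3 of block 5 writes entry (42, 7), and reading that entry back points to (5, 3).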

Instruction memory 46 can be the part of higher-level memory 124 used for instruction access, and may consist of any suitable high-performance memory. Instruction memory 46 may contain 2^M memory blocks, each containing 2^Z bytes or words. In other words, instruction memory 46 can store all instructions addressed by the M and Z bits (i.e. the instruction address): the M bits address a particular memory block, and the Z bits address a specific byte or word within that memory block.

Tracker 170 may consist of various components or devices, such as registers, selectors, stacks and/or other storage modules, and determines the next track to be executed by processor core 125. Tracker 170 can determine the next track based on information such as the current track in track table 126, the track point information, and whether a branch has been taken during execution by processor core 125.

For example, during operation, when processor core 125 executes a branch instruction, bus 55 carries the (M+Z)-bit instruction address of the branch instruction. The M-bit portion is sent over bus 56 to track table 126 as the first address, XADDR (or X address), and the Z-bit portion is sent over bus 53 to track table 126 as the second address, YADDR (or Y address). Using the first and second addresses, track table 126 locates the entry of the branch instruction and outputs its branch target address on bus 51.

If the branch condition of the branch instruction is false, the branch is not taken. Under the control of the "branch not taken" signal from processor core 125, selector 49 selects the new second address 54, obtained by incrementer 48 adding one byte or word to the YADDR on bus 53, and outputs this new address on bus 52. Register 50 keeps the first address unchanged, while incrementer 48 keeps incrementing the second address by one until it points to the next branch instruction in the current track table row. The current first and second addresses are held in register 50 and sent out on bus 55.

Conversely, if the branch condition is true, the branch is taken. Under the control of the "branch taken" signal from processor core 125, selector 49 selects the branch target address on bus 51, which is stored in the content of the track entry corresponding to the branch point, and outputs it on bus 52. In response to control signal 60 from processor core 125 (e.g., indicating a taken branch), register 50 holds the first address of the new track to which execution has moved, and provides the new (M+Z)-bit address on bus 55.
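The next-address selection of Fig. 3 can be approximated by a small Python simulation. It is a sketch under stated assumptions, not the circuit itself: `Z = 3` (8 instructions per track) is hypothetical, the track table is reduced to a dictionary from each branch point's (XADDR, YADDR) to its target's (XADDR, YADDR), one instruction is fetched per step, and the function name `run_tracker` is illustrative. Crossing the end of a track is not modelled.

```python
from typing import Dict, Iterable, List, Tuple

Z = 3  # hypothetical: each track holds 2**Z instructions

Addr = Tuple[int, int]  # (XADDR, YADDR)


def run_tracker(track_table: Dict[Addr, Addr], start: Addr,
                outcomes: Iterable[bool], steps: int) -> List[int]:
    """Return the (M+Z)-bit fetch addresses driven onto bus 55, one per step.

    track_table maps the (XADDR, YADDR) of each branch point to the
    (XADDR, YADDR) of its target; outcomes gives taken/not taken for each
    branch reached, in order.
    """
    x, y = start
    outcome = iter(outcomes)
    trace = []
    for _ in range(steps):
        assert y < (1 << Z), "crossing the end of a track is not modelled"
        trace.append((x << Z) | y)            # register 50 drives bus 55
        if (x, y) in track_table and next(outcome):
            x, y = track_table[(x, y)]        # taken: selector 49 picks bus 51
        else:
            y += 1                            # not taken: incrementer 48, X held
    return trace


table = {(0, 2): (1, 5)}                       # one branch at row 0, column 2
print(run_tracker(table, (0, 0), [True], 5))   # [0, 1, 2, 13, 14]
print(run_tracker(table, (0, 0), [False], 5))  # [0, 1, 2, 3, 4]
```

The X part stays fixed on the not-taken path and is replaced only when the stored target is selected, mirroring the roles of register 50 and selector 49.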

Thus, to address memory 46, track table 126 and tracker 170 provide block address 56, while processor core 125 provides only an offset. Processor core 125 feeds back the execution status of branch instructions so that tracker 170 can make its decisions.

Before a new track is executed, the instruction block corresponding to that track is filled into instruction memory 46. Repeating this process ensures that none of the instructions executed by processor core 125 incur a cache miss.

Returning to Fig. 2, to improve efficiency and reduce memory capacity, active table 121 can store information about every track that has been established, and can establish a mapping between addresses (or parts of addresses) and block numbers, so that any available row in track table 126 can be used to establish a track. For example, when a track is established, its address information is stored in active table 121. Active table 121 can therefore store the mapping information for the tracks of all branch target track points in a program. Other configurations can also be used.

Accordingly, active table 121 can store the block addresses of the instruction blocks in higher-level memory 124. Each valid block address corresponds to one block number (BNX) in track table 126. During a lookup, the block number of a branch target address can be obtained by matching the address against the entries of active table 121. A successful match yields a block number, which can be used to address a row in the track table and an instruction block in higher-level memory 124. Block number BNX, together with the offset of an instruction within its track (the second address, BNY, described above), determines the position of a track point.

If the match fails, the track corresponding to this address has not yet been established. Active table 121 assigns a block number, and the instruction segment corresponding to the address is filled into higher-level memory 124 at the location indexed by that block number. At the same time, a new track corresponding to the block number is established in track table 126, so that active table 121 can represent the established track and its address. Therefore, through the operation of active table 121 and fill/generator 123, the instruction segment containing the branch target instruction of a branch point can be filled into higher-level memory 124 before the branch point is fetched and executed by processor core 125.
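The match-or-allocate behaviour of active table 121 can be sketched as follows. The class `ActiveTable`, its free-row list, and the `fill` callback (standing in for fill/generator 123 filling higher-level memory 124 and building the new track) are hypothetical. Replacement of rows once every block number is in use is not modelled, so `lookup` raises `IndexError` when the free list runs out.

```python
from typing import Callable, Dict, List


class ActiveTable:
    """Hypothetical model of active table 121: block address -> block number."""

    def __init__(self, num_rows: int, fill: Callable[[int, int], None]):
        self.bnx_of: Dict[int, int] = {}    # matched block address -> BNX
        self.free: List[int] = list(range(num_rows))
        self.fill = fill                    # stands in for fill/generator 123

    def lookup(self, block_addr: int) -> int:
        bnx = self.bnx_of.get(block_addr)
        if bnx is None:                     # match failed: no track yet
            bnx = self.free.pop(0)          # assign a block number
            self.bnx_of[block_addr] = bnx
            self.fill(block_addr, bnx)      # fill memory 124 row, build new track
        return bnx


filled = []
active = ActiveTable(4, lambda addr, bnx: filled.append((addr, bnx)))
print(active.lookup(0x40), active.lookup(0x80), active.lookup(0x40))  # 0 1 0
print(filled)  # [(64, 0), (128, 1)]
```

The second lookup of 0x40 hits, so no second fill is issued; this is the property that lets the target segment be filled once, ahead of execution.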

Track table 126 can thus be organized as a two-dimensional table in which each row is indexed by the first address BNX and corresponds to a memory block or storage line, and each column is indexed by the second address BNY and corresponds to the offset of an instruction within that memory block. Simply put, the write address of the track table corresponds to the source address of the instruction. For a given branch source address, active table 121 assigns a BNX according to the high-order address, and BNY equals the offset. Together, BNX and BNY form the write address of the entry to be written.

In addition, when instructions are filled into higher-level memory 124, the branch target address of every branch instruction can be obtained by adding the branch instruction's address and the branch offset to its target instruction. This branch target address (high-order address and offset) is sent to active table 121, which matches the high-order part and can assign a BNX. The assigned BNX, together with the instruction type from generator 123 and the offset (BNY), forms the track table content of each branch instruction. This content is stored in the branch point addressed by the corresponding write address.
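Filling a branch's track entry as described above can be sketched in Python. Assumptions, all hypothetical: 8 instructions per block (`OFFSET_BITS = 3`); the target is computed as branch address plus offset (some instruction sets add the offset to the address of the following instruction instead); and the active table is a plain dictionary that assigns the next free BNX on a miss, with the actual fill of memory 124 omitted.

```python
from typing import Dict, Tuple

OFFSET_BITS = 3  # hypothetical: 8 instructions per block


def split(addr: int) -> Tuple[int, int]:
    """(high-order address, offset within the block)."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


def bnx_for(active: Dict[int, int], high: int) -> int:
    """Match the high-order address in the active table; assign a new BNX on a miss."""
    return active.setdefault(high, len(active))


def fill_branch(track_table, active, branch_addr, branch_offset, kind):
    """Write the track entry of one branch while its block is being filled."""
    src_high, src_bny = split(branch_addr)
    tgt_high, tgt_bny = split(branch_addr + branch_offset)  # target = address + offset
    write_address = (bnx_for(active, src_high), src_bny)    # (source BNX, source BNY)
    track_table[write_address] = (kind, bnx_for(active, tgt_high), tgt_bny)


active, track_table = {}, {}
fill_branch(track_table, active, branch_addr=0x13, branch_offset=0x17, kind="cond")
print(track_table)  # {(0, 3): ('cond', 1, 2)}
```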

In addition, tracker 170 can provide a read pointer 151 to track table 126. Read pointer 151 can also take the form of BNX and BNY. The content of the track entry pointed to by the read pointer is read out together with the entry's own BNX and BNY (the source BNX and source BNY) and examined by tracker 170. Based on this content, tracker 170 can perform several different read-pointer updates. For example, if the entry is not a branch point, tracker 170 can update the read pointer with new BNX = source BNX and new BNY = source BNY + 1.

If the entry is a conditional branch, tracker 170 waits for the control signal (TAKEN) that processor core 125 produces when it executes the branch instruction of that branch point. If this signal indicates that the branch is not taken, tracker 170 can update the read pointer with new BNX = source BNX and new BNY = source BNY + 1. If the branch is taken, tracker 170 can update the read pointer with new BNX = target BNX and new BNY = target BNY.

If the entry is an unconditional branch (or jump), tracker 170 can treat it as a conditional branch whose condition always holds: when the branch instruction is executed, the read pointer is updated with new BNX = target BNX and new BNY = target BNY.
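The three read-pointer update rules can be collected into one function. This is a minimal sketch: the entry encoding (`None` for a non-branch, otherwise a tuple of kind, target BNX, target BNY) and the kind strings `"cond"` and `"uncond"` are assumptions, and raising an error stands in for tracker 170 waiting on the TAKEN signal.

```python
from typing import Optional, Tuple

Entry = Optional[Tuple[str, int, int]]  # None, or (kind, target BNX, target BNY)


def update_read_pointer(entry: Entry, src_bnx: int, src_bny: int,
                        taken: Optional[bool] = None) -> Tuple[int, int]:
    """Next (BNX, BNY) of read pointer 151 after examining one track entry."""
    if entry is None:                       # not a branch point
        return src_bnx, src_bny + 1
    kind, tgt_bnx, tgt_bny = entry
    if kind == "uncond":                    # jump: a condition that always holds
        return tgt_bnx, tgt_bny
    if taken is None:                       # conditional: must wait for TAKEN
        raise ValueError("TAKEN signal not yet available")
    return (tgt_bnx, tgt_bny) if taken else (src_bnx, src_bny + 1)


print(update_read_pointer(None, 3, 4))                         # (3, 5)
print(update_read_pointer(("cond", 7, 0), 3, 4, taken=False))  # (3, 5)
print(update_read_pointer(("cond", 7, 0), 3, 4, taken=True))   # (7, 0)
print(update_read_pointer(("uncond", 7, 0), 3, 4))             # (7, 0)
```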

Tracker 170, together with track table 126 and active table 121, implements track-based operation. Because the addresses of branch instructions, branch target instructions, and the instructions immediately following branch instructions can be determined in advance, these instructions can be filled into higher-level memory 124 before processor core 125 executes them.

In addition, track table 126 and/or active table 121 can be compressed to save storage space. Fig. 4 shows an embodiment 4000 of the compressed table structure of the present invention.

As shown in Fig. 4, track table 126 can comprise a compressed track table 1262, a mapping table 156, and a flag table 153. Track table 1262 stores branch target information but has far fewer entries than the original track table, and its entries can have the same structure as those of original track table 126. Flag table 153 records, for each row, how far track table 1262 has been written, i.e., which entry of that row is to be written next. Tables 1262, 156, and 153 have the same number of rows, and their rows correspond to one another.

Each row of track table 1262 has a number of columns greater than or equal to the maximum number of branch instructions a row can contain (4 columns, as shown). Mapping table 156 can have a number of columns equal to the total number of instructions in a row (6 columns, as shown). Alternatively, mapping table 156 can have one column more than the number of instructions in a row, adding an end column ('J' in the figure), so that at the end of each row execution can jump to the row containing the subsequent instructions. Flag table 153 has only one column. Together, these three tables constitute track table 126.

In operation, when a track point is filled into track table 1262, an external source provides a row address BNX, a column address BNY, and content (e.g., branch target information). BNX selects the same row in tables 1262, 156, and 153. The entry in table 153 selects the entry in track table 1262 that will store the track point content.

That is, flag table 153 indicates the column address of the available entry in the row of track table 1262 addressed by BNX, and the track point content is stored in the available entry at that column of that row. This column address differs from BNY and can be called the mapped BNY, or MBNY. For example, if the second row of flag table 153 holds '2', the second column in the second row of track table 1262 is the available entry into which the new track point content is written, and MBNY here is '2'.

At the same time, BNY selects a column in the same row of mapping table 156, and the entry at that row and column of mapping table 156 is written with MBNY, i.e., the column address in track table 1262 where the track point content is stored ('2' in the example above). In this way, mapping table 156 maintains the mapping between BNY and MBNY. In addition, the entry in flag table 153 can be incremented by one to point to the next available entry in track table 1262.

During a read operation, row address BNX and column address BNY are used to read the content (e.g., branch target information) of an entry from track table 126. BNX selects a row in tables 1262 and 156, and BNY selects a column in mapping table 156.

Mapping table 156 is read, and the content of the entry indicated by BNX and BNY gives the column MBNY in table 1262. Track table 1262 can then be accessed with BNX and MBNY to read the content of that entry. Since branch instructions typically make up about 1/6 of all instructions, the number of entries in track table 1262 can be greatly reduced. Since each entry in table 156 needs to hold only a simple value (a column number, MBNY), table 156 occupies very little storage. The compressed track table structure can therefore significantly reduce the storage required by the track table. In addition, because the read pointer moves directly from one branch point to the next, it can also advance more quickly.
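The write and read paths of the compressed structure in Fig. 4 can be sketched as a small Python class. The class name and list-of-lists layout are illustrative. The sketch assumes the flag table counts columns from 0 (the text's example does not fix the numbering base), and it does not model row overflow or the optional end column 'J'.

```python
from typing import Any, List, Optional


class CompressedTrackTable:
    """Sketch of Fig. 4: track table 1262 + mapping table 156 + flag table 153."""

    def __init__(self, rows: int, branch_cols: int, instr_cols: int):
        self.track: List[List[Any]] = [[None] * branch_cols for _ in range(rows)]
        self.mapping: List[List[Optional[int]]] = [[None] * instr_cols
                                                   for _ in range(rows)]
        self.flag: List[int] = [0] * rows   # next available column per row

    def write(self, bnx: int, bny: int, content: Any) -> None:
        mbny = self.flag[bnx]               # column of the available entry (MBNY)
        self.track[bnx][mbny] = content     # store in table 1262
        self.mapping[bnx][bny] = mbny       # table 156 records BNY -> MBNY
        self.flag[bnx] += 1                 # table 153 advances to the next entry

    def read(self, bnx: int, bny: int) -> Any:
        mbny = self.mapping[bnx][bny]       # look up MBNY in table 156
        return None if mbny is None else self.track[bnx][mbny]


t = CompressedTrackTable(rows=2, branch_cols=4, instr_cols=6)
t.write(1, 4, "target X")
t.write(1, 1, "target Y")
print(t.read(1, 4), t.read(1, 1), t.read(1, 0))  # target X target Y None
print(t.mapping[1], t.flag[1])  # [None, 1, None, None, 0, None] 2
```

Only two of the six instruction positions in row 1 consume a full track entry; the other positions cost one small mapping cell each, which is the source of the storage saving described above.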

Returning to Fig. 2, in operation, fill/generator 123, track table 126, tracker 170, and active table 121 can be used to fill several levels of instruction segments into higher-level memory 124. Here, a level refers to a level of branch points as shown in Fig. 1; the number of levels can also be called the lookahead depth, or the depth of the instruction fill operation.

With the method of the present invention, more tracks can be established at once, and more instruction segments filled, according to a predetermined branch lookahead depth, thereby better hiding the latency of fetching instructions from the memory hierarchy.

In this specification, the 'first-level branch instruction' is the first branch instruction after the instruction currently being executed, and the 'first-level branch point' is the branch point corresponding to the first-level branch instruction. The 'second-level branch instructions' are the first branch instruction in the instruction segment that executes sequentially after the first-level branch instruction and the first branch instruction in that branch's target instruction segment; the 'second-level branch points' are the branch points corresponding to the second-level branch instructions. Two levels of branch instructions therefore comprise three branch instructions in total: one first-level branch instruction and two second-level branch instructions.

For example, in Fig. 1 the first-level branch point is A, and the second-level branch points are B and C. For first-level branch point A, instruction segments 1003 and 1005, corresponding to B and C, are prefetched into higher-level memory 124. For second-level branch points B and C, instruction segments 1007, 1009, 1011, and 1013 are prefetched into higher-level memory 124. Thus, with a lookahead depth of two levels, instruction segments 1003, 1005, 1007, 1009, 1011, and 1013 are all prefetched into higher-level memory 124. According to the technical solution described above, any of these segments already stored in higher-level memory 124 need not be prefetched again; only segments not yet stored there are prefetched. Fig. 5 shows logic 5000 according to the present invention for prefetching two levels of branch target instruction segments based on the track table.
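The set of segments prefetched for a given lookahead depth, minus those already resident, can be computed from the tree in Fig. 1. The dictionaries below encode Fig. 1 as inferred from the text (each branch point's fall-through and taken successors, and the segment each successor lies in); they, and the function name `segments_to_prefetch`, are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed encoding of Fig. 1: branch point -> (fall-through successor, taken
# successor), and the instruction segment in which each successor lies.
CHILDREN: Dict[str, Tuple[str, str]] = {"A": ("B", "C"), "B": ("D", "E"),
                                        "C": ("F", "G")}
SEGMENT: Dict[str, int] = {"B": 1003, "C": 1005, "D": 1007,
                           "E": 1009, "F": 1011, "G": 1013}


def segments_to_prefetch(root: str, depth: int,
                         resident: Iterable[int]) -> List[int]:
    """Segments covering `depth` levels below `root`, skipping resident ones."""
    have = set(resident)
    level: List[str] = [root]
    wanted: List[int] = []
    for _ in range(depth):
        level = [c for p in level for c in CHILDREN.get(p, ())]
        wanted += [SEGMENT[c] for c in level if SEGMENT[c] not in have]
    return wanted


print(segments_to_prefetch("A", 2, resident=[]))
# [1003, 1005, 1007, 1009, 1011, 1013]
print(segments_to_prefetch("A", 2, resident=[1007]))
# [1003, 1005, 1009, 1011, 1013]
```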

As shown in Fig. 5, prefetch logic 5000 comprises track table 7126, tracker 170, and multi-pointer addressing device 7001. Track table 7126 can use the compressed table structure of Fig. 4, i.e., when any valid track is addressed sequentially, every valid track point read out corresponds to a branch track point. For ease of description, instruction memory 46 and processor core 125 are omitted.

Multi-pointer addressing device 7001 comprises incrementers 5003 and 7005, pointer registers 5005, 5007, 5009, and 5011, multiplexers 7013 and 7015, and branch decision logic 5015. Pointer registers 5005, 5007, 5009, and 5011 store the four second-level branch instructions counted from the instruction currently being executed (e.g., branch points D, E, F, and G in Fig. 1 once prefetching from branch point A is complete).

Incrementers 5003 and 7005 function like incrementer 48 in Fig. 3: each increments the second address (BNY) of a pointer taken from one of the two groups of pointer registers (5005 and 5007; 5009 and 5011) until it reaches the next branch track point in the same track. Multiplexers 7013 and 7015 select one pointer from pointer registers 5005 and 5007, and one from pointer registers 5009 and 5011, respectively, to address track table 7126. Branch decision logic 5015 processes the branch taken signal from the processor core and decodes it into write signals that control simultaneous writes to the four pointer registers, as well as the select signals for multiplexers 7013 and 7015.

In addition, if the value read from track table 7126 on bus 5021 is the BN of the target track point, selector 5025 selects the input from bus 5021 and this BN is stored directly in pointer register 5009. If the value read from track table 7126 on bus 5021 is not a BN, an active table lookup and fill are performed, and the corresponding BN is delivered over bus 5023 to selector 5025 as its output and stored in pointer register 5009. Similarly, if the value read from track table 7126 on bus 7009 is the BN of the target track point, selector 7017 selects the input from bus 7009 and this BN is stored directly in pointer register 5011; otherwise, an active table lookup and fill are performed, and the corresponding BN is delivered over bus 7011 to selector 7017 as its output and stored in pointer register 5011. The description can therefore be simplified by regarding the BNs on buses 5027 and 7007 as the BNs of the target track points obtained by addressing track table 7126. Fig. 6 shows an embodiment of the operation of this logic.
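The choice made by selector 5025 (and likewise selector 7017) can be sketched as below. It assumes, hypothetically, that a track entry is tagged either `"BN"` (already resolved to a block number) or `"ADDR"` (a raw target address still to be matched), and it reuses a simple dictionary model of the active table; the fill of higher-level memory on a miss is omitted.

```python
from typing import Dict, Tuple, Union

OFFSET_BITS = 3  # hypothetical block size: 8 instructions

BN = Tuple[int, int]  # (BNX, BNY)


def resolve_target(entry: Tuple[str, Union[BN, int]],
                   active: Dict[int, int]) -> BN:
    """Model of selector 5025 choosing what is written into pointer register 5009."""
    kind, value = entry
    if kind == "BN":                        # bus 5021 already carries a BN
        return value
    high = value >> OFFSET_BITS             # raw target address: match it
    bnx = active.setdefault(high, len(active))  # hit, or assign a BNX on a miss
    # filling memory 124 and building the new track are not modelled here
    return bnx, value & ((1 << OFFSET_BITS) - 1)  # BN delivered over bus 5023


active = {5: 0}
print(resolve_target(("BN", (2, 1)), active))  # (2, 1)
print(resolve_target(("ADDR", 42), active))    # (0, 2)
print(resolve_target(("ADDR", 0x80), active))  # (1, 0)
```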

Below in conjunction with Fig. 1 and Fig. 6 (a), specific operation process is described.In Fig. 6 (a), row 6005 are corresponding Pointer register 5005, the corresponding pointer register 5007 of row 6007, the corresponding pointer register 5009 of row 6009, The corresponding pointer register 5011 of row 6011, and each behavior once update after result.

Assume the program has just started running and branch point A is the first branch point after program start. The BN corresponding to branch point A is written into pointer register 5005. At this point, of the four pointer registers only pointer register 5005 holds a valid value, as shown in row 6013 of Fig. 6(a). Multiplexer 7013 then selects the output of pointer register 5005 as the value of pointer 5029, and the BNY part of pointer 5029 is sent to incrementer 5003 and incremented, so that BNY moves to the first branch track point in the sequential instructions after branch point A (branch point B); the updated BN value is written into pointer register 5005. At the same time, the value of pointer 5029 is sent to track table 7126 as an address to read the target track point of branch point A (branch point C), and the updated BN value is written into pointer register 5009, as shown in row 6015. Instruction segments 1003 and 1005, corresponding to branch points B and C, are thus both filled into higher-level memory.

Next, multiplexer 7013 selects the output of pointer register 5005 as the value of pointer 5029, and the BNY part of pointer 5029 is sent to incrementer 5003 and incremented, so that BNY moves to the first branch track point in the sequential instructions after branch point B (branch point D); the updated BN value is written into pointer register 5005. At the same time, the value of pointer 5029 is sent to track table 7126 as an address to read the target track point of branch point B (branch point E), and the updated BN value is written into pointer register 5009, as shown in row 6017. Instruction segments 1007 and 1009, corresponding to branch points D and E, are thus both filled into higher-level memory.

Meanwhile, multiplexer 7015 selects the output of pointer register 5009 as the value of pointer 7017, and the BNY part of pointer 7017 is sent to incrementer 7005 and incremented, so that BNY moves to the first branch track point in the sequential instructions after branch point C (branch point F); the updated BN value is written into pointer register 5007. At the same time, the value of pointer 7017 is sent to track table 7126 as an address to read the target track point of branch point C (branch point G), and the updated BN value is written into pointer register 5011, as shown in row 6017. Instruction segments 1011 and 1013, corresponding to branch points F and G, are thus both filled into higher-level memory. None of the above operations need to be controlled by the branch outcome information from the processor core.

After these two update steps, prefetching of the instruction segments for the two levels of branches starting from branch point A is complete. Multi-pointer addressing device 7001 stops updating and waits for information 5031 from the processor core indicating whether the branch is taken.

Branch point A is the first branch point the processor core will reach (i.e., read pointer 55 in Fig. 3 points to branch point A), and information 5031 indicating whether its branch is taken is sent to multi-pointer addressing device 7001. If information 5031 indicates that the branch is taken, the first branch point to be executed next is updated to branch point C (i.e., read pointer 55 in Fig. 3 is pointed to branch point C), the four pointer registers are updated to branch points L, M, N, and P, and the corresponding instruction segments are prefetched.

If information 5031 indicates that the branch is not taken, the first branch point to be executed next is updated to branch point B (i.e., read pointer 55 in Fig. 3 is pointed to branch point B), the four pointer registers are updated to branch points H, I, J, and K, and the corresponding instruction segments are prefetched.

The not-taken case is described below as an example. When the branch at branch point A is not taken, branch decision logic 5015 generates control signals so that multiplexer 7013 selects the output of pointer register 5005 as the value of pointer 5029, and the BNY part of pointer 5029 is sent to incrementer 5003 and incremented, so that BNY moves to the first branch track point in the sequential instructions after branch point D (branch point H); the updated BN value is written into pointer register 5005. The value of pointer 5029 is also sent to track table 7126 as an address to read the target track point of branch point D (branch point I), and the updated BN value is written into pointer register 5009, as shown in row 6019. Instruction segments 1015 and 1017, corresponding to branch points H and I, are thus both filled into higher-level memory.

Meanwhile, the control signals generated by branch decision logic 5015 likewise make multiplexer 7015 select the output of pointer register 5009 as the value of pointer 7017, and the BNY part of pointer 7017 is sent to incrementer 7005 and incremented, so that BNY moves to the first branch track point in the sequential instructions after branch point E (branch point J); the updated BN value is written into pointer register 5007. At the same time, the value of pointer 7017 is sent to track table 7126 as an address to read the target track point of branch point E (branch point K), and the updated BN value is written into pointer register 5011, as shown in row 6019. Instruction segments 1019 and 1021, corresponding to branch points J and K, are thus both filled into higher-level memory. Since prefetching for two levels of branches is complete, multi-pointer addressing device 7001 stops updating and waits for information 5031 from the processor core indicating whether the branch at branch point B is taken.
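The register updates traced in Fig. 6(a) can be reproduced with a short Python model of multi-pointer addressing device 7001. The successor maps are an assumed encoding of Fig. 1 (the text names L, M, N, P but not which successor each one is), and the selection rule for the taken case (the multiplexers choosing 5007 and 5011) is inferred from the register contents in Fig. 6(a) rather than stated explicitly. The model relies on an invariant visible in that table: registers 5005 and 5009 hold the leaves under the root's fall-through child, and 5007 and 5011 hold those under its taken child.

```python
from typing import Dict

# Assumed successor maps for the tree of Fig. 1: FALL gives the first branch
# point on the fall-through path, TAKEN the first branch point reached in the
# branch target segment.
FALL = {"A": "B", "B": "D", "C": "F", "D": "H", "E": "J", "F": "L", "G": "N",
        "J": "Q", "K": "S"}
TAKEN = {"A": "C", "B": "E", "C": "G", "D": "I", "E": "K", "F": "M", "G": "P",
         "J": "R", "K": "T"}


def expand(regs: Dict[int, str], taken: bool = False) -> Dict[int, str]:
    """One synchronous update of pointer registers 5005/5007/5009/5011.

    Both multiplexers read the old register values before any write, as
    registers clocked on the same edge would.
    """
    p1 = regs[5007] if taken else regs[5005]  # mux 7013 -> pointer 5029
    p2 = regs[5011] if taken else regs[5009]  # mux 7015 -> pointer 7017
    return {5005: FALL[p1], 5009: TAKEN[p1],  # incrementer 5003 / table read
            5007: FALL[p2], 5011: TAKEN[p2]}  # incrementer 7005 / table read


regs = {5005: "A"}                                        # row 6013
regs = {5005: FALL[regs[5005]], 5009: TAKEN[regs[5005]]}  # row 6015: B, C
regs = expand(regs)                                       # row 6017
print(regs)  # {5005: 'D', 5009: 'E', 5007: 'F', 5011: 'G'}
regs = expand(regs, taken=False)                          # A not taken: row 6019
print(regs)  # {5005: 'H', 5009: 'I', 5007: 'J', 5011: 'K'}
```

Calling `expand(regs, taken=True)` on this last state, i.e. with the branch at B taken, places Q, R, S, and T in registers 5005, 5009, 5007, and 5011, which agrees with row 6021.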

In the technical solution of the present invention, the terms 'root' and 'leading pointer' can be used to denote, respectively, the branch point corresponding to the read pointer and the branch points corresponding to the pointer registers. For example, if branch point A is the first branch point after the instruction currently being executed by the processor, branch point A is the 'root' branch point of the instruction segments currently being prefetched. Branch points B and C are second-level branch points, and branch points D, E, F, and G are the branch points pointed to by the four leading pointers of the current 'root' branch point A. Figs. 6(b) through 6(e) show the process of generating four leading pointers from the 'root' branch point, moving the 'root' branch point, and updating the leading pointers; circles denote the 'root' branch point and triangles denote leading pointers.

Fig. 6 (b) is the first step of this process.Branch point A is ' root ' branch point, can be at pointer register Device (that is: leading pointer) is set up next branch point B when branch point A order performs, and branch point A Next branch point C in branch target.Now branch point A is ' root ' branch point, branch point B and C It it is current leading pointer (as shown in the row 6015 in Fig. 6 (a)).

Fig. 6 (c) is the second step of this process.Them are found respectively by leading pointer B and C in previous step Next branch point D and F when corresponding order performs, and next branch point E and G in branch target. Now branch point A is ' root ' branch point, and branch point D, E, F and G are that current leading pointer is (such as figure Shown in row 6017 in 6 (a)).So far, four leading pointers (pointer register) all have been filled with, and The signal whether not occurred by branch's transfer during Gai controls.

Fig. 6 (d) is the 3rd step of this process.In this step, processor core has sent the transfer of expression branch here The signal whether success occurs.Assume branch point A branch transfer not have generation (branch of branch point A turns Move situation about occurring also to be similar to), processor core performs immediately following the next instruction after branch point A, and this instruction First branch point afterwards is branch point B, i.e. ' root ' branch point is moved to branch point B.Due to branch Point C will not be performed to, therefore can be with cutting (prune) branch point C and subsequent branches point F and G thereof.

At the same time, from the two leading pointers D and E belonging to branch point B, their respective next branch points on the sequential path (H and J) and in the branch target (I and K) are found. Branch point B is now the 'root' branch point and branch points H, I, J, and K are the current leading pointers (as shown in row 6019 of Fig. 6(a)). Under the control of the branch outcome signal, the 'root' branch point has now been moved and the leading pointers updated.

Fig. 6 (e) is the 4th step of this process.In this step, successfully send out because the branch of branch point B shifts Raw, processor core performs the Branch Target Instruction of branch point B, and first branch point after this instruction is Branch point E, i.e. ' root ' branch point are moved to branch point E.Owing to branch point D will not be performed to, Therefore can be with cutting branch point D and subsequent branches point H and I thereof.

At the same time, from the two leading pointers J and K belonging to branch point E, their respective next branch points on the sequential path (Q and S) and in the branch target (R and T) are found. Branch point E is now the 'root' branch point and branch points Q, S, R, and T are the current leading pointers (as shown in row 6021 of Fig. 6(a)). Under the control of the branch outcome signal, the 'root' branch point has now been moved and the leading pointers updated.

In this embodiment, the 'root' branch point corresponds to the read pointer in the track table system, whose first address (BNX) points to the memory block in higher-level memory that holds the instructions the processor core needs. The leading pointers correspond to the pointer registers in the track table system and point to the branch points whose branch target instruction segments may be prefetched next. The branch target instruction segments of the level of two branch points lying between the 'root' branch point and the leading pointers (e.g., B and C in Fig. 6(c), D and E in Fig. 6(d), or J and K in Fig. 6(e)) are either being prefetched from lower-level memory into higher-level memory or have already been prefetched and stored in higher-level memory. Note that the above embodiments illustrate only the prefetching of instruction segments for two levels of branch points. Based on the technical solution and embodiments disclosed above, however, those skilled in the art can readily extend them, by adding similar devices or software, to prefetch instruction segments for branch points at more levels, and such extensions also fall within the scope of protection of the present invention.
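The root/leading-pointer bookkeeping of Figs. 6(b) to 6(e) can also be expressed abstractly, independent of register numbering. In this sketch (function names and successor maps are assumptions, encoding Fig. 1 as inferred from the text) the leading pointers are kept in breadth-first order with each fall-through successor before its taken successor, so the first half of the list always descends from the root's fall-through child. Resolving the root's branch therefore prunes one half and grows the other by one level. The `depth` parameter of `leading_pointers` illustrates the extension to more levels mentioned above, provided the successor maps are deep enough.

```python
from typing import List

# Assumed successor maps for Fig. 1 (FALL: sequential path, TAKEN: target path).
FALL = {"A": "B", "B": "D", "C": "F", "D": "H", "E": "J", "F": "L", "G": "N",
        "J": "Q", "K": "S"}
TAKEN = {"A": "C", "B": "E", "C": "G", "D": "I", "E": "K", "F": "M", "G": "P",
         "J": "R", "K": "T"}


def leading_pointers(root: str, depth: int = 2) -> List[str]:
    """Branch points `depth` levels below the root, breadth-first,
    each fall-through successor listed before its taken successor."""
    level = [root]
    for _ in range(depth):
        level = [s for p in level for s in (FALL[p], TAKEN[p])]
    return level


def advance(leaves: List[str], taken: bool) -> List[str]:
    """The root's branch resolves: prune the subtree that will not execute,
    then grow the surviving leading pointers by one level."""
    half = len(leaves) // 2
    kept = leaves[half:] if taken else leaves[:half]
    return [s for p in kept for s in (FALL[p], TAKEN[p])]


leaves = leading_pointers("A")          # Fig. 6(c): root A
print(leaves)                           # ['D', 'E', 'F', 'G']
leaves = advance(leaves, taken=False)   # Fig. 6(d): root moves to B
print(leaves)                           # ['H', 'I', 'J', 'K']
leaves = advance(leaves, taken=True)    # Fig. 6(e): root moves to E
print(leaves)                           # ['Q', 'R', 'S', 'T']
```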

Fig. 7 (a) is to carry out two-layer Branch Target Instruction segment prefetching logic according to of the present invention based on track table 7000.In Fig. 7 (a), prefetch logic 7000 with in Fig. 5 to prefetch logic 5000 similar.But, Prefetch logic 7000 and add increasing device and a many pointer addressings device of selector certainly comprising lesser number 5001。

Prefetch logic 7000 and comprise compression track table 126, tracking device 170 and many pointer addressings device 5001.

Multi-pointer addressing device 5001 comprises incrementer 5003, pointer registers 5005, 5007, 5009, and 5011, multiplexer 5013, and branch decision logic 5015. Pointer registers 5005, 5007, 5009, and 5011 store the four second-level branch instructions counted from the instruction currently being executed.

Multiplexer 5013 selects one pointer from the four pointer registers to address track table 126. Incrementer 5003 increments the second address (BNY) of the pointer selected from the four pointer registers (5005, 5007, 5009, and 5011) until it reaches the next branch track point in the same track. Until all four pointer registers are filled, the logic described above continues to fill them. Branch decision logic 5015 processes the branch taken signal from the processor core and decodes it into write signals that control writes to each of the four pointer registers, as well as select signal 5019 for multiplexer 5013.

In addition, if the value read from track table 126 on bus 5021 is the BN of the target track point, selector 5025 selects the input from bus 5021 and this BN is stored directly in pointer register 5009 or 5011. If the value read from track table 126 on bus 5021 is not a BN, an active table lookup and fill are performed, and the corresponding BN is delivered over bus 5023 to selector 5025 as its output and stored in pointer register 5009 or 5011. In this embodiment the description can therefore be simplified by regarding the BN on bus 5027 as the BN of the target track point obtained by addressing track table 126. The specific operation is described below with reference to Fig. 1 and Fig. 6(f).

In Fig. 1, the instruction segments on the left, starting from each branch point, already have corresponding tracks in track table 126, while the instruction segments on the right do not yet have tracks in track table 126. Fig. 6(f) corresponds to the binary tree in Fig. 1: the four columns 6005, 6007, 6009, and 6011 of the table correspond to the values of the four pointer registers 5005, 5007, 5009, and 5011, respectively, and each row shows the contents of these four pointer registers after each update or branch operation.

Assume the program has just started running and branch point A is the first branch point after program start. The BN corresponding to branch point A is written into pointer register 5009. At this point, of the four pointer registers only pointer register 5009 holds a valid value, as shown in row 6023. Multiplexer 5013 then selects the output of pointer register 5009 as the value of pointer 5029, and the BNY part of pointer 5029 is sent to incrementer 5003 and incremented, so that BNY moves to the first branch track point in the sequential instructions after branch point A (branch point B); the updated BN value is written into pointer register 5005. At the same time, the value of pointer 5029 is sent to track table 126 as an address to read the target track point of branch point A (branch point C), and the updated BN value is written into pointer register 5009, as shown in row 6025. Instruction segments 1003 and 1005, corresponding to branch points B and C, are thus both filled into higher-level memory.

Next, multiplexer 5013 selects the output of pointer register 5005 as the value of pointer 5029, and the BNY part of pointer 5029 is sent to incrementer 5003 and incremented by one, which moves BNY to the track point of the first branch instruction in the sequential instructions after branch point B (branch point D); the updated BN is written into pointer register 5007. At the same time, the value of pointer 5029 is sent to track table 126 for addressing, which reads out the target track point of branch point B (branch point E), and the updated BN is written into pointer register 5011, as shown in row 6027. In this way, instruction segments 1007 and 1009, corresponding to branch points D and E, are both filled into the higher-level memory.

Then, multiplexer 5013 selects the output of pointer register 5009 as the value of pointer 5029, and the BNY part of pointer 5029 is sent to incrementer 5003 and incremented by one, which moves BNY to the track point of the first branch instruction in the sequential instructions after branch point C (branch point F); the updated BN is written into pointer register 5005 (register 5007 already holds branch point D). At the same time, the value of pointer 5029 is sent to track table 126 for addressing, which reads out the target track point of branch point C (branch point G), and the updated BN is written into pointer register 5009, as shown in row 6027. In this way, instruction segments 1011 and 1013, corresponding to branch points F and G, are both stored in the higher-level memory. None of the above operations needs to be controlled by the information, sent by the processor core, on whether a branch is taken.

It should be noted that the BNs of the two next branch points of the same branch point (for example, branch points D and E of branch point B) are written into their pointer registers at the same time, whereas the order in which the next branch points of different branch points at the same layer (for example, branch points B and C) are written into their pointer registers is not mandatory. For example, the two next branch points of branch point B (branch points D and E) may be written into their pointer registers first, followed by the two next branch points of branch point C (branch points F and G); alternatively, the two next branch points of branch point C (F and G) may be written first, followed by those of branch point B (D and E). The order given in this embodiment is merely for convenience of description; any other possible order falls within the scope of the present invention.

After the three update steps above, instruction prefetching for the two layers of branches starting from branch point A is complete. Multi-pointer addresser 5001 stops updating and waits for the information 5031, sent by the processor core, indicating whether the branch is taken.
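The three update steps can be replayed with a small Python model. It is a sketch under assumptions: the toy track table, the helper names (fall_through, branch_target, expand, prefetch_two_layers) and the integer register keys are hypothetical; every track-table entry is assumed to already be a BN; and BNY + 1 is assumed to stay inside the current track, so end-of-track handling is omitted. The register assignment follows the text, ending with F, D, G and E in registers 5005, 5007, 5009 and 5011 (row 6027).

```python
from typing import Dict, List, Tuple

BN = Tuple[int, int]              # (BNX, BNY) of a branch point in a compressed track
TrackTable = Dict[int, List[BN]]  # track_table[bnx][bny] -> BN of that point's branch target


def fall_through(bn: BN) -> BN:
    """Incrementer 5003: BNY + 1 reaches the next branch point in program order."""
    return (bn[0], bn[1] + 1)


def branch_target(track_table: TrackTable, bn: BN) -> BN:
    """Addressing track table 126 with pointer 5029 yields the target branch point."""
    bnx, bny = bn
    return track_table[bnx][bny]


def expand(track_table: TrackTable, regs: Dict[int, BN],
           src: int, fall_reg: int, target_reg: int) -> None:
    """Multiplexer 5013 selects register `src`; both successors are written back."""
    node = regs[src]  # read before writing, so src may equal target_reg
    regs[fall_reg] = fall_through(node)
    regs[target_reg] = branch_target(track_table, node)


def prefetch_two_layers(track_table: TrackTable, root: BN) -> Dict[int, BN]:
    regs: Dict[int, BN] = {5009: root}           # row 6023: only 5009 is valid
    expand(track_table, regs, 5009, 5005, 5009)  # row 6025: B -> 5005, C -> 5009
    expand(track_table, regs, 5005, 5007, 5011)  # B's successors: D -> 5007, E -> 5011
    expand(track_table, regs, 5009, 5005, 5009)  # C's successors: F -> 5005, G -> 5009
    return regs


# Toy layout: A=(0,0), B=(0,1), D=(0,2); C=(1,0), F=(1,1); E=(2,0); G=(4,0)
tt = {0: [(1, 0), (2, 0), (3, 0)], 1: [(4, 0), (5, 0)], 2: [(6, 0)]}
print(sorted(prefetch_two_layers(tt, (0, 0)).items()))
# [(5005, (1, 1)), (5007, (0, 2)), (5009, (4, 0)), (5011, (2, 0))]  i.e. F, D, G, E
```

Reading the selected register before any write mirrors the fact that the third step both reads and rewrites register 5009.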

Branch point A is the first branch point to be executed, and the signal 5031 indicating whether its branch is taken is sent to multi-pointer addresser 5001. If information 5031 indicates that the branch is taken, the first branch point to be executed next becomes branch point C; the four pointer registers are updated to branch points L, M, N and P, and the corresponding instruction segments are prefetched.

If information 5031 indicates that the branch is not taken, the first branch point to be executed next becomes branch point B; the four pointer registers are updated to branch points H, I, J and K, and the corresponding instruction segments are prefetched.

The not-taken case is used as an example below. When the branch of branch point A is not taken, branch decision logic 5015 generates a control signal so that multiplexer 5013 selects the output of pointer register 5007 as the value of pointer 5029; the BNY part of pointer 5029 is sent to incrementer 5003 and incremented by one, which moves BNY to the track point of the first branch instruction in the sequential instructions after branch point D (branch point H), and the updated BN is written into pointer register 5005. The output of pointer register 5007, selected by multiplexer 5013 as the value of pointer 5029, is also sent to track table 126 for addressing, which reads out the target track point of branch point D (branch point I), and the updated BN is written into pointer register 5009, as shown in row 6029. In this way, instruction segments 1015 and 1017, corresponding to branch points H and I, are both filled into the higher-level memory.

Afterwards, branch decision logic 5015 again generates a control signal so that multiplexer 5013 selects the output of pointer register 5011 as the value of pointer 5029; the BNY part of pointer 5029 is sent to incrementer 5003 and incremented by one, which moves BNY to the track point of the first branch instruction in the sequential instructions after branch point E (branch point J), and the updated BN is written into pointer register 5007. The output of pointer register 5011, selected by multiplexer 5013 as the value of pointer 5029, is also sent to track table 126 for addressing, which reads out the target track point of branch point E (branch point K), and the updated BN is written into pointer register 5011, as shown in row 6029. In this way, instruction segments 1019 and 1021, corresponding to branch points J and K, are both filled into the higher-level memory. This completes the instruction prefetch for two layers of branches; multi-pointer addresser 5001 stops updating and waits for the information 5031, sent by the processor core, indicating whether the branch at branch point B is taken.
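The re-rooting step can be sketched in the same style. In this hypothetical model the not-taken case follows the text (D in register 5007 and E in register 5011 are expanded into H, I, J and K). The text does not state which registers receive L, M, N and P in the taken case, so the assignment used for taken=True is an assumption. Successors are computed from a snapshot of the old registers rather than in the exact sequential order of the hardware.

```python
from typing import Dict, List, Tuple

BN = Tuple[int, int]              # (BNX, BNY)
TrackTable = Dict[int, List[BN]]  # track_table[bnx][bny] -> target BN


def fall_through(bn: BN) -> BN:
    return (bn[0], bn[1] + 1)  # incrementer 5003


def branch_target(track_table: TrackTable, bn: BN) -> BN:
    return track_table[bn[0]][bn[1]]  # addressing track table 126


def advance_root(track_table: TrackTable, regs: Dict[int, BN], taken: bool) -> Dict[int, BN]:
    """Return new pointer-register values after the root branch resolves."""
    if taken:
        # Assumption: new root is C, whose successors F and G sit in 5005 and 5009
        first, second = regs[5005], regs[5009]
    else:
        # Per the text: new root is B, whose successors D and E sit in 5007 and 5011
        first, second = regs[5007], regs[5011]
    return {
        5005: fall_through(first),                 # H when not taken
        5009: branch_target(track_table, first),   # I
        5007: fall_through(second),                # J
        5011: branch_target(track_table, second),  # K
    }


tt = {0: [(1, 0), (2, 0), (3, 0)], 1: [(4, 0), (5, 0)], 2: [(6, 0)], 4: [(7, 0)]}
after_two_layers = {5005: (1, 1), 5007: (0, 2), 5009: (4, 0), 5011: (2, 0)}  # F, D, G, E
print(advance_root(tt, after_two_layers, taken=False))
# {5005: (0, 3), 5009: (3, 0), 5007: (2, 1), 5011: (6, 0)}  i.e. H, I, J, K
```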

When two-layer branch target instruction prefetching based on the track table is performed according to the technical solution of the present invention, the two instruction segments corresponding to the first-layer branch instruction and the four instruction segments corresponding to the two second-layer branch instructions are prefetched, so a total of six instruction segments may be prefetched. Taking Fig. 1 as an example, if branch point A is the first branch point after the instruction currently being executed by the processor, i.e., branch point A is the root branch point, then after two-layer track-table-based branch target instruction prefetching, instruction segments 1003, 1005, 1007, 1009, 1011 and 1013 are all filled into the cache. However, since not all six instruction segments will necessarily be executed, this may cause cache pollution. Pruning the branch points whose segments have been prefetched but are not yet stored in the cache and will not be executed avoids this pollution problem. Fig. 7(b) shows another two-layer branch target instruction segment prefetch logic 7500 with a branch point pruning function.
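The count of six segments generalizes: with two-way branching at every layer, layer k contributes 2**k segments, so n layers cover 2**(n+1) - 2 segments. The short sketch below (function names are illustrative) computes that total and, since execution follows a single path that uses one segment per layer, how many prefetched segments go unused under this simple tree model. This is the pollution that the pruning logic targets.

```python
def segments_prefetched(layers: int) -> int:
    """Layer k of a two-way branch tree adds 2**k segments: 2 + 4 + ... + 2**layers."""
    return sum(2 ** k for k in range(1, layers + 1))


def segments_unused(layers: int) -> int:
    """Execution follows one path, using one segment per layer; the rest are never run."""
    return segments_prefetched(layers) - layers


for n in (1, 2, 3):
    print(n, segments_prefetched(n), segments_unused(n))
# 1 2 1
# 2 6 4
# 3 14 11
```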

As shown in Fig. 7 (b), prefetch logic 7500 with in Fig. 7 (a) to prefetch logic 7000 similar, bag Containing track table 126, tracking device 170 and many pointer addressings device 5001.Go back additionally, prefetch in logic 7500 Contain trace register 7509, selector 7507, bus 7501, bus 7511 and temporary storage location 7503.

As described in the previous embodiment, pointer registers 5005, 5007, 5009 and 5011 store the four leading pointer values corresponding to the current 'root' branch point. Again taking branch point A as the 'root' branch point, as shown in row 6023 of Fig. 6(f), instruction segment 1003 has already been filled into the cache and its track has been established in track table 126.

In addition, if branch target instruction segment 1005 of branch point A is not yet stored in the cache, instruction segment 1005 is prefetched. However, the prefetched instruction segment 1005 is not stored in the cache; instead, it is temporarily stored in temporary storage unit 7503 via bus 7501. Similarly, although instruction segment 1005 is scanned and its series of track points is generated, these track points are not stored in the corresponding track of track table 126 but are temporarily stored in temporary storage unit 7503. In this way, branch points B and C, which may be executed after branch point A, can be found. Thus, as shown in row 6025 of Fig. 6(f), pointer registers 5005 and 5009 store the information of branch points B and C, respectively. Since instruction segment 1007 is already stored in the cache and instruction segment 1011 is already stored in temporary storage unit 7503, whichever of instruction segments 1009 and 1013 is not yet in the cache is prefetched and temporarily stored in temporary storage unit 7503 via bus 7501; in this way, the next-layer branch points D, E, F and G can be found. As shown in row 6027 of Fig. 6(f), pointer registers 5005, 5007, 5009 and 5011 store the information of branch points F, D, G and E, respectively. This completes the two-layer branch target instruction segment prefetching for 'root' branch point A. At this point, the values of pointer registers 5009 and 5011 (corresponding to branch points G and E) are sent to selector 7507 to await selection.

If the branch of 'root' branch point A is not taken, the 'root' branch point moves to branch point B and the leading pointers are updated. Since instruction segments 1003, 1007 and 1015 are already stored in the cache, and instruction segments 1009 and 1019 are already stored in temporary storage unit 7503, after the leading pointers are updated only instruction segments 1017 and 1021 need to be prefetched and stored in temporary storage unit 7503, while instruction segments 1005, 1011 and 1013 previously held in temporary storage unit 7503 can be discarded. At the same time, selector 7507, controlled by the not-taken information 5031 of branch point A, selects the value of pointer register 5011 (branch point E) as its output and stores it in trace register 7509.

Afterwards, if the branch of 'root' branch point B is not taken, the operation proceeds as described above and is not repeated here. If the branch of 'root' branch point B is taken, then under the control of trace register 7509, the instruction segment and the track point information corresponding to the value in trace register 7509 (branch point E) are sent from temporary storage unit 7503 over bus 7511 to the cache and to track table 126, respectively, so that instruction segment 1009 and its subsequent instruction segment 1019 are already stored in the cache before the processor core reaches these instructions.

On the other hand, if the branch of 'root' branch point A is taken, the 'root' branch point moves to branch point C and the leading pointers are updated. At this time, instruction segment 1005 and its subsequent instruction segment 1011 can be stored in the cache, and their corresponding track points stored in track table 126. In this case, only instruction segments 1025 and 1029 need to be prefetched and stored in temporary storage unit 7503, while instruction segment 1009 previously held in temporary storage unit 7503 can be discarded. The subsequent operation is similar to that described above and is not repeated here.

In this embodiment, although all instruction segments that might be executed are prefetched and their track points generated, only the instruction segments that are certain to be used are stored in the cache, and only their track points are stored in the track table, which avoids cache pollution caused by prefetching.
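The staging behaviour of temporary storage unit 7503 can be illustrated with a small Python model. The class, its methods and the string payloads are hypothetical stand-ins for the hardware and for buses 7501 and 7511. The scenario replays the sequence in the text: A is not taken (1005, 1011 and 1013 are discarded, 1017 and 1021 are staged), then B is taken while trace register 7509 holds E, so segments 1009 and 1019 are committed.

```python
from typing import Dict, Iterable, List, Tuple

Segment = List[str]      # hypothetical: instructions of one segment
TrackPoints = List[str]  # hypothetical: scanned track points of that segment


class TemporaryStore:
    """Sketch of temporary storage unit 7503: prefetched segments wait here."""

    def __init__(self) -> None:
        self.pending: Dict[int, Tuple[Segment, TrackPoints]] = {}

    def stage(self, seg_id: int, instrs: Segment, track: TrackPoints) -> None:
        # Prefetched over bus 7501 and scanned, but NOT written to cache or track table
        self.pending[seg_id] = (instrs, track)

    def keep_only(self, live: Iterable[int]) -> None:
        # Discard segments that can no longer be reached from the new root
        keep = set(live)
        self.pending = {s: v for s, v in self.pending.items() if s in keep}

    def commit(self, seg_ids: Iterable[int], cache: Dict[int, Segment],
               track_table: Dict[int, TrackPoints]) -> None:
        # Over bus 7511: segments now certain to execute go to the cache and track table
        for s in seg_ids:
            if s in self.pending:
                instrs, track = self.pending.pop(s)
                cache[s] = instrs
                track_table[s] = track


cache: Dict[int, Segment] = {}
track_table: Dict[int, TrackPoints] = {}
tmp = TemporaryStore()
for s in (1005, 1011, 1009, 1013, 1019):  # staged while A is the root (as listed in the text)
    tmp.stage(s, [f"insn@{s}"], [f"tp@{s}"])
tmp.keep_only({1009, 1019})               # A not taken: 1005, 1011, 1013 discarded
for s in (1017, 1021):                    # new segments for root B
    tmp.stage(s, [f"insn@{s}"], [f"tp@{s}"])
tmp.commit((1009, 1019), cache, track_table)  # B taken: trace register holds E
print(sorted(cache), sorted(tmp.pending))     # [1009, 1019] [1017, 1021]
```

Only committed segments ever reach the cache or the track table, which is the property the embodiment relies on to avoid pollution.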

In addition, since the tracker holds the information of the 'root' branch point, each time the execution of a branch instruction produces branch outcome information, the pointer register values in the multi-pointer addresser can be updated again starting from the 'root' branch point held in the tracker, to determine which instruction segment is certain to be executed; that instruction segment and its track points in the temporary storage unit are then stored into the cache and the track table, respectively. This avoids cache pollution caused by prefetching without using a trace register.

According to the technical solution of the present invention, the track table method can also be used to prefetch, fill and address a lower-level memory (for example, a level-two cache). Taking a level-two cache as an example, it may have its own independent set of filler/generator, active table, track table, tracker and branch decision logic, and receive from the processor core the information on whether the currently executed instruction is a branch instruction and whether the branch is taken. Like the track table prefetch device of the level-one cache, the track table prefetch device of the level-two cache also builds tracks and prefetches instruction segments according to branch points; the difference is that the level-two cache supplies instruction segments to the level-one cache only when the level-one cache misses. Fig. 8 shows an embodiment of a two-level track table cache structure 8000. As shown in Fig. 8, cache structure 8000 includes a level-one track table system 8101 and a level-two track table system 8111. Level-one track table system 8101 has the same function as cache structure 3000 of the Fig. 3 embodiment; level-two track table system 8111 has the same function as level-one track table system 8101, except that it does not include a processor core and has level-two cache 8133 in place of level-one cache 8135.

The role of level-one track table system 8101 is to prefetch the instruction segments that may be executed by processor core 125 into level-one cache 8135, ensuring that when processor core 125 fetches an instruction, that instruction is already stored in level-one cache 8135, thus avoiding level-one cache misses. The role of level-two track table system 8111 is to prefetch into level-two cache 8133 the instruction segments that are not in level-one cache 8135 but may be executed by processor core 125 (i.e., the instruction segments that level-one cache 8135 may prefetch from level-two cache 8133), ensuring that when level-one cache 8135 prefetches an instruction segment, that segment is already stored in level-two cache 8133, thus avoiding level-two cache misses. The external memory connected to level-two track table system 8111 may be third-level memory 8233, an external storage medium, a network storage medium or another form of storage device; its capacity is larger than that of the level-one and level-two caches.

In cache structure 8000, for level-one track table system 8101, read pointer 55 of the level-one track table points to the first branch instruction after the instruction currently being executed by the processor core, and the track point content 8104 read from level-one track table 126 by addressing with level-one read pointer 55 is the position, in the level-one track table, of the branch target track point of the first branch point after the currently executed instruction. The subsequent operation of level-one track table system 8101 is as described in the previous embodiments and is not repeated here.

Similarly, for level-two track table system 8111, read pointer 8123 addresses level-two track table 8113. Selector 8121 and increment logic 8131 function similarly to selector 49 and increment logic 48 in level-one track table system 8101: by incrementing the second address (BNY), read pointer 8123 moves to the first branch point after the current instruction in level-two track table 8113, and the track point information is read out. If this track point information contains a BN value of level-two track table 8113, the branch target instruction segment corresponding to this branch point is already stored in the level-two cache and need not be prefetched.

If the track point information does not contain a BN value, a level-two active table match and update, together with instruction segment prefetch, scan and fill operations, are performed so that the corresponding instruction segment is stored in the level-two cache. Similarly, the branch-taken information 8109 sent by processor core 125 controls selector 8129, and the information 8137 indicating whether the current instruction is a branch instruction controls the updating of register 8119, so that the tracker of level-two track table system 8111 can correctly update the value of read pointer 8123. In other words, once the value of read pointer 55 of level-one track table system 8101 changes and points to a new branch point in level-one track table 126, read pointer 8123 of level-two track table system 8111 also changes accordingly under the control of the branch-taken signal from the processor core, points to the corresponding new branch point in level-two track table 8113, and the corresponding match and fill operations are performed, thereby realizing instruction prefetch for the level-two track table system. When the level-one cache needs to be filled, the instruction segment pointed to by the first address (BNX) of the level-two read pointer is exactly the segment the level-one cache requires.

To better eliminate or hide the level-one cache misses caused by level-two cache misses, the level-two track table system may prefetch more layers of branch target instruction segments, so that the instruction segments the level-one cache may need are stored in the level-two cache earlier. That is, the number of layers of branch target instruction segments to prefetch is determined by the response time required to obtain instructions from the lower-level memory.
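The text does not give a formula for the prefetch depth. One simple way to make the idea concrete, offered purely as an assumption, is to suppose that each additional layer of lookahead buys roughly a fixed number of cycles of lead time, and to choose enough layers to cover the lower-level response time. The sketch also reports the cost, which grows as 2**(n+1) - 2 segments for n layers of two-way branches. The names prefetch_layers and cycles_per_layer are hypothetical.

```python
import math


def prefetch_layers(response_cycles: int, cycles_per_layer: int) -> int:
    """Hypothetical rule: enough layers so that the lookahead covers the response time."""
    return max(1, math.ceil(response_cycles / cycles_per_layer))


def segments_for(layers: int) -> int:
    return 2 ** (layers + 1) - 2  # two-way branching at every layer


for latency in (6, 20):
    n = prefetch_layers(latency, cycles_per_layer=8)
    print(latency, n, segments_for(n))
# 6 1 2
# 20 3 14
```

The exponential cost is one reason deeper lookahead fits better in the larger level-two cache than in the level-one cache.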

For example, if the level-one track table system prefetches only one layer of branch target instructions, the level-two track table system may prefetch two or more layers, and so on. In other words, once the value of the read pointer of the level-one track table system changes and points to a new branch point in the level-one track table, the pointer registers of the level-two track table system change accordingly, following the method described in the Fig. 5 embodiment: they are updated to the corresponding new branch points in the level-two track table, and the corresponding match and fill operations are performed, thereby realizing multi-layer branch target instruction prefetch in the level-two track table system.

Typically, the level-two cache is larger than the level-one cache. Therefore, when the track-table-based prefetching technique of the present invention is used, the level-two cache track table may contain more tracks than the level-one cache track table, and each level-two track may contain more track points than each level-one track. Even with the compressed track table technique of embodiment 4000, a level-two track still contains more branch track points than a level-one track. To reduce the number of branch track points in the level-two track table, the level-two track table can be compressed further, as shown in Fig. 9.

Fig. 9 shows an embodiment 9000 in which the level-two cache track table is further compressed. As shown in Fig. 9, level-two cache instruction segment 8001 is twice the size of a level-one cache instruction segment, i.e., it contains twice as many instructions, and the dashed line marks the boundary dividing instruction segment 8001 into two level-one instruction segments.

In this embodiment, branch instructions in instruction segment 8001 are marked 'X' or 'O'. 'O' denotes a branch instruction whose branch target lies within instruction segment 8001, i.e., an intra-segment (inside) branch instruction; 'X' denotes a branch instruction whose branch target lies outside instruction segment 8001, i.e., an extra-segment (outside) branch instruction. Whether a branch instruction is extra-segment or intra-segment can be determined by a simple calculation or comparison on the position of the branch instruction and its branch offset, that is, by checking whether the branch target track point lies within the start and end of the instruction segment (track) containing the source branch instruction (branch point). For example, the offset of the branch instruction within its instruction segment can be added to its branch offset; if the result is greater than or equal to 0 and less than the number of instructions in the segment, the branch instruction is intra-segment; otherwise it is extra-segment. The branch point corresponding to an intra-segment branch instruction is an intra-segment branch point, and the branch point corresponding to an extra-segment branch instruction is an extra-segment branch point.
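The intra/extra-segment test described above amounts to a single range check. A minimal sketch, assuming positions and branch offsets are counted in instructions and the offset is relative to the branch instruction itself; the function name is illustrative.

```python
def branch_kind(pos_in_segment: int, branch_offset: int, segment_len: int) -> str:
    """Return 'O' for an intra-segment branch and 'X' for an extra-segment branch."""
    target = pos_in_segment + branch_offset
    return "O" if 0 <= target < segment_len else "X"


print(branch_kind(3, 5, 16))   # O: target index 8 is inside the segment
print(branch_kind(14, 4, 16))  # X: target index 18 is past the end
print(branch_kind(2, -3, 16))  # X: target index -1 is before the start
```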

According to the technical solution of the present invention, a level-two track (a track in the level-two track table) may store only the information of 'X' branch instructions and not that of 'O' branch instructions. In other words, a level-two track stores only extra-segment branch points and omits intra-segment branch points, further reducing the length of each level-two track.

As shown in Fig. 9, in instruction segment 8001, the first branch point 8011 corresponds to an intra-segment branch instruction, marked 'O'; the second branch point 8013 corresponds to an extra-segment branch instruction, marked 'X'; the third branch point 8015 corresponds to an intra-segment branch instruction, marked 'O'; and so on for the other branch points.

Assume the first half of level-two cache instruction segment 8001, before the dashed line (i.e., level-one cache instruction segment 8007), is stored in the level-one cache, with the corresponding level-one track shown as track 8005; the number and relative positions of its 'X' and 'O' marks match those in instruction segment 8001. The level-one track table uses the compression technique described in Fig. 4 to compress it into level-one track 8009: each valid track point in level-one track 8009 corresponds to a branch instruction, i.e., only branch points (both 'X' and 'O' types) are stored in level-one track 8009. Since the level-one track contains all branch points in level-one cache instruction segment 8007, track-table-based instruction prefetching can be realized using the method described in the previous embodiments.
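The two kinds of compressed track can be derived from the same list of branches. In this hypothetical sketch a level-two segment is split into two level-one segments at the midpoint (the dashed line in Fig. 9); the level-one tracks keep every branch point with its 'X'/'O' flag, while the level-two track keeps only the 'X' points. The input format (branch position mapped to branch offset) and the function names are assumptions.

```python
from typing import Dict, List, Tuple

Track = List[Tuple[int, str]]  # (position within the L1 segment, 'X' or 'O')


def branch_kind(pos: int, offset: int, segment_len: int) -> str:
    target = pos + offset
    return "O" if 0 <= target < segment_len else "X"


def build_tracks(branches: Dict[int, int],
                 l2_segment_len: int) -> Tuple[Tuple[Track, Track], List[int]]:
    half = l2_segment_len // 2  # each L1 segment is half of the L2 segment
    l1_tracks: Tuple[Track, Track] = ([], [])
    l2_track: List[int] = []
    for pos in sorted(branches):
        kind = branch_kind(pos, branches[pos], l2_segment_len)  # judged against the L2 segment
        l1_tracks[pos // half].append((pos % half, kind))      # every branch point, flagged
        if kind == "X":
            l2_track.append(pos)                               # only extra-segment points
    return l1_tracks, l2_track


l1, l2 = build_tracks({1: 4, 4: 20, 6: -2, 10: -12, 13: 1}, l2_segment_len=16)
print(l1)  # ([(1, 'O'), (4, 'X'), (6, 'O')], [(2, 'X'), (5, 'O')])
print(l2)  # [4, 10]
```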

Unlike level-one track 8009, level-two track 8003 stores only 'X' branch points. Because the level-two track table lacks intra-segment branch points, and the branch target of an intra-segment branch point may lie beyond an extra-segment branch point, the help of the level-one track table is needed to move the read pointer for intra-segment branches. Specifically, the level-two track table needs the information for moving the pointer (expressed as a PC or a level-one BN), together with a flag indicating whether the branch instruction currently executed by the processor core is intra-segment or extra-segment, in order to move its read pointer correctly.

Fig. 10(a) shows an embodiment of a cache system 10000 based on a two-level compressed track table. As shown in Fig. 10(a), the lower part of the device (level-one track table system 9001) implements the same function as embodiment 3000 of Fig. 3, and level-one track table 126 uses the compressed track table structure described in embodiment 4000. Besides instruction type 57, first address (BNX) 58 and second address (BNY) 59, each entry (branch point) also contains an additional flag bit 9033, where '1' indicates that the branch point is an 'X'-type branch point (extra-segment branch instruction) and '0' indicates an 'O'-type branch point (intra-segment branch instruction).

While read pointer 55 addresses level-one track table 126 to read track point information, the value of flag bit 9033 is sent over bus 9003 to the upper part of device 10000 (level-two track table system 9011) to control selector 9017. The information sent by the processor core indicating whether the current instruction is a branch instruction and whether the branch is taken is used not only to control the corresponding registers and selectors in level-one track table system 9001, but is also sent over bus 9007 and bus 9009, respectively, to level-two track table system 9011 to control register 9019 and selector 9021. Upon receiving this information, level-two track table system 9011 can implement track-table-based level-two cache instruction prefetching.

Assume the instruction currently being executed by the processor core is not a branch instruction. According to the technical solution described above, read pointer 55 of the level-one track table points to the first branch instruction after the instruction currently being executed; this branch instruction may correspond to either an 'X'-type or an 'O'-type branch point. Since a level-two track contains only 'X'-type branch instruction information, read pointer 9023 of the level-two track table points to the first 'X'-type branch point after the instruction currently being executed. Thus, the track point content 9004 (including flag information 9003) read from level-one track table 126 by addressing with level-one read pointer 55 is the branch target track point position of the first branch point after the currently executed instruction, and the track point content 9025 read from level-two track table 9013 by addressing with level-two read pointer 9023 is the branch target track point position of that 'X'-type branch point.

If the first branch point after the currently executed instruction is an 'O'-type branch point, i.e., this branch point and the instruction at its branch target are in the same level-two cache memory block, the flag value sent to selector 9017 is '0', and selector 9017 selects the level-two track offset 9027 output by mapping module 9015. Mapping module 9015 receives the branch target offset 9005 from the processor core and converts it into level-two track offset 9027.

For example, when the high-order bits of the instruction address (the branch target address) are used as the block address of the level-two cache memory block, the low-order bits of the instruction address (branch target offset 9005) can address each instruction in that memory block. Using the technique described in the Fig. 4 embodiment, branch target offset 9005 can be converted into level-two track offset 9027, which identifies, in the corresponding track of the level-two track table, the first 'X'-type branch point after the branch target track point.
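The conversion performed by mapping module 9015 can be modeled as a search for the first 'X' branch point at or after the target offset. The text points to the Fig. 4 technique for this; the sketch below substitutes a binary search with Python's bisect.bisect_left over the sorted in-block positions of the 'X' points. It computes the same index but is not meant to describe the hardware, and the function name is hypothetical.

```python
from bisect import bisect_left
from typing import List


def map_offset_to_l2_bny(target_offset: int, x_positions: List[int]) -> int:
    """Index (BNY) of the first 'X' branch point at or after the branch target.

    x_positions: sorted in-block positions of the 'X' points kept in one
    level-two track. A result equal to len(x_positions) means no 'X' point
    remains in this block after the target.
    """
    return bisect_left(x_positions, target_offset)


x_points = [4, 10, 21]
print([map_offset_to_l2_bny(t, x_points) for t in (0, 4, 5, 22)])  # [0, 0, 1, 3]
```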

At this point, level-one track table read pointer 55 is updated to point to the first branch instruction after the branch target instruction, and level-two track table read pointer 9023 points to the first 'X'-type branch point after the branch target instruction. Because the branch target instruction and the branch instruction that jumped to it are in the same level-two memory block, level-two track table system 9011 does not need to perform any instruction prefetch.

If the first branch point after the currently executed instruction is an 'X'-type branch point, i.e., this branch point and the instruction at its branch target are in different level-two cache memory blocks, the flag value sent to selector 9017 is '1', and selector 9017 selects the level-two track table BN value 9029 output by selector 9021. Selector 9021 and increment logic 9031 function similarly to selector 49 and increment logic 48 in level-one track table system 9001: either the second address (BNY) is incremented by one so that read pointer 9023 moves to the next branch point in the current level-two track, or the track point content read from the level-two track table is selected so that read pointer 9023 points to the target track point of the 'X'-type branch point; the corresponding level-two active table (not shown in Fig. 10(a)) match, update and, where needed, instruction segment prefetch are then performed so that the corresponding instruction segment is stored in the level-two cache. Similarly, information 9007 from the processor core indicating whether the current instruction is a branch instruction controls the updating of register 9019, so that level-two track table system 9011 can implement the track-table-based instruction prefetching technique of the present invention.
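Combining the two cases, the movement of level-two read pointer 9023 when the first branch point resolves can be sketched as below. The function and its arguments are hypothetical. Handling a not-taken 'O' branch by leaving the pointer unchanged is an assumption not stated in the text, and the target BNs stored in the level-two track table are assumed to already point at the first 'X' point after each target.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

BN = Tuple[int, int]  # (BNX, BNY) in the level-two track table


def next_l2_read_pointer(ptr: BN, flag: int, taken: bool, target_offset: int,
                         l2_targets: Dict[int, List[BN]],
                         x_positions: Dict[int, List[int]]) -> BN:
    """flag is bit 9033 from the level-one track table (1 = 'X', 0 = 'O')."""
    bnx, bny = ptr
    if flag == 0:
        if not taken:
            # Assumption: the pointer already rests on the next pending 'X' point
            return ptr
        # Selector 9017 picks mapped offset 9027; BNX is unchanged (same L2 block)
        return (bnx, bisect_left(x_positions[bnx], target_offset))
    if taken:
        # Selector 9021 picks the target BN read from level-two track table 9013
        return l2_targets[bnx][bny]
    # Increment logic 9031: BNY + 1 moves to the next 'X' point in the current track
    return (bnx, bny + 1)


xs = {0: [4, 10, 21]}
targets = {0: [(3, 0), (5, 1), (2, 2)]}
print(next_l2_read_pointer((0, 1), 1, True, 0, targets, xs))   # (5, 1)
print(next_l2_read_pointer((0, 1), 0, True, 12, targets, xs))  # (0, 2)
```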

According to the technical solution of the present invention, when an update of the level-one track table finds that an instruction needs to be prefetched, a request for the instruction segment can be sent to the level-two track table system. If the instruction segment is the branch target instruction segment of an 'X'-type branch point, then according to the method described above, the level-two track table system has already prefetched this target instruction segment into the level-two cache and assigned it a corresponding level-two track BN. Therefore, the first address (BNX) of the target track point of the track point pointed to by read pointer 9023 (i.e., this 'X'-type branch point) can be read from level-two track table 9013, and this first address locates the corresponding memory block in the level-two cache. Since a level-two cache memory block is larger than a level-one cache memory block, while the level-two cache memory block is being addressed, the offset 9005 sent by the processor core is mapped by mapping module 9015 to select, from this level-two cache memory block, the instruction segment required by the level-one track table system, and this instruction segment is sent to the level-one track table system for filling.

For example, if a memory block in the level-two cache is twice the size of a memory block in the level-one cache, the instruction address offset used to address a level-two cache memory block has one more bit than the offset used to address a level-one cache memory block, and this extra bit determines whether the instruction segment required by the level-one track table system lies in the upper half or the lower half of the corresponding level-two cache memory block. For example, when this bit is '0', the required instruction segment lies in the lower half of the level-two cache memory block; when it is '1', the required instruction segment lies in the upper half.
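With power-of-two block sizes, the extra offset bit is simply the most significant bit of the level-two offset. A minimal sketch, assuming the level-two block holds exactly two level-one blocks of 2**l1_block_bits instructions and that the "lower half" means the lower addresses; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_l2_offset(l2_offset: int, l1_block_bits: int) -> Tuple[int, int]:
    """Return (half, l1_offset): half 0 = lower half, 1 = upper half of the L2 block."""
    half = (l2_offset >> l1_block_bits) & 1           # the one extra bit
    l1_offset = l2_offset & ((1 << l1_block_bits) - 1)
    return half, l1_offset


def select_l1_segment(l2_block: Sequence[str], l2_offset: int,
                      l1_block_bits: int) -> List[str]:
    half, _ = split_l2_offset(l2_offset, l1_block_bits)
    size = 1 << l1_block_bits
    return list(l2_block[half * size:(half + 1) * size])


block = [f"i{k}" for k in range(16)]       # L2 block of 16 instructions, L1 block of 8
print(split_l2_offset(11, 3))              # (1, 3)
print(select_l1_segment(block, 11, 3)[0])  # i8
```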

If, when the L1 track table system requests an instruction segment from the L2 track table system, the required segment is the branch target instruction segment of an 'O'-type branch point, then, by the method described above, this 'O'-type branch point and its target instruction segment lie in the same L2 cache memory block. The first address (BNX) of read pointer 9023 is therefore unchanged, and the corresponding memory block is found in the L2 cache. In the same way as above, the offset 9005 sent by processor core 125 is mapped by mapping module 9015 to select, from that L2 cache memory block, the instruction segment required by the L1 track table system, and this segment is sent to the L1 track table system for the fill operation.

The L2 track table system 9011 therefore further classifies branch points as 'X' type or 'O' type and stores only 'X'-type branch points in the L2 tracks, which effectively reduces the capacity of the L2 track table. In addition, because the target instructions of 'O'-type branch points are already stored in the L2 cache, whereas those of 'X'-type branch points may not be, this improvement also lets L2 track table read pointer 9023 skip over 'O'-type branch points and move quickly to the next 'X'-type branch point. Instructions that may be needed are thus prefetched earlier, better eliminating or hiding the latency of accessing the outer-level memory.
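A minimal sketch of the 'X'/'O' classification and of an L2 track that stores only 'X'-type branch points follows. It assumes word addresses, that an L2 block is identified by its base address and size, and that a branch is 'O' type exactly when its target lies in the same L2 block; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchPoint:
    addr: int    # instruction address of the branch (hypothetical word addressing)
    target: int  # instruction address of its branch target


def branch_type(bp: BranchPoint, block_base: int, block_size: int) -> str:
    """'O' if the target lies in the same L2 block as the branch, otherwise 'X'."""
    return "O" if block_base <= bp.target < block_base + block_size else "X"


def compress_track(branches: list[BranchPoint], block_base: int,
                   block_size: int) -> list[BranchPoint]:
    """Build a compressed L2 track that keeps only 'X'-type branch points, in address order."""
    return [bp for bp in sorted(branches, key=lambda b: b.addr)
            if branch_type(bp, block_base, block_size) == "X"]


def next_x_point(track: list[BranchPoint], addr: int) -> Optional[BranchPoint]:
    """First stored ('X'-type) branch point at or after addr; 'O'-type points are never visited."""
    return next((bp for bp in track if bp.addr >= addr), None)


if __name__ == "__main__":
    block = [BranchPoint(67, 70), BranchPoint(74, 200),
             BranchPoint(84, 100), BranchPoint(104, 10)]
    track = compress_track(block, block_base=64, block_size=64)
    print([bp.addr for bp in track])     # [74, 104]
    print(next_x_point(track, 64).addr)  # 74: the 'O' branch at 67 is skipped
```

Because the compressed track holds no 'O'-type entries, moving the read pointer to the next stored entry skips the in-segment branches automatically, which is the effect described for read pointer 9023.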

It should be noted that although the above example covers only a two-level track-table-based caching system, those skilled in the art can readily apply the same method, based on the technical solution and embodiments disclosed above, to implement track-table-based caching systems with more levels; these also fall within the scope of protection of the present invention.

Furthermore, the compression method used in the L2 track table can also be applied to the L1 track table system. In this case, the branch points stored in the L1 tracks are classified as 'X' type or 'O' type, and the L1 track table is addressed by two trackers: the read pointer of one tracker points only to 'X'-type branch points, while the read pointer of the other points to all branch points (both 'X' type and 'O' type). The two read pointers are updated according to the execution outcome of the branch points, as shown in Figure 10(b).

In Figure 10(b), caching system 10500 contains the various similar structures described above. The difference is that caching system 10500 also includes a compressed L1 track table 10505, a first tracker 10501 and a second tracker 10503.

The second tracker 10503 is similar to the tracking logic in the L1 track table system: it updates read pointer 55 according to the branch outcome 9009 sent by processor core 125, and sends the type of the current branch point (flag bit 9033) to the first tracker 10501 over bus 9003.

The first tracker 10501 is similar to the tracking logic in the L2 track table system. It addresses track table 10505 with read pointer 9023, and the track point content read from an 'X'-type branch point is sent over bus 9025 to selector 9021. At the same time, incrementer 9031 increments read pointer 9023 by one (which, in the compressed track, gives the first 'X'-type branch point after this 'X'-type branch point) and sends the incremented value to selector 9021. Selector 9021 chooses between the two according to the branch outcome 9009 sent by processor core 125, as described in the previous embodiments.

According to the technical solution described in the previous embodiments, if flag bit 9033 sent by tracker 10503 indicates that the current branch is an out-of-segment branch and that it is taken, the read pointer of the first tracker 10501 is updated to point to the target track point of this 'X'-type branch point (or to the first 'X'-type branch point after that target track point). Specifically, as described in the previous embodiments, if the target track point of this 'X'-type branch point is not itself an 'X'-type branch point, selector 9017 selects the output of mapping module 9015, which maps the offset 9005 sent by the processor core to the position of the first 'X'-type branch point after that target track point; this value updates read pointer 9023 so that it points to an 'X'-type branch point.

In this way, tracker 10501 always points only to 'X'-type branch points, while the second tracker 10503 can point to both kinds of branch point. While the tracker read pointers are still guaranteed to be updated correctly, the first tracker can move ahead quickly and prefetch instruction segments, better eliminating or hiding the latency of fetching instructions from lower-level memory.
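The update rule for the first tracker can be sketched as follows. This is an illustrative model only: it assumes the read pointer is a hypothetical (track number, index into the compressed track) pair and that each stored 'X' point records its target's track number and offset. It covers only the moment the 'X'-type branch under the pointer resolves; 'O'-type branches, which the second tracker follows, and the end-of-track case (an index equal to the track length) are not modeled.

```python
from bisect import bisect_left
from typing import NamedTuple


class XPoint(NamedTuple):
    offset: int         # position of this 'X'-type branch point within its track
    target_block: int   # BNX of the branch target
    target_offset: int  # offset of the branch target within the target track


def first_x_at_or_after(tracks: dict[int, list[XPoint]], block: int, offset: int) -> int:
    """Index of the first 'X' point in `block` at or after `offset` (len(track) if none)."""
    return bisect_left([p.offset for p in tracks.get(block, [])], offset)


def advance_on_x_branch(tracks: dict[int, list[XPoint]], ptr: tuple[int, int],
                        taken: bool) -> tuple[int, int]:
    """Update the first tracker's read pointer when the 'X' branch it points to resolves."""
    block, idx = ptr
    if not taken:
        # Like the incrementer: the next entry in a compressed track is the next 'X' point.
        return block, idx + 1
    point = tracks[block][idx]
    # Taken out-of-segment branch: land on the target, or on the first 'X' point after it.
    return point.target_block, first_x_at_or_after(tracks, point.target_block,
                                                   point.target_offset)


if __name__ == "__main__":
    tracks = {5: [XPoint(4, 9, 6), XPoint(12, 7, 0)],
              9: [XPoint(2, 5, 0), XPoint(10, 5, 12)]}
    print(advance_on_x_branch(tracks, (5, 0), taken=True))   # (9, 1)
    print(advance_on_x_branch(tracks, (5, 0), taken=False))  # (5, 1)
```

In the first call the target offset 6 is not an 'X' point, so the pointer lands on the next stored 'X' point (offset 10, index 1), mirroring the role given to selector 9017 and mapping module 9015.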

Figure 11 shows an embodiment 11000 of the track-table-based caching technique in a multi-core processor environment. This embodiment uses a dual-core architecture as an example; for multi-core environments with more processor cores, track-table-based instruction prefetching can be implemented in a similar way.

In multi-core processor environment 11000, L1 track table systems 11001 and 11003 represent, respectively, the first processor core with its track table system and the second processor core with its track table system. Multi-core processor environment 11000 also includes an L2 track table system 11005. For ease of description, it is assumed that L2 track table system 11005 either does not use the compression technique of L2 track table system 9011 in Figure 10(a), or uses the same compression technique as the L1 track table systems.

In L2 track table system 11005, pointer register 11007, selector 11009 and incrementer 11011 work together with L1 track table system 11001. Their functions are the same as those of register 9019, selector 9021 and incrementer 9031 in Figure 10(a), and together they form the L2 tracker corresponding to L1 track table system 11001, handling the interaction between L1 track table system 11001 and L2 track table system 11005. Likewise, pointer register 11013, selector 11015 and incrementer 11017 work together with L1 track table system 11003; with the same functions as register 9019, selector 9021 and incrementer 9031 in Figure 10(a), they form the L2 tracker corresponding to L1 track table system 11003 and handle the interaction between L1 track table system 11003 and L2 track table system 11005.

Compared with the embodiment of Figure 8, processor environment 11000 uses a single shared L2 track table system 11005 to supply the prefetched instructions required by the two L1 track table systems 11001 and 11003. In this embodiment, if the read pointer in either L1 track table system 11001 or 11003 changes, the corresponding pointer register of L2 track table system 11005 changes accordingly: it is updated to the corresponding new branch point in the L2 track table, and the corresponding matching and fill operations are performed. If the read pointers in both L1 track table systems 11001 and 11003 change, arbitration logic 11019 decides which pointer register is updated first and which corresponding instruction prefetch is performed first.

Specifically, take as an example the case in which the processor core of only one L1 track table system is currently executing a branch instruction, and assume this is L1 track table system 11001 (the procedure for L1 track table system 11003 is similar). The read pointer of L1 track table system 11001 moves to the next branch point according to whether the branch is taken, that is, to the first branch point after the instruction in sequential order, or to the first branch point after the branch target instruction.

At this point, the 'BRANCH0' signal 11023 sent from L1 track table system 11001 to arbitration logic 11019 is valid, while the 'BRANCH1' signal 11025 sent from L1 track table system 11003 to arbitration logic 11019 is invalid, indicating that only the read pointer in L1 track table system 11001 has been updated. Arbitration logic 11019 therefore makes selector 11021 select the output of pointer register 11007 as the BN value of read pointer 11027 of L2 track table system 11005. In a manner similar to the Figure 8 embodiment, incrementer 11011 computes the BN value of the first branch point in sequential order after the branch point indicated by read pointer 11027, and the track point content containing the BN value of that branch point's branch target is read from L2 track table 11031. Selector 11009 chooses between the two according to the branch-taken information ('TAKEN0') 11033 sent from L1 track table system 11001, and the result is written back to pointer register 11007, completing its update. During this process, the branch target instruction address indicated by the updated pointer register 11007 is sent to L2 active table 11035 for address matching to obtain or generate a new BN value; the prefetch of the branch target instruction segment proceeds as described in the previous embodiments and is not repeated here.

If the processor cores of both L1 track table systems are executing branch instructions at the same time, then both the 'BRANCH0' signal 11023 sent from L1 track table system 11001 to arbitration logic 11019 and the 'BRANCH1' signal 11025 sent from L1 track table system 11003 to arbitration logic 11019 are valid, and arbitration logic 11019 decides which L1 track table system is served first.

For example, arbitration logic 11019 may use a fixed priority (for example, L1 track table system 11001 always has priority), so that the pointer register update and related operations for L1 track table system 11001 are always handled first. Alternatively, the two L1 track table systems may be given variable priority values; when the read pointers of both L1 track table systems are updated at the same time, the pointer register update and related operations for the system with the higher priority at that moment are handled first.
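The two arbitration policies mentioned above can be sketched in Python. The fixed policy matches the example in the text (system 0, standing for 11001, always wins). For the variable policy the text does not say how the priority values change, so the round-robin rule below is only one assumed choice. All names are hypothetical.

```python
def arbitrate(branch0: bool, branch1: bool, priority: list[int]) -> list[int]:
    """Return the order in which the L1 systems' pointer-register updates are served.

    branch0/branch1 model the BRANCH0/BRANCH1 request signals; priority[i] is the
    current priority value of L1 system i (higher is served first; sorted() is
    stable, so ties go to system 0).
    """
    requests = [i for i, req in enumerate((branch0, branch1)) if req]
    return sorted(requests, key=lambda i: -priority[i])


# Fixed priority: system 0 (standing for 11001) always wins a simultaneous request.
FIXED_PRIORITY = [1, 0]


class RoundRobinArbiter:
    """Assumed variable-priority policy: the system just served drops to the lowest priority."""

    def __init__(self) -> None:
        self.priority = [1, 0]

    def grant(self, branch0: bool, branch1: bool) -> list[int]:
        order = arbitrate(branch0, branch1, self.priority)
        if order:
            winner = order[0]
            self.priority[winner], self.priority[1 - winner] = 0, 1
        return order


if __name__ == "__main__":
    print(arbitrate(True, True, FIXED_PRIORITY))  # [0, 1]
    rr = RoundRobinArbiter()
    print(rr.grant(True, True), rr.grant(True, True))  # [0, 1] [1, 0]
```

Round-robin keeps one core from starving the other when both branch frequently; the fixed scheme is simpler but always favors the same core.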

Figure 12 shows another embodiment 12000 in which the contents of the L2 track table are compressed. This embodiment is similar to the embodiment of Figure 1: the left-hand path from each branch point represents the subsequent instructions when the branch is not taken, and the right-hand path represents the subsequent instructions (the branch target instructions) when the branch is taken. The difference is that the type of each branch point is marked: points marked 'X' are out-of-segment branch points, and points marked 'O' are in-segment branch points.

Take branch point A 12001, the first branch point after the instruction currently being executed by the processor, as an example. If only one level of branch target instruction segments is prefetched, the read pointer in the L2 track table system points to branch point A 12001 in the L2 track table, and it is ensured that both the instruction segment following branch point A 12001 and the target instruction segment of branch point A 12001 are stored in the L2 cache. Afterwards, if the branch at branch point A 12001 is not taken, the read pointer moves to the next out-of-segment branch point, namely branch point D; if it is taken, the read pointer moves to the next out-of-segment branch point C in the branch target instructions of branch point A 12001.

If two levels of branch target instruction segments are prefetched and the branch points in the L2 track table are not classified as in-segment or out-of-segment, the four pointer registers in the L2 track table system hold the positions of branch points D, E, F and G in the L2 track table, respectively. If, however, the branch points are classified and only out-of-segment branch points are kept in the L2 track table system, the four pointer registers hold the positions of branch points D, J, F and N, which span more levels. Compressing the track table by distinguishing branch types therefore allows deeper levels of branch target instruction segments to be prefetched earlier, better eliminating or hiding the latency of fetching instructions from lower-level external memory.
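The gain in lookahead depth can be illustrated with a small model. The tree below is hypothetical and is not the tree of Figure 12, whose full shape is not given in the text. The sketch also assumes that, once 'O' points are removed from the track, a pointer passing an 'O' point simply continues along its sequential (not-taken) path, since that point's target already lies in the same L2 block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    kind: str                       # 'X' = out-of-segment, 'O' = in-segment
    fall: Optional["Node"] = None   # next branch point if the branch is not taken
    taken: Optional["Node"] = None  # next branch point if the branch is taken


def tracked(node: Optional[Node], compress: bool) -> Optional[Node]:
    """With compression, 'O' points are not stored: follow the sequential path (assumption)."""
    while compress and node is not None and node.kind == "O":
        node = node.fall
    return node


def frontier(root: Node, depth: int, compress: bool) -> list[str]:
    """Branch points held by the pointer registers after `depth` levels of lookahead."""
    level = [n for n in (tracked(root, compress),) if n is not None]
    for _ in range(depth):
        nxt = []
        for node in level:
            for child in (node.fall, node.taken):
                t = tracked(child, compress)
                if t is not None:
                    nxt.append(t)
        level = nxt
    return [n.name for n in level]


def example_tree() -> Node:
    """Hypothetical tree rooted at A; not the tree shown in Figure 12."""
    leaf = {n: Node(n, "X") for n in "HIJKLM"}
    d = Node("D", "X", leaf["H"], leaf["I"])
    e = Node("E", "X")
    f = Node("F", "X", leaf["J"], leaf["K"])
    g = Node("G", "O", leaf["L"], leaf["M"])
    b = Node("B", "O", d, e)
    c = Node("C", "X", f, g)
    return Node("A", "X", b, c)


if __name__ == "__main__":
    a = example_tree()
    print(frontier(a, 2, compress=False))  # ['D', 'E', 'F', 'G']
    print(frontier(a, 2, compress=True))   # ['H', 'I', 'F', 'L']
```

With the same four pointer registers, the compressed frontier reaches H, I and L, which lie three levels below A in the uncompressed tree.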

Claims (14)

1. A method for prefetching multi-level branch target instructions for execution by a processor core, the processor core being connected to a first memory containing executable instructions and to a second memory that is faster than the first memory; instructions being filled from the first memory into the second memory are examined to extract instruction information including at least branch information;
a plurality of tracks is established according to the extracted instruction information; characterized in that:
starting from a root branch point, namely the first branch instruction after the instruction currently being executed by the processor core, leading pointer addresses corresponding to at least two levels of branches are generated according to the instruction information in the tracks.
2. The method according to claim 1, characterized in that:
for each branch level, the address of each of the one or more branch points is incremented to obtain the leading pointer address for the branch not being taken;
the tracks are addressed with each said branch point address to obtain the leading pointer address for the branch being taken.
3. The method according to claim 2, characterized in that:
only the leading pointers of the deepest branch level are stored.
4. The method according to claim 2, wherein the processor core executes a branch instruction and produces a branch decision, characterized in that:
the leading pointer addresses not selected by the branch decision are pruned;
the first-level branch point selected by the branch decision is taken as the new root branch point, and leading pointer addresses corresponding to at least two levels of branches are generated from it according to the instruction information in the tracks.
5. The method according to claim 1, characterized by further comprising:
prefetching, from the first memory, the instruction segments corresponding to the leading pointer addresses and storing them in the second memory, so that the processor core can fetch the instructions of those segments from the second memory.
6. The method according to claim 1, characterized by further comprising:
storing the leading pointer addresses, together with the corresponding instruction segments prefetched from the first memory using those addresses, in a temporary storage unit.
7. The method according to claim 6, characterized by further comprising:
selecting, according to the branch decision obtained by executing the branch instruction at the branch point, an instruction segment from the temporary storage unit and storing it in the second memory, so that the processor core can fetch the instructions of that segment from the second memory; and selecting the instruction address corresponding to that instruction segment and storing it in the track.
8. A system for prefetching multi-level branch target instructions for execution by a processor core, the processor core being connected to a first memory containing executable instructions and to a second memory that is faster than the first memory; instructions being filled from the first memory into the second memory are examined to extract instruction information including at least branch information;
a plurality of tracks is established according to the extracted instruction information; characterized in that:
starting from a root branch point, namely the first branch instruction after the instruction currently being executed by the processor core, leading pointer addresses corresponding to at least two levels of branches are generated according to the instruction information in the tracks.
9. The system according to claim 8, characterized in that:
for each branch level, the address of each of the one or more branch points is incremented to obtain the leading pointer address for the branch not being taken;
the tracks are addressed with each said branch point address to obtain the leading pointer address for the branch being taken.
10. The system according to claim 9, characterized in that:
only the leading pointers of the deepest branch level are stored.
11. The system according to claim 9, wherein the processor core executes a branch instruction and produces a branch decision, characterized in that:
the leading pointer addresses not selected by the branch decision are pruned;
the first-level branch point selected by the branch decision is taken as the new root branch point, and leading pointer addresses corresponding to at least two levels of branches are generated from it according to the instruction information in the tracks.
12. The system according to claim 8, characterized by further comprising:
prefetching, from the first memory, the instruction segments corresponding to the leading pointer addresses and storing them in the second memory, so that the processor core can fetch the instructions of those segments from the second memory.
13. The system according to claim 8, characterized by further comprising:
storing the leading pointer addresses, together with the corresponding instruction segments prefetched from the first memory using those addresses, in a temporary storage unit.
14. The system according to claim 13, characterized by further comprising:
selecting, according to the branch decision obtained by executing the branch instruction at the branch point, an instruction segment from the temporary storage unit and storing it in the second memory, so that the processor core can fetch the instructions of that segment from the second memory; and selecting the instruction address corresponding to that instruction segment and storing it in the track.
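Claims 1 to 4 describe how leading pointers are generated level by level and pruned after each branch decision. The sketch below is one possible reading of that procedure under simplifying assumptions: branch points carry hypothetical integer numbers such that the next branch point in sequence is the current number plus one, and a dict stands in for the track table, mapping each branch point to the branch point reached through its target. Prefetching and the temporary storage unit of claims 5 to 7 are not modeled.

```python
def expand(track: dict[int, int], points: list[int]) -> list[int]:
    """One branching level: each branch point yields its (not-taken, taken) leading pointers."""
    out = []
    for p in points:
        out.append(p + 1)     # not taken: increment to the next branch point in sequence
        out.append(track[p])  # taken: the track entry gives the branch point reached via the target
    return out


def leading_pointers(track: dict[int, int], root: int, levels: int = 2) -> list[int]:
    """Generate `levels` levels from the root branch point; keep only the deepest (claim 3)."""
    points = [root]
    for _ in range(levels):
        points = expand(track, points)
    return points


def prune_and_extend(track: dict[int, int], deepest: list[int], taken: bool) -> list[int]:
    """Drop the half rejected by the root's branch decision, then grow one more level (claim 4).

    The first half of `deepest` descends from the not-taken child, the second half
    from the taken child, because expand() emits (not-taken, taken) pairs in order.
    """
    half = len(deepest) // 2
    kept = deepest[half:] if taken else deepest[:half]
    return expand(track, kept)


if __name__ == "__main__":
    track = {10: 20, 11: 30, 12: 90, 20: 40, 21: 50, 30: 60, 40: 80}
    deepest = leading_pointers(track, 10)
    print(deepest)                                       # [12, 30, 21, 40]
    print(prune_and_extend(track, deepest, taken=True))  # [22, 50, 41, 80]
```

Pruning keeps the half of the deepest level that descends from the chosen first-level point and extends it by one level, giving the same result as regenerating the pointers from the new root.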
CN201210450909.8A 2011-11-18 2012-11-09 Caching method and device with low miss rate and low miss penalty CN103176914B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201110376514 2011-11-18
CN2011103765143 2011-11-18
CN201110376514.3 2011-11-18
CN201210450909.8A CN103176914B (en) 2011-11-18 2012-11-09 Caching method and device with low miss rate and low miss penalty

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210450909.8A CN103176914B (en) 2011-11-18 2012-11-09 Caching method and device with low miss rate and low miss penalty

Publications (2)

Publication Number Publication Date
CN103176914A (en) 2013-06-26
CN103176914B (en) 2016-12-21

Family

ID=48428981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210450909.8A CN103176914B (en) 2011-11-18 2012-11-09 Caching method and device with low miss rate and low miss penalty

Country Status (4)

Country Link
US (1) US9569219B2 (en)
EP (1) EP2987085A4 (en)
CN (1) CN103176914B (en)
WO (1) WO2013071868A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3037957A4 (en) * 2013-08-19 2017-05-17 Shanghai Xinhao Microelectronics Co. Ltd. Buffering system and method based on instruction cache
CN104156181B (en) * 2014-08-18 2017-02-15 上海众恒信息产业股份有限公司 Virtual resource cross access and security isolation method
CN104536911B (en) * 2014-12-31 2018-01-02 华为技术有限公司 The cache memory and its processing method that a kind of multichannel group is connected
US10467141B1 (en) 2018-06-18 2019-11-05 International Business Machines Corporation Process data caching through iterative feedback
US10642742B2 (en) * 2018-08-14 2020-05-05 Texas Instruments Incorporated Prefetch management in a hierarchical cache system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1484157A (en) * 2002-09-20 2004-03-24 联发科技股份有限公司 Embedding system and instruction prefetching device and method thereof
CN101187860A (en) * 2006-11-21 2008-05-28 国际商业机器公司 Apparatus and method for instruction cache trace formation
CN102110058A (en) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Low-deficiency rate and low-deficiency punishment caching method and device
US7975108B1 (en) * 2004-03-25 2011-07-05 Brian Holscher Request tracking data prefetcher apparatus

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US5353419A (en) * 1992-01-09 1994-10-04 Trustees Of The University Of Pennsylvania Memory-side driven anticipatory instruction transfer interface with processor-side instruction selection
US7272703B2 (en) * 1997-08-01 2007-09-18 Micron Technology, Inc. Program controlled embedded-DRAM-DSP architecture and methods
US6965982B2 (en) * 2001-06-29 2005-11-15 International Business Machines Corporation Multithreaded processor efficiency by pre-fetching instructions for a scheduled thread
US7421567B2 (en) * 2004-12-17 2008-09-02 International Business Machines Corporation Using a modified value GPR to enhance lookahead prefetch
US7657729B2 (en) * 2006-07-13 2010-02-02 International Business Machines Corporation Efficient multiple-table reference prediction mechanism
CN102117198B (en) * 2009-12-31 2015-07-15 上海芯豪微电子有限公司 Branch processing method
US20120084537A1 (en) * 2010-09-30 2012-04-05 International Business Machines Corporation System and method for execution based filtering of instructions of a processor to manage dynamic code optimization

Also Published As

Publication number Publication date
CN103176914A (en) 2013-06-26
EP2987085A4 (en) 2017-02-15
US20150193236A1 (en) 2015-07-09
US9569219B2 (en) 2017-02-14
WO2013071868A1 (en) 2013-05-23
EP2987085A1 (en) 2016-02-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 201203 501, No. 14, Lane 328, Yuqing Road, Pudong New Area, Shanghai

Patentee after: SHANGHAI XINHAO MICROELECTRONICS Co.,Ltd.

Address before: 200092 Room 1202, Block B, No. 1398 Siping Road, Yangpu District, Shanghai

Patentee before: SHANGHAI XINHAO MICROELECTRONICS Co.,Ltd.