WO2015024532A9 - System and method for high-performance instruction caching - Google Patents

System and method for high-performance instruction caching

Info

Publication number
WO2015024532A9
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
block
address
memory
branch
Prior art date
Application number
PCT/CN2014/085063
Other languages
English (en)
Chinese (zh)
Other versions
WO2015024532A1 (fr)
Inventor
林正浩
Original Assignee
上海芯豪微电子有限公司
Priority date
Filing date
Publication date
Application filed by 上海芯豪微电子有限公司
Priority to US14/913,837 (US20160217079A1)
Publication of WO2015024532A1
Publication of WO2015024532A9
Priority to US15/722,814 (US10275358B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/0897 Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45 Caching of specific data in cache memory
    • G06F2212/452 Instruction code

Definitions

  • the invention relates to the field of computers, communications and integrated circuits.
  • The role of a cache is to hold a copy of part of the contents of lower-level memory, so that this content can be accessed quickly by higher-level memory or by the processor core, keeping the pipeline running continuously.
  • Current caches are addressed as follows: the index segment of the address is used to read a tag from the tag memory, which is compared with the tag segment of the address; at the same time, the index segment together with the in-block offset segment of the address is used to read content from the cache.
  • If the tag read from the tag memory matches the tag segment of the address, the content read from the cache is valid; this is called a cache hit. Otherwise, if the tag read from the tag memory differs from the tag segment of the address, this is called a cache miss, and the content read from the cache is invalid.
  • In a set-associative cache, the above operations are performed in parallel on every way to detect which way hits. The content read from the hitting way is valid. If all ways miss, all of the content read is invalid. After a cache miss, the cache control logic fills the content from the lower-level storage medium into the cache.
  • Cache misses fall into three categories: compulsory misses, conflict misses and capacity misses. In the prior art, compulsory misses are unavoidable, except for the small fraction removed by successful prefetching.
  • Modern cache systems typically consist of multiple levels of multi-way set-associative caches.
  • Newer cache structures, such as the victim cache, the trace cache and prefetching, are all based on the basic cache structure described above and refine it.
  • Under the current architecture, cache misses, especially in multi-level caches, have become the most serious bottleneck limiting the performance of modern processors.
  • The method and system proposed by the present invention can directly address one or more of the above or other difficulties.
  • The present invention provides a high-performance instruction caching method in which the processor core is connected to a first memory containing executable instructions and to a second memory faster than the first memory. The method comprises: examining instructions being filled from the first memory into the second memory, thereby extracting instruction information including at least branch information; building a plurality of tracks according to the extracted instruction information; and filling, according to one or more of the tracks, at least one or more instructions that may be executed by the processor core from the first memory into the second memory. In the method, the second memory is organized as fully associative and the first memory is organized as set-associative.
  • The tracks correspond one-to-one to the instruction blocks in the second memory.
  • The target address is expressed with a first-level block number when the target instruction belongs to an instruction block of the second memory.
  • A second-level block number may be written into the track table instead; once the corresponding instruction has been filled from the first memory into the second memory, it is replaced by the first-level block number.
  • the flag bit at the position in the active table corresponding to the block number is set; at the same time, the flag bits of the block numbers in the active table are reset one after another, so that a set flag bit indicates a block number currently referenced by the track table, which therefore will not be replaced in the active table.
  • The present invention also provides a high-performance instruction cache system comprising: a processor core configured to execute instructions; a first memory configured to store instructions required by the processor core; a second memory, faster than the first memory, configured to store instructions required by the processor core; a scanner configured to examine instructions being filled from the first memory into the second memory, thereby extracting instruction information including at least branch information; and a track table configured to store a plurality of tracks built according to the extracted instruction information; wherein the second memory is organized as fully associative and the first memory is organized as set-associative.
  • the tracks in the track table are in one-to-one correspondence with the instruction blocks in the second memory.
  • each instruction block in the second memory corresponds to a first-level block number.
  • The flag bit at the position in the active table corresponding to the block number is set; at the same time, the flag bits of the block numbers in the active table are reset one after another, so that a set flag bit indicates a block number currently referenced by the track table, which therefore will not be replaced in the active table.
  • The instruction block preceding or following, in sequential address order, an instruction block in the first memory may already be stored in the first memory;
  • in that case, the location of that preceding or following instruction block is stored in the active table.
  • The storage location information of the instruction block in the first memory is stored in the active table.
  • When an instruction lies in the instruction block preceding or following the current instruction block in the first memory, it can be found directly in the first memory using the location information of that preceding or following block stored in the active table.
  • A boundary check is performed on the branch target instruction address; depending on the result, branch target instructions at different locations are given addresses in different formats.
  • When the branch target lies in the instruction block preceding or following the block containing the branch instruction, the second-level block number of that preceding or following block is used as the second-level block number of the branch target instruction, and the portion of the branch target instruction address that forms the offset within the first memory is used as the offset of the branch target instruction.
  • The active table content corresponding to the instructions being filled from the first memory into the second memory is stored in a micro active table. If the examination finds that the branch target instruction lies in a different first-level instruction block within the same second-level instruction block as the branch instruction, and the first-level block number of that block is valid in the micro active table, the first-level block number read from the micro active table is used directly as the first-level block number of the branch target instruction. If the branch target instruction lies in a different first-level instruction block within the same second-level instruction block, but the first-level block number of that block is invalid in the micro active table, the second-level block number of the branch instruction is used directly as the second-level block number of the branch target instruction. If the examination finds that the branch target instruction lies in the second-level instruction block preceding or following that of the branch instruction, and the second-level block number of that block is valid in the micro active table, the second-level block number read from the micro active table is used directly as the second-level block number of the branch target instruction.
  • Several second-level block numbers and the corresponding contents of those block numbers in the active table are stored in the micro active table. When the examination finds a branch instruction, the branch target instruction address is first matched in the micro active table. If the match succeeds, the first-level or second-level block number read from the micro active table is used directly as the first-level or second-level block number of the branch target instruction; if the match fails, the branch target instruction address is sent to the active table for matching.
  • The entries of the active table correspond one-to-one to the instruction blocks in the first memory, each entry storing the block address of the corresponding instruction block in the first memory; for the instruction block preceding or following an instruction block in sequential address order,
  • the active table additionally stores the location information, in the first memory, of that preceding or following instruction block.
  • The system includes one or more adders. An adder adds the low-order bits of the branch instruction's own address, taken from the portion above the offset used within the first memory, to the corresponding bits of the branch displacement, and thereby determines whether the branch target instruction lies in the instruction block preceding or following, in sequential address order, the block in the first memory that contains the branch instruction. When the branch target instruction lies in the block preceding or following the current instruction block in the first memory, it can be found directly in the first memory using the location information of that block stored in the active table.
  • The system further includes a micro active table, configured to store the active table content corresponding to the instructions being filled from the first memory into the second memory. When the scanner finds that the branch target instruction lies in a different first-level instruction block within the same second-level instruction block as the branch instruction, and the first-level block number of that block is valid in the micro active table, the first-level block number read from the micro active table is used directly as the first-level block number of the branch target instruction. If the branch target instruction lies in a different first-level instruction block within the same second-level instruction block, but the first-level block number of that block is invalid in the micro active table, the second-level block number of the branch instruction is used directly as the second-level block number of the branch target instruction. If the branch target instruction lies in the second-level instruction block preceding or following that of the branch instruction, and the second-level block number of that block is valid in the micro active table, the second-level block number read from the micro active table is used directly as the second-level block number of the branch target instruction.
  • The system further includes a micro active table configured to store a plurality of second-level block numbers and the corresponding contents of those block numbers in the active table. When the scanner detects a branch target instruction, the branch target instruction address is first matched in the micro active table. If the match succeeds, the first-level or second-level block number read from the micro active table is used directly as the first-level or second-level block number of the branch target instruction; if the match fails, the branch target instruction address is sent to the active table for matching.
  • The system and method of the present invention can provide a fundamental solution for the cache structures used in digital systems. Unlike conventional cache systems, which fill the cache only after a miss, the system and method of the present invention fill the instruction cache before the processor executes an instruction, and can therefore fully hide compulsory misses.
  • In the system and method of the present invention, the level 1 cache is essentially fully associative and the level 2 cache is set-associative, yet the overall effect is substantially similar to that of a fully associative structure: capacity misses are avoided and processor speed is improved. Because fewer matching operations are required and the miss rate is lower, power consumption is also significantly lower than in conventional cache systems. Other advantages and applications of the present invention will be apparent to those skilled in the art.
  • FIG. 1 is a schematic diagram of an instruction prefetch structure according to the present invention in which the level 2 cache is multi-way set-associative.
  • FIG. 3 is an embodiment of the relationship between the first-level instruction blocks, the second-level instruction blocks and the corresponding storage units according to the present invention.
  • FIG. 4 is a specific embodiment of the present invention in which the level 2 cache is two-way set-associative.
  • FIG. 5 is another specific embodiment of the present invention in which the level 2 cache is two-way set-associative.
  • FIG. 6 is another specific embodiment of a scanner configuration in the second level cache structure of the present invention.
  • Figure 7 shows the memory and format used in the micro-track table organized in a fully associative manner.
  • Figure 8 is an embodiment of a fully associative micro-track table.
  • Figure 4 shows a preferred embodiment of the invention.
  • A cache system including a processor core is used as the example, but the technical solution of the present invention can also be applied to systems including any suitable processor.
  • The processor may be a general-purpose processor (CPU), a microcontroller (MCU), a digital signal processor (DSP), a graphics processor (GPU), a system on chip (SOC), an application-specific integrated circuit (ASIC), etc.
  • FIG. 1 shows an instruction prefetch structure 100 according to the present invention in which the level 2 cache is multi-way set-associative.
  • Structure 100 includes an active table 104, a scanner 108, a track table 110, a tracker 114, a level 2 instruction cache (L2 Cache) 106, a level 1 instruction cache (L1 Cache) 112 and a processor core (CPU Core) 116.
  • Instruction Address refers to the memory address of the instruction in the main memory, that is, the instruction can be found in the main memory according to the address.
  • For simplicity, the virtual address is assumed to equal the physical address; the method of the present invention is also applicable where address mapping is required.
  • A branch instruction (Branch Instruction), or branch source (Branch Source), is any form of instruction that can cause processor core 116 to change its execution flow (e.g., to execute an instruction out of sequential order).
  • The branch source address (Branch Source Address) can be the instruction address of the branch instruction itself;
  • a branch target (Branch Target) is the target instruction to which a branch instruction redirects execution; the branch target address (Branch Target Address) is the address transferred to when the branch is taken, i.e., the instruction address of the branch target instruction;
  • the current instruction can refer to the instruction currently being executed or acquired by the processor core;
  • the current instruction block is the instruction block containing the instruction currently being executed by the processor.
  • Level 1 instruction cache 112 is fully associative. Each memory row of level 1 instruction cache 112 is called a first-level instruction block, and level 1 instruction cache 112 stores at least one first-level instruction block of consecutive instructions that includes the current instruction.
  • Level 1 instruction cache 112 contains a plurality of first-level instruction blocks, each containing a plurality of instructions. Each first-level instruction block stored in level 1 instruction cache 112 has a first-level block number (BNX1), which is the row number of that block in level 1 instruction cache 112.
  • Level 2 instruction cache 106 consists of two identical memories 126 and 128. Each memory forms one way, and both ways have the same number of rows; that is, the cache is two-way set-associative.
  • Each storage row of memories 126 and 128 is called a second-level instruction block, and each second-level instruction block has a second-level block number (BNX2), determined by its row number in the level 2 instruction cache and the way it resides in; that is, the index bits of the instruction row address plus the way bit identifying the way that holds the instruction.
  • Each level two instruction block contains a plurality of level one instruction blocks.
  • The second-level block number in the present invention identifies the location of a second-level instruction block in level 2 instruction cache 106.
  • Level 2 instruction cache 106 and level 1 instruction cache 112 may comprise any suitable storage device, such as a register or register file, static memory (SRAM), dynamic memory (DRAM), flash memory, hard disk, solid-state disk, or any other suitable storage device or future form of memory.
  • Level 2 instruction cache 106 can serve as the system's cache, or as a level 1 cache when other caches exist; it can be partitioned into a plurality of memory segments, called memory blocks, for storing data that processor core 116 will access, such as the instructions in an instruction block.
  • Active table 104 contains two tag arrays 118 and 120 and two storage arrays 122 and 124 that store first-level block numbers BNX1. Because level 2 instruction cache 106 is two-way set-associative, the active table is also organized as two ways.
  • Each tag array and storage array in active table 104 corresponds to one way of level 2 instruction cache 106: tag array 118 and storage array 122 correspond to L2 way 126, and tag array 120 and storage array 124 correspond to L2 way 128.
  • The elements of storage arrays 122 and 124 are called entries; each entry stores a first-level block number BNX1 and a valid bit, recording the relationship between a first-level instruction block and the level 2 instruction cache. Because each second-level instruction block contains a plurality of first-level instruction blocks, each row of storage arrays 122 and 124 in active table 104 contains a plurality of entries, which store the row numbers BNX1 in level 1 instruction cache 112 of the first-level instruction blocks of the corresponding second-level instruction block.
  • Scanner 108 examines each first-level instruction block being filled from level 2 instruction cache 106 into level 1 instruction cache 112, obtains instruction type information, and determines whether each instruction is a branch instruction or a non-branch instruction. If an instruction is a branch instruction, its branch target address is calculated.
  • The calculation uses an adder to add the branch displacement to the current instruction address (the address of the branch instruction) to obtain the branch target address. The calculated branch target address is then sent to active table 104 for matching.
  • Each row of track table 110 corresponds to a row of level 1 instruction cache 112, and both are pointed to by the same row pointer.
  • Each row of track table 110 includes a plurality of track points, each corresponding to one instruction in the corresponding row of level 1 instruction cache 112; that is, the number of track points per row of the track table equals the number of instructions per row of the level 1 instruction cache.
  • a track point is an entry in the track table, which may contain information of at least one instruction, such as instruction class information, branch target address, and the like.
  • The track table address of a track point corresponds to the instruction address of the instruction it represents, and the track point of a branch instruction contains the address of the branch target, which corresponds to the branch target instruction address.
  • The consecutive track points corresponding to one instruction block of consecutive instructions in level 1 instruction cache 112 are referred to as a track.
  • An instruction block and its corresponding track are identified by the same first-level block number (BNX1).
  • the total number of track points in a track can be equal to the total number of entries in a row in track table 110.
  • the track table 110 can also have other organizational forms.
  • Suppose processor core 116 needs to fetch from level 1 instruction cache 112 an instruction that is not yet stored in the level 1 instruction cache.
  • According to the instruction address (PC), the instruction is then filled from lower-level memory into the row of level 2 instruction cache 106 identified by the second-level block number chosen by a replacement algorithm (e.g., LRU).
  • The corresponding first-level instruction block in level 2 cache 106 is then filled into the storage row of level 1 instruction cache 112 pointed to by the BNX1 chosen by a replacement algorithm (e.g., LRU).
  • The replacement algorithm may be first-in first-out (FIFO), least recently used (LRU), random replacement (Random), or any other existing algorithm.
  • Scanner 108 examines the instruction types in the first-level instruction block, extracts the branch information of branch instructions, and calculates their branch target addresses.
  • The term 'fill' means moving instructions from lower-level memory to higher-level memory.
  • The branch target instruction address calculated by scanner 108 is matched against the instruction row addresses stored in the active table to determine whether the branch target instruction is already stored in level 2 instruction cache 106. First, the index bits of the branch target instruction address are used to read the two tags stored in the active table; the two tags are then compared with the tag bits of the calculated branch target address.
  • The entry corresponding to the instruction in the matching way is selected. If the first-level block number (BNX1) stored in that entry is valid, the branch target instruction is already stored in level 1 instruction cache 112, so the BNX1 stored in the active table and the offset of the calculated branch target address are written together into the track table, at the track point corresponding to the branch source address. If the BNX1 stored in the entry is invalid, the branch target instruction is not in level 1 instruction cache 112 but only in level 2 instruction cache 106, so the second-level block number BNX2 of the instruction, together with the block offset and the address offset of the calculated branch target address, is written into the track table at the track point corresponding to the branch source address. If neither tag matches, the instruction row where the branch target information
  • A first address and a second address can represent the position of a track point (instruction) in the track table: the first address is the block number of the track point (pointing to a track in the track table and to the corresponding first-level instruction block in the level 1 instruction cache), and the second address is the relative position (offset, Address Offset) of the track point, i.e., of the corresponding instruction, within the track (storage block).
  • a set of first address and second address corresponds to a track point in the track table, that is, a corresponding track point can be found from the track table according to a set of first address and second address.
  • the track of the branch target may be determined according to the first address included in the content stored in the entry in the track table, and a specific track point of the target track is determined according to the second address.
  • The track table thus becomes a table of branch instructions in which the address of each track entry corresponds to the branch source address and the content of the entry corresponds to the branch target address.
  • To link each track to the track that follows it, an end track point is placed after the track point representing the last instruction of each track; the end track point stores the first address of the next track (instruction block) in sequential execution order.
  • the first level instruction cache A plurality of instruction blocks can be stored in 112.
  • the next execution of the instruction block is also taken into the instruction read buffer for the processor core 116. Read execution.
  • the instruction address of the next instruction block can be found by the instruction address of the current instruction block plus the address length of an instruction block.
  • the address is sent to the active list 104 as described above. Matching, the obtained instruction block is filled in the instruction block of the level one instruction cache 112 indicated by the replacement algorithm.
  • the instructions in the next instruction block newly stored in the level 1 instruction cache 112 are also scanned by the scanner 108. Scanning, extracting information fills the track indicated by the first block number BNX1 as previously described.
  • The replacement algorithm may also be a first-in first-out algorithm (FIFO), a least recently used algorithm (LRU), a random replacement algorithm (Random), or another existing algorithm.
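The replacement step only requires that some policy choose which level-one block number BNX1 to overwrite. A minimal sketch of an LRU chooser, assuming level-one blocks are numbered 0 to n-1 and that every access is reported through a hypothetical `touch()` call; FIFO or random replacement would change only `victim()`:

```python
from collections import OrderedDict


class LRUBlockAllocator:
    """Hypothetical helper (not from the patent): picks the level-one block
    number BNX1 to replace, in least-recently-used order."""

    def __init__(self, num_blocks):
        # Keys are BNX1 values; iteration order runs from least to most recently used.
        self.order = OrderedDict((bnx1, None) for bnx1 in range(num_blocks))

    def touch(self, bnx1):
        """Mark a level-one block as most recently used."""
        self.order.move_to_end(bnx1)

    def victim(self):
        """Return the least recently used BNX1 and mark it as just filled."""
        bnx1 = next(iter(self.order))
        self.order.move_to_end(bnx1)
        return bnx1
```

For example, with four blocks, touching block 0 makes block 1 the next victim.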
  • the tracker 114 is mainly composed of a selector 130, a register 132, and an incrementer 134.
  • The read pointer of tracker 114 points to the first branch-instruction track point after the current instruction, in the track of track table 110 that contains the current instruction; if there is no branch track point after the current instruction on that track, it points to the track's end track point.
  • The read pointer of tracker 114 consists of a first address pointer and a second address pointer. The value of the first address pointer is the primary block number of the level-one instruction block containing the current instruction, i.e., a row pointer; the second address pointer points to the first branch-instruction track point, or the end track point, after the current instruction on the track.
  • the first block number is provided by the tracker 114 when the processor core 116 fetches instructions from the level one instruction cache 112 as needed.
  • BNX1 is used to address the level-one instruction block; the processor core supplies the offset to fetch the corresponding instruction and provides the BRANCH and TAKEN signals to the tracker 114.
  • The BRANCH signal indicates whether the instruction is a branch instruction, and the TAKEN signal controls the output of the selector.
  • Tracker 114 points to the first branch instruction after the current instruction (or to the end track point of the track if there is no branch track point after the current instruction on the track) and provides the processor core 116 with the primary block number BNX1 of the current instruction.
  • The processor core 116 fetches instructions directly from the level-one instruction cache 112.
  • The secondary block number BNX2 is used as an address to look up the active table. If the primary block number BNX1 stored in the entry corresponding to that secondary block number is already valid, the target address of another branch instruction executed earlier is the same as the instruction address corresponding to this secondary block number, and the target instruction has already been fetched into the level-one instruction cache 112.
  • In that case the primary block number BNX1 is written into the track point, and when the instruction is executed the processor core 116 fetches it directly from the level-one instruction cache 112.
  • If the primary block number BNX1 stored in the entry corresponding to the secondary block number is invalid, the target instruction is not in the level-one instruction cache 112. A primary block number BNX1 is then determined according to the replacement policy, the target instruction block is fetched from the level-two instruction cache 106 and filled into the corresponding level-one instruction block of the level-one instruction cache 112, and BNX1 is written into the active table 104.
  • The processor core 116 then fetches the instruction directly from the level-one instruction cache 112.
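The lookup just described can be sketched in software. This is an illustrative model, not the patent's circuit: the active table is a dict keyed by a secondary block number standing for one level-one-block-sized unit, and the replacement policy is reduced to a hypothetical list of free BNX1 values, so eviction of an old mapping is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveEntry:
    bnx1: int = 0          # primary block number recorded for this BNX2
    valid: bool = False    # valid bit of the entry


@dataclass
class Caches:
    active: dict           # BNX2 -> ActiveEntry (hypothetical layout)
    l2: dict               # BNX2 -> instruction block held in the level-two cache
    l1: dict = field(default_factory=dict)         # BNX1 -> instruction block
    free_bnx1: list = field(default_factory=list)  # stand-in for the replacement policy


def resolve(c: Caches, bnx2: int) -> int:
    """Return the BNX1 to write into the track point, filling L1 on a miss."""
    entry = c.active[bnx2]
    if entry.valid:                        # target already in the level-one cache
        return entry.bnx1
    bnx1 = c.free_bnx1.pop(0)              # simplified replacement: next free block
    c.l1[bnx1] = list(c.l2[bnx2])          # copy the block from level two to level one
    entry.bnx1, entry.valid = bnx1, True   # record BNX1 in the active table
    return bnx1
```

A second call with the same BNX2 finds the valid entry and returns the same BNX1 without another fill.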
  • If the branch instruction pointed to by tracker 114 is not taken, the read pointer of tracker 114 moves to the first branch-instruction track point after that branch instruction, or to the end track point of the track if no branch-instruction track point follows it.
  • The processor core reads and executes the instructions that follow the branch instruction in sequence.
  • If the branch is taken, the branch target instruction block read from instruction memory 106 as described above is stored in the instruction block specified by the buffer replacement logic in the instruction read buffer 112, and the new track information generated by the scanner is filled into the corresponding track of track table 110.
  • The first and second addresses of the branch target become the new tracker address pointer, which points to the track point corresponding to the branch target in the track table.
  • The new tracker address pointer also points to the newly filled branch target instruction block, making it the new current instruction block.
  • The processor core uses the offset bits of the instruction address (PC) to select the required instruction from the new current instruction block.
  • The read pointer then moves to the first branch-instruction track point after the branch target instruction in the track of the new current instruction block, or to the end track point of the track if no branch-instruction track point follows the branch target instruction.
  • When tracker 114 points to the end track point of a track, its read pointer is updated to the position value stored in the end track point, i.e., to the first track point of the next track, and thereby to the new current instruction block. The read pointer of tracker 114 then moves to the first branch-instruction track point in the track of the new current block, or to the end track point if the track has no branch track point. By repeating this process, instructions are filled into the instruction read buffer 112 before the processor core 116 executes them, so the processor core 116 does not need to wait when fetching instructions, which improves processor performance.
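The read-pointer movement described above can be illustrated with a small software model. Assumptions, not taken from the source: each track is a Python list in which a non-branch track point is `None`, a branch track point holds its target `(BNX1, BNY)`, and the last element is the end track point holding `(next BNX1, 0)`; `next_branch()` and `advance()` are hypothetical names.

```python
def next_branch(track, bny):
    """Index of the first branch or end track point at or after bny."""
    while track[bny] is None:      # skip non-branch track points
        bny += 1
    return bny                     # always stops: the end track point is never None


def advance(track_table, bnx, bny, taken):
    """Move the read pointer past the branch point at (bnx, bny).

    An end track point behaves like an unconditional branch to the next track.
    """
    track = track_table[bnx]
    if bny == len(track) - 1 or taken:          # end track point, or branch taken
        tbnx, tbny = track[bny]
        return tbnx, next_branch(track_table[tbnx], tbny)
    return bnx, next_branch(track, bny + 1)     # not taken: next branch, same track
```

For instance, with track 0 = `[None, (1, 2), None, None, (1, 0)]`, a not-taken branch at entry 1 moves the pointer to the end track point at entry 4, and from there to the first branch point of track 1.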
  • FIG. 2 is an embodiment of the tracker read pointer movement of the present invention.
  • The tracker read pointer skips over the non-branch instructions in the track table, moves to the next branch point in the track table, and waits for the branch decision from the processor core 116.
  • Some components not related to the description of this embodiment are omitted from FIG. 2.
  • The instruction types stored in track table 110, and the instruction information stored with them, are arranged from left to right in increasing instruction-address order; that is, when instructions are executed in sequence, the instruction information and instruction type of each instruction are accessed from left to right.
  • An instruction type of '0' in track table 110 means that the corresponding instruction is a non-branch instruction, and an instruction type of '1' means that the corresponding instruction is a branch instruction.
  • At any time, the entry holding the instruction type indicated by the second address 216 (offset, BNY), within the track indicated by the first address 214 (primary block number BNX1), can be read from track table 110. Several entries holding instruction types, or even all entries, of the track indicated by the first address 214 can also be read from track table 110 at any time.
  • An end entry is added to the right of the entry of the instruction with the largest instruction address in each row, to store the address of the next instruction in sequence.
  • The instruction type of the end entry is always set to '1'.
  • The first address in the instruction information of the end entry is the instruction block number of the next instruction, and the second address (BNY) is always zero, pointing to the first entry of that instruction track.
  • The end entry is thus defined as equivalent to an unconditional branch instruction.
  • The tracker 114 mainly includes a shifter 202, a leading-zero counter 204, an adder 206, a selector 208, and a register 210.
  • The shifter 202 shifts left the instruction types 218 of multiple instructions read from track table 110; the shift amount is determined by the second address pointer 216 output by register 210.
  • The leftmost bit of the shifted instruction type 224 output by shifter 202 is the step bit (STEP Bit).
  • The step bit, together with the BRANCH signal from the processor core, determines when register 210 is updated.
  • The selector 208 is controlled by the TAKEN signal; its output 232 is the next address (Next Address), which contains a first-address part and a second-address part.
  • When TAKEN is '1' (branch taken), selector 208 selects the output 230 of track table 110, which contains the first and second addresses of the branch target, as output 232.
  • When TAKEN is '0' (branch not taken), selector 208 selects the current first address 214 as the first-address part of output 232 and the adder output 228 as its second-address part.
  • The shifted instruction type 224 is sent to the leading-zero counter 204, which counts how many '0' instruction types (non-branch instructions) precede the next '1' instruction type (branch instruction); the step bit is counted as a '0' regardless of whether it is actually '0' or '1'.
  • The resulting number of leading '0's 226 (step number, STEP Number) is sent to adder 206 and added to the second address output by register 210 to obtain the next branch source address (Next Branch Address) 228.
  • The next branch source address is the second address of the next branch instruction after the current instruction; the non-branch instructions before it are skipped by tracker 114.
  • The shifter, controlled by the second address, shifts all of the instruction types output by track table 110 to the left by the same amount.
  • As a result, the instruction type of the instruction pointed to by the second address in track table 110 is shifted into the leftmost bit of instruction type 224, the step bit.
  • The shifted instruction type 224 is sent to the leading-zero counter to count the number of instructions before the next branch instruction.
  • The output 226 of leading-zero counter 204 is the number of steps the tracker should advance. Adder 206 adds this step count to the second address 216 to obtain the next branch instruction address 228.
  • When the step bit of the shifted instruction type 224 is '0', the entry of track table 110 pointed to by the second address is a non-branch instruction. The step bit then causes register 210 to be updated: with the TAKEN signal 222 at '0', selector 208 selects the next branch source address 228, which becomes the second address 216, while the first address 214 stays unchanged.
  • The new second address 216 controls shifter 202 to shift instruction types 218 so that the instruction-type bit of the branch instruction lands on the step bit of 224 for the next operation.
  • When the step bit of the shifted instruction type 224 is '1', the track-table entry pointed to by the second address represents a branch instruction.
  • In this case the step bit does not trigger an update of register 210; instead, register 210 is updated under control of the BRANCH signal 234 from the processor core.
  • At this point the adder output 228 is the address of the next branch instruction on the same track as the current branch instruction, and the track-table output 230 is the target address of the current branch instruction.
  • The output 232 of selector 208 updates register 210. If the TAKEN signal 222 from the processor core is '0', the processor core has chosen sequential execution at this branch point, and selector 208 selects the next branch source address 228. The first address 214 of register 210 stays unchanged, and the next branch source address 228 becomes the new second address. The new first and second addresses now point to the next branch instruction on the same track. The new second address 216 controls shifter 202 to shift instruction types 218 so that the instruction-type bit of that branch instruction lands on the step bit of 224 for the next operation.
  • If the TAKEN signal 222 is '1', selector 208 selects the branch target address 230 read from track table 110, which is to become the first address 214 and second address 216 output by register 210.
  • Under control of the BRANCH signal 234, register 210 latches these first and second addresses as the new first and second addresses.
  • the new first and second addresses point to branch target addresses that may not be on the same track.
  • The new second address 216 controls shifter 202 to shift instruction types 218 so that the instruction-type bit of the next branch instruction lands on the step bit of 224 for the next operation.
  • When the read pointer reaches the end track point, an internal control signal makes selector 208 select the output 230 of track table 110, as described above, and register 210 is updated.
  • The new first address 214 is the first address of the next track recorded in the end entry of track table 110, and the new second address is zero.
  • The second address 216 then controls shifter 202 to shift instruction types 218 accordingly, and the next operation begins. By repeating this, tracker 114, working with track table 110, skips the non-branch instructions in the track table and always points to a branch instruction.
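The shifter and leading-zero-counter datapath of FIG. 2 can be emulated bit by bit. In this sketch the instruction types of one track are packed into an integer with entry 0 as the most significant bit and the end entry (always '1') as the least significant; the function names are hypothetical, and the end entry is not special-cased, so a caller is expected to treat it as a taken branch to `(next BNX1, 0)`.

```python
def step_number(types, width, bny):
    """Emulate shifter 202 + leading-zero counter 204 (STEP Number 226).

    The step bit is counted as '0' whatever its real value.
    """
    mask = (1 << width) - 1
    shifted = (types << bny) & mask     # shifter 202: shift left by BNY (216)
    shifted &= mask >> 1                # force the step bit (leftmost bit) to '0'
    return width - shifted.bit_length() # leading zeros within `width` bits


def tracker_cycle(types, width, bnx, bny, branch, taken, target):
    """One update of register 210; returns the new (first, second) address."""
    step_bit = (types >> (width - 1 - bny)) & 1
    if step_bit == 0:                   # non-branch entry: jump to next branch point
        return bnx, bny + step_number(types, width, bny)
    if not branch:                      # branch entry: wait for BRANCH signal 234
        return bnx, bny
    if taken:                           # TAKEN = 1: track-table output 230
        return target
    return bnx, bny + step_number(types, width, bny)   # TAKEN = 0: adder output 228
```

With types `0b0010101` (branches at entries 2 and 4, end entry at 6), the pointer jumps from entry 0 straight to entry 2, then to entry 4 if that branch is not taken.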
  • FIG. 3 shows an embodiment 300 of the addressing relationship between level-one instruction blocks and level-two instruction blocks of the present invention.
  • In this embodiment the instruction address 301 is 40 bits long (the highest bit is bit 39 and the lowest is bit 0), and each instruction address corresponds to one byte (Byte). The lowest two bits 302 of instruction address 301 (bits 1 and 0) therefore select one of the 4 bytes in an instruction word (Instruction Word). It is assumed that the high 8 bits of instruction address 301 are the process identification bits (PID) 310, which indicate which process is currently executing.
  • The process identification bits 310 make it possible to determine whether the currently executing process is stored in the instruction cache; if it is not, prefetching can be performed using the full instruction address 301, which avoids a miss for that instruction in the instruction cache.
  • The process identification bits 310 may also be omitted, in which case the instruction address is 32 bits long. For ease of explanation, the lower two bits 302 and the highest eight bits 310 of the instruction address are dropped below, and the remaining 30 bits (bits 31 to 2) form the instruction line address 312 used in the following explanation.
  • A level-one instruction block contains 16 instructions, so the offset (Offset) 303 in instruction line address 312 has 4 bits, which determine the position of an instruction within the level-one block.
  • The offset 303 corresponds to the second address (BNY) described above, so it can also be used to determine which track point in the track table corresponds to the instruction.
  • If the track table has 512 rows, the primary block number BNX1 has 9 bits, and its value is the row number of the block.
  • Thus, when a level-one instruction block is filled from the level-two instruction cache 106 into the level-one instruction cache 112 as required by the processor 116, and the branch target instruction of a branch instruction, determined by the method described above, is already stored in the level-one instruction cache 112, the corresponding primary block number stored in the active table 104 is written, together with offset 303, into the track point of the track table corresponding to the branch source instruction. When the processor core 116 executes the branch instruction, the target instruction can then be read directly from the level-one instruction cache 112.
  • The tag bits 311 of instruction line address 312 are stored in the tag array 118 or 120 of a way of the active table 104, where they are compared with the target instruction address generated by the scanner 108 to obtain matching information. It is assumed that in this embodiment the active table 104 and the level-two instruction cache blocks 126 and 128 each have 1024 rows, so the index bits 307 of instruction line address 312 have 10 bits (bits 17 to 8). The index bits 307 select the row of the level-two instruction cache in which the instruction is located; they are also used to read out, from each way of the active table 104, the tags stored in the corresponding tag arrays 118 and 120 together with the valid values in the corresponding entries.
  • The block offset (Block-offset) 306 has two bits, bits 7 and 6. Block offset 306 selects a level-one instruction block within a level-two instruction block stored in the level-two cache 106, and also selects which entry of the active table holds the corresponding valid value.
  • Therefore, the way number of the level-two instruction cache 106 in which the level-two instruction block resides, together with the index bits 307 of instruction line address 312, forms the secondary block number BNX2.
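The FIG. 3 field layout can be checked with a few lines of bit manipulation. The tag width of 14 bits (bits 31 to 18) is derived here from the 30-bit line address minus the 4 + 2 + 10 bits of offset, block offset and index; placing the way number above the index bits to form BNX2 is an assumption about how the two are combined. The function names are hypothetical.

```python
# (lowest bit, width) of each field in the 40-bit byte address of FIG. 3.
FIELDS = {
    "pid": (32, 8),           # process identification bits 310
    "tag": (18, 14),          # tag bits 311 (width derived, see above)
    "index": (8, 10),         # index bits 307: 1024 rows
    "block_offset": (6, 2),   # block offset 306: 4 level-one blocks per level-two block
    "offset": (2, 4),         # offset 303 (BNY): 16 instructions per level-one block
    "byte": (0, 2),           # bits 302: byte within a 4-byte instruction word
}


def split_address(addr):
    """Split a 40-bit instruction address into its named fields."""
    return {name: (addr >> lo) & ((1 << width) - 1)
            for name, (lo, width) in FIELDS.items()}


def make_address(**parts):
    """Inverse of split_address; missing fields default to 0."""
    addr = 0
    for name, (lo, width) in FIELDS.items():
        addr |= (parts.get(name, 0) & ((1 << width) - 1)) << lo
    return addr


def bnx2(way, index):
    """Secondary block number: way number spliced above the 10 index bits."""
    return (way << 10) | index
```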
  • When a level-one instruction block is filled from the level-two instruction cache 106 into the level-one instruction cache 112 as required by the processor, and the branch target instruction of a branch instruction, determined as described above, is not stored in the level-one instruction cache 112 but is stored in the level-two instruction cache 106, the corresponding secondary block number BNX2, together with the block offset 306 and offset 303 of the branch target address, is written into the track point of the track table corresponding to the branch source instruction.
  • When the tracker pointer reaches that track point, the corresponding level-one instruction block is filled from the level-two instruction cache 106 into the level-one cache block pointed to by the primary block number BNX1 determined by the replacement policy (such as LRU), so that when the processor core 116 executes the branch instruction, the instruction can be read directly from the level-one instruction cache 112.
  • In this way, a mapping can be established between an instruction's location in the level-one instruction cache and its location in the level-two instruction cache.
  • The offset 303 of instruction line address 312 determines the position of the instruction within the level-one instruction block stored in the level-one instruction cache 112; the block offset 306 of instruction line address 312 determines the position of that level-one instruction block within the level-two instruction block stored in the level-two instruction cache 106; and the index bits 307 of instruction line address 312, together with the number of the cache way holding the level-two instruction block (i.e., the secondary block number BNX2), determine the location of the level-two instruction block in the level-two instruction cache 106.
  • There is no fixed mapping between the primary block number BNX1 and the secondary block number BNX2: BNX1 is determined by the replacement algorithm (such as LRU) when a level-one instruction block is filled from the level-two instruction cache 106 into the level-one instruction cache 112. The second address (BNY) indicating the position of the instruction is, however, the same in the level-one and level-two instruction caches, namely the offset of instruction line address 312.
  • The target instruction address calculated by the scanner 108 can thus be matched against the instruction addresses stored in the active table to obtain matching information, after which the secondary block number BNX2 or the primary block number BNX1 is written into the track table to build a new track.
  • Target instruction address 312 is described using a portion of the complete instruction address.
  • Target instruction address 312 includes tag bit 311, index bit 307, block offset 306, and offset 303.
  • Tag bits 311 are compared with tags 302 and 304 in the active table 104 to obtain matching information; index bits 307 select the row of the active table corresponding to the address; block offset 306 selects the corresponding level-one instruction block within the level-two instruction block; and offset 303 determines the position of the target instruction within the level-one instruction line, i.e., provides the second address (BNY).
  • the secondary instruction cache 106 is composed of two blocks 126 and 128.
  • The two blocks contain the same number of rows, i.e., the cache is organized as a two-way set.
  • The active table is likewise organized as a two-way set.
  • The active table 104 consists of a first part, tag arrays 118 and 120, and a second part, storage blocks 408 and 410. The tag arrays 118 and 120 are used to match the branch target address calculated by the scanner 108, and the second part stores the primary block numbers BNX1.
  • each row of each path group in the active table 104 corresponds to four entries 408 or 410.
  • The track table has the same number of rows as the active table, i.e., 1024 rows.
  • Each row of the level one instruction cache 112 contains 16 instructions, that is, the level one instruction block contains 16 instructions, so the track table 110 There are 16 entries in each row.
  • The level-one instruction block fetched from the level-two instruction cache 106 is filled into the level-one instruction cache according to the LRU replacement policy.
  • This level-one instruction block contains three branch instructions, located at entries 4, 7, and 11 of the block.
  • Assume that the value '1654' is stored in the tag of row 14 of way 0 of the active table 104, and the value '2526' is stored in the tag of row 14 of way 1.
  • In row 14 of way 0 of the active table, the valid bit of entry 2 is '1' and the valid bit of entry 3 is '0'; in row 14 of way 1, the valid bit of entry 2 is '0'.
  • The scanner 108 calculates the target instruction address of the first branch instruction; its tag bits are '1654'.
  • The index bits 307 are used to read the two valid tags stored in the corresponding row (row 14) of the active table, and the read tags are sent to comparators 420 and 422, respectively, to be compared with the tag bits 311 of the branch target instruction address 312 calculated by the scanner 108.
  • Way '0' matches. The block offset 306 of branch target address 312 then selects entry 2 of that row in the active table; its valid bit is '1', so the value '5' stored in it is written into entry 4 of row 3 of the track table, together with the offset (BNY) value '3'.
  • The scanner 108 calculates the target address of the second branch instruction: its tag bits and index bits are the same as above, the value of block offset 306 is '3', and the value of offset 303 is '5'.
  • The same method selects row 14 of way 0 in the active table.
  • The valid bit of entry 3 is '0', indicating that the branch target instruction is not in the level-one instruction cache 112. The way number from the active table, together with the index bits 307 of target instruction address 312, is therefore written into the track table as the secondary block number (BNX2), along with the block offset 306 and offset (BNY) 303 values: '0' indicates that the instruction corresponds to way 0 of the active table, '14' indicates that the target instruction is in row 14 of that way, '3' indicates that the instruction corresponds to entry 3 of that row, and '5' indicates that it is the fifth instruction in the level-one instruction block.
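The matching walk-through above maps onto a short lookup routine. In this sketch the names and data layout are assumptions: each way of the active table is a dict from index to a row holding the tag and four primary block numbers, with `None` standing in for a cleared valid bit; comparators 420 and 422 work in parallel in hardware but are looped over here.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveRow:
    tag: int
    # Four entries (408/410), one per level-one block; None means valid bit '0'.
    bnx1: list = field(default_factory=lambda: [None] * 4)


def match_target(active, tag, index, block_offset, offset):
    """Return the track-point content for a branch target address.

    ("BNX1", bnx1, offset)                     -> target is in the level-one cache
    ("BNX2", way, index, block_offset, offset) -> target is only in the level-two cache
    None                                       -> no way matches (level-two miss)
    """
    for way, rows in enumerate(active):         # one comparison per way
        row = rows.get(index)                   # row selected by index bits 307
        if row is not None and row.tag == tag:
            bnx1 = row.bnx1[block_offset]       # entry selected by block offset 306
            if bnx1 is not None:
                return ("BNX1", bnx1, offset)
            return ("BNX2", way, index, block_offset, offset)
    return None
```

Loaded with the example state (way 0, row 14: tag 1654, entry 2 holding 5; way 1, row 14: tag 2526), the first target yields `("BNX1", 5, 3)` and the second yields `("BNX2", 0, 14, 3, 5)`.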
  • The scanner 108 calculates the target address of the third branch instruction; its tag bits are '3546'.
  • Using the method above, this address matches no way of the active table, indicating that the instruction is not in the level-two cache. The corresponding instruction block is fetched into the level-two cache 106 according to the target address and, following the LRU replacement algorithm, is placed in row 14 of way 1 of the level-two cache, corresponding to entry 2 of that row.
  • the replacement algorithm can also use existing algorithms such as a first in first out algorithm (FIFO), a least recently used algorithm (LRU), and a random replacement algorithm (Rand).
  • For the first branch instruction, the value read from its track point is the primary block number '5'.
  • The target instruction of this branch instruction (target address with tag '1654') has therefore already been filled into the level-one instruction cache 112.
  • When the read pointer of tracker 114 points to entry 7 of row 3 of the track table, the secondary block number stored in that track point (way '0', row 14, entry 3) is read out.
  • The corresponding active-table entry indicates that the primary block number BNX1 stored in it is valid, so the instruction is read directly from the level-one cache row given by BNX1 and no longer needs to be read from the level-two cache.
  • The primary block number '9' stored in that entry is written into entry 7 of row 3 of track table 110, so entry 7 on row 3 of the track table now stores '9'.
  • The processor core 116 can then read the instructions directly from row 9 of the level-one instruction cache 112.
  • For the third branch instruction, the value read from the track point begins with '1', i.e., it refers to way 1 of the active table.
  • The primary block number BNX1 stored in entry 2 of row 14 of way 1 is invalid, indicating that the corresponding branch target instruction is not in the level-one instruction cache 112. The corresponding level-one instruction block stored in the level-two cache 106 is therefore filled into the level-one block whose number is determined by the replacement algorithm.
  • Here the value of BNX1 is 38.
  • The level-one instruction block stored in the level-two instruction cache 106 is filled into row 38 of the level-one instruction cache 112, the value '38' is written into entry 2 of row 14 of way 1 of the active table 104, the valid bit of that entry is set to '1', and the value '38' is also written into the track point.
  • This completes the update of the active table and the track table.
  • The replacement algorithm may also be a first-in first-out algorithm (FIFO), a least recently used algorithm (LRU), a random replacement algorithm (Random), or another existing algorithm.
  • Each active-table entry may additionally contain a storage field P holding the way number (from the secondary block number) of the level-two instruction block that precedes, in sequential address order, the level-two instruction block corresponding to the entry, and a storage field N holding the way number of the level-two instruction block that follows it.
  • The way number of the preceding or following level-two instruction block can then be read from the active table using the corresponding secondary block number, and spliced with the index bits of the examined branch instruction's block decremented or incremented by one. This yields the secondary block number of the preceding or following level-two instruction block without sending the branch target instruction address to the active table for matching.
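With the P and N fields, the neighbouring block's secondary block number comes from splicing rather than matching. A minimal sketch, assuming BNX2 is encoded as the way number above the 10 index bits and that the P/N fields live in a hypothetical dict keyed by BNX2; the index is taken modulo 1024 so it wraps at the ends of the cache.

```python
INDEX_BITS = 10
INDEX_MASK = (1 << INDEX_BITS) - 1


def neighbour_bnx2(active_pn, current_bnx2, direction):
    """BNX2 of the sequentially previous (direction=-1) or next (+1) level-two block.

    active_pn: hypothetical map BNX2 -> {"P": way or None, "N": way or None}.
    Returns None when the field is empty, i.e. normal active-table matching is needed.
    """
    way = active_pn[current_bnx2]["P" if direction < 0 else "N"]
    if way is None:
        return None
    index = ((current_bnx2 & INDEX_MASK) + direction) & INDEX_MASK  # index -1 / +1
    return (way << INDEX_BITS) | index                             # splice way + index
```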
  • When the scanner examines a level-one instruction block (the current level-one instruction block) that is the last level-one instruction block in its level-two instruction block (the current level-two instruction block), the end track point corresponding to the current level-one instruction block is established as before. If the level-two instruction block containing the next level-one instruction block in sequential address order (the next level-two instruction block) is already stored in the level-two cache, its secondary block number is written directly into the end track point as the track point content. If it is not yet stored in the level-two cache, it is filled, as described above, into the position in the level-two cache determined by the replacement algorithm, and the corresponding secondary block number is written into the end track point as the track point content.
  • The secondary block number of the level-two instruction block that follows the current level-two instruction block is the secondary block number of the next level-two instruction block, so the way number in it can be written as content into storage field N of the active-table entry pointed to by the secondary block number of the current level-two instruction block (the current secondary block number).
  • Likewise, the secondary block number of the level-two instruction block that precedes the next level-two instruction block is the current secondary block number, so the way number in it can be written as content into storage field P of the active-table entry pointed to by the secondary block number of the next level-two instruction block.
  • Because the index bits of level-two instruction blocks that are adjacent in address order differ by '1', decrementing and incrementing the index bits of the level-two instruction block address by one gives the index values of the previous and next level-two instruction blocks in address order, and
  • the contents stored in all ways of each corresponding row are read from the active table using the calculated index values. All tags in the read contents are then compared with the tag of the current level-two instruction block.
  • If a matching entry is found for the previous level-two instruction block, the way number in the matching entry can be used as storage-domain content and filled into storage domain P of the active-table entry pointed to by the current level-two block number,
  • and the way number in the current level-two block number is filled, as storage-domain content, into storage domain N of the matching entry.
  • If a matching entry is found for the next level-two instruction block, the way number in the matching entry may be filled, as storage-domain content, into storage domain N of the active-table entry pointed to by the current level-two block number, and the way number in the current level-two block number is filled, as storage-domain content, into storage domain P of the matching entry.
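  • The neighbour-linking step above can be sketched in Python as follows. This is a minimal behavioural model, not the patented circuit: the names `ActiveTable`, `Way`, `prev_way` and `next_way` are hypothetical, a tag of `None` stands for an empty way, and neighbours are located through the level-two block number (tag and index concatenated) plus or minus 1, which equals "index minus or plus 1 with the same tag" except when the index wraps around. Clearing links that pointed to an evicted block is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Way:
    """One way of an active-table row (hypothetical model)."""
    tag: Optional[int] = None       # tag; None means the way is empty
    prev_way: Optional[int] = None  # storage domain P: way of the previous L2 block
    next_way: Optional[int] = None  # storage domain N: way of the next L2 block


class ActiveTable:
    def __init__(self, index_bits: int = 10, ways: int = 2) -> None:
        self.index_bits = index_bits
        self.mask = (1 << index_bits) - 1
        self.rows: List[List[Way]] = [[Way() for _ in range(ways)]
                                      for _ in range(1 << index_bits)]

    def find(self, block: int) -> Optional[int]:
        """Return the way holding level-two block number `block` (tag:index), or None."""
        if block < 0:
            return None
        tag = block >> self.index_bits
        for way, entry in enumerate(self.rows[block & self.mask]):  # compare all ways
            if entry.tag == tag:
                return way
        return None

    def fill(self, block: int, way: int) -> None:
        """Install `block` in `way` and cross-link it with resident neighbours."""
        cur = self.rows[block & self.mask][way]
        cur.tag, cur.prev_way, cur.next_way = block >> self.index_bits, None, None
        prev = self.find(block - 1)  # row at index - 1
        if prev is not None:
            cur.prev_way = prev                                      # P of the new block
            self.rows[(block - 1) & self.mask][prev].next_way = way  # N of the previous block
        nxt = self.find(block + 1)   # row at index + 1
        if nxt is not None:
            cur.next_way = nxt                                       # N of the new block
            self.rows[(block + 1) & self.mask][nxt].prev_way = way   # P of the next block
```

  • With a small table (`ActiveTable(index_bits=2)`), filling block 5 into way 0 and then block 6 into way 1 sets P of block 6 to way 0 and N of block 5 to way 1.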
  • FIG. 5 shows another embodiment 500 of the level-two cache of the present invention, organized as a two-way set-associative cache.
  • the target instruction address 312 is described using a portion of the full instruction address.
  • A level-one instruction block contains 4 instructions, so the offset 303 in the instruction address 312 has 2 bits; this offset, called BN1Y, determines the position of an instruction within the level-one instruction block.
  • The track table has 128 rows, so the level-one block number BN1X (the BNX1 mentioned earlier) has 7 bits; its value is the number of the track-table row it corresponds to.
  • BN1X concatenated with BN1Y is called BN1, which determines the position of the instruction in the level-one instruction cache 112.
  • block offset 306 is 2 bits.
  • The block offset 306 concatenated with the offset 303 is called BN2Y.
  • The index 307 is 10 bits; the index together with the corresponding way number is called the level-two block number BN2X (the BNX2 mentioned earlier).
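  • The address fields just defined can be illustrated by packing and unpacking BN1 and BN2 as plain integers. This is a sketch under the FIG. 5 example widths (2-bit offset 303, 2-bit block offset 306, 7-bit BN1X, 10-bit index 307, 1-bit way number for two ways); the function names are hypothetical, and placing the way number above the index follows the level-two address format 780 described later.

```python
# Field widths from the FIG. 5 example (assumed constants for this sketch).
OFFSET_BITS = 2     # offset 303 (BN1Y): 4 instructions per level-one block
BLOCK_OFF_BITS = 2  # block offset 306: 4 level-one blocks per level-two block
BN1X_BITS = 7       # 128 track-table rows
INDEX_BITS = 10     # index 307
WAY_BITS = 1        # two-way set-associative level-two cache


def _check(value: int, bits: int, name: str) -> None:
    """Reject a field value that does not fit in its bit width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def make_bn1(bn1x: int, bn1y: int) -> int:
    """BN1 = BN1X concatenated with BN1Y (9 bits in this example)."""
    _check(bn1x, BN1X_BITS, "bn1x")
    _check(bn1y, OFFSET_BITS, "bn1y")
    return (bn1x << OFFSET_BITS) | bn1y


def make_bn2(way: int, index: int, block_off: int, offset: int) -> int:
    """BN2 = BN2X (way number, index 307) concatenated with BN2Y (306, 303)."""
    _check(way, WAY_BITS, "way")
    _check(index, INDEX_BITS, "index")
    _check(block_off, BLOCK_OFF_BITS, "block_off")
    _check(offset, OFFSET_BITS, "offset")
    bn2x = (way << INDEX_BITS) | index
    bn2y = (block_off << OFFSET_BITS) | offset
    return (bn2x << (BLOCK_OFF_BITS + OFFSET_BITS)) | bn2y


def split_bn2(bn2: int) -> tuple:
    """Inverse of make_bn2: return (way, index, block_off, offset)."""
    offset = bn2 & ((1 << OFFSET_BITS) - 1)
    block_off = (bn2 >> OFFSET_BITS) & ((1 << BLOCK_OFF_BITS) - 1)
    index = (bn2 >> (OFFSET_BITS + BLOCK_OFF_BITS)) & ((1 << INDEX_BITS) - 1)
    way = bn2 >> (OFFSET_BITS + BLOCK_OFF_BITS + INDEX_BITS)
    return way, index, block_off, offset
```

  • For example, `make_bn1(5, 3)` gives 23, and `split_bn2(make_bn2(1, 517, 2, 3))` returns `(1, 517, 2, 3)`.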
  • the structure of this embodiment is basically the same as that in FIG. 4, and the only change is the active table 104.
  • Each row of the table gains entries that store the addresses of the previous and next instruction blocks (in address order) of the instruction block represented by that row, together with a selector serving these new entries.
  • In FIG. 4, each row of the left-hand array (representing one L2 cache block) stores, in addition to the tag entry 118, four entries holding the addresses of the four L1 cache blocks in that row.
  • This embodiment adds an entry 501 storing the address of the previous L2 cache block in address order and an entry 503 storing the address of the next L2 cache block.
  • At the output of the left-hand array, the outputs of entries 408 are still selected by the original selector 521, and an additional selector 531 selects among the output of selector 521 and the outputs of the new entries 501 and 503.
  • Likewise, the right-hand array adds an entry 502 storing the previous L2 cache block address, an entry 504 storing the next L2 cache block address, and a selector 532 corresponding to selector 531.
  • Comparator 420 controls a tri-state gate that places the output of selector 531 on the bus to the track table 110 for storage; comparator 422 controls another tri-state gate that places the output of selector 532 on the same bus for storage in the track table 110.
  • The results of comparing tag 118 and tag 120 with the input address determine which selector output (i.e., which instruction address) is sent to the track table 110 for storage.
  • Since the index address of the level-two instruction block before or after the current one can be obtained by subtracting 1 from or adding 1 to the index address of the current level-two instruction block (307 in FIG. 4), the added entries 501 and 502 for the previous block address and 503 and 504 for the next block address only need to store the way number of the previous or next level-two instruction block.
  • the 'branch source instructions' are all direct branch instructions unless otherwise specified.
  • The scanner 108 examines the level-two instruction sub-blocks being filled from the level-two cache 106 into the level-one cache 112.
  • the branch target address of the branch source instruction is calculated.
  • To reduce power consumption, i.e., to reduce the number of accesses to the active table 104, the scanner 108 determines whether the branch target instruction lies beyond the boundary of the current level-one instruction block, of the current level-two instruction block, or of the level-two instruction blocks immediately before and after the current level-two instruction block; this lowers the frequency of active-table accesses.
  • The address boundary is determined by adding the branch offset to the low-order bits of the base address.
  • The branch offset (OFFSET) 571 is added to the low-order base-address bits 581, and carry signals 574, 575 and 576 are taken from three boundaries of the adder.
  • The three signals are processed by priority logic, so that an active boundary-crossing signal for the largest block overrides the boundary-crossing signals for the smaller blocks.
  • The low-order base-address bits 581 are divided into three parts: the first part is the offset 303 of the base address, the second part is the block offset 306, and the third part 579 is the address bit immediately above the block offset 306.
  • The branch offset 571 is divided into two parts: a low part 573 aligned with the low-order base-address bits 581, and the remaining high part 572.
  • The resulting sum 582 is divided into three parts along the same boundaries as the base address, and a carry signal (574, 575 and 576) is produced at each boundary.
  • The address boundary is determined as follows:
  • If the high part 572 of the branch offset 571 is all '0' and the carry signal 576 is '1', the branch target address lies beyond the level-two instruction block that follows the one containing the branch source instruction; this situation is handled in the same way as case 1.
  • For a negative branch offset, the address boundary can be determined by the same method, except that the first check is whether the high part 572 of the branch offset 571 is all '1'. If the high part 572 is not all '1', it is case 1; if the high part 572 is all '1' and carry signals 574, 575 and 576 are all '0', it is case 2; if the high part 572 is all '1', carry signal 574 is '1' and carry signals 575 and 576 are '0', it is case 3; if the high part 572 is all '1', carry signal 575 is '1' and carry signal 576 is '0', it is case 4; if the high part 572 of the branch offset 571 is all '1', and the carry
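  • The outcome of the boundary determination can be modelled in a few lines of Python. The sketch reproduces what the logic decides, not its wiring: instead of reading the carry signals 574, 575 and 576, it compares the level-one and level-two block numbers of the computed target with those of the source. The 2-bit widths of offset 303 and block offset 306 follow the FIG. 5 example, and the function name `boundary_case` is hypothetical.

```python
# Assumed FIG. 5 field widths (addresses in instruction units).
OFFSET_BITS = 2                          # offset 303
BLOCK_OFF_BITS = 2                       # block offset 306
L1_SHIFT = OFFSET_BITS                   # address >> L1_SHIFT = level-one block number
L2_SHIFT = OFFSET_BITS + BLOCK_OFF_BITS  # address >> L2_SHIFT = level-two block number


def boundary_case(base: int, branch_offset: int) -> int:
    """Classify a direct branch at instruction address `base` with signed
    `branch_offset` into address-boundary case 1, 2, 3 or 4.

    The tests run from the smallest block to the largest and return at the
    first block that still contains the target, which gives the same result
    as letting the widest boundary-crossing signal override the narrower ones."""
    target = base + branch_offset
    if target >> L1_SHIFT == base >> L1_SHIFT:
        return 2  # same level-one block: reuse the source BN1X
    src_l2, tgt_l2 = base >> L2_SHIFT, target >> L2_SHIFT
    if tgt_l2 == src_l2:
        return 3  # same level-two block: storage domain chosen by block offset 306
    if abs(tgt_l2 - src_l2) == 1:
        return 4  # previous or next level-two block: storage domain P or N
    return 1      # anything farther needs the full tag match in the active table
```

  • For instance, a branch at address 15 with offset +1 crosses into the next level-two block and is classified as case 4, while an offset of +32 from address 0 is case 1.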
  • The branch target instruction address is calculated from the BN1X base address and the PC address of the instruction segment held in the scanner, and the calculated branch target location is handled as follows.
  • When the scanner 108 finds address boundary case 1:
  • The scanner 108 sends the calculated branch target instruction address over bus 507 to the active table 104, the corresponding row is read using the index bits of that address, and the tags read out are matched against the tag of the branch target instruction address calculated by the scanner 108. If the match succeeds, subsequent operations are as described earlier. If it fails, the instruction block at the calculated branch target address is fetched from lower-level memory and filled into the level-two cache block chosen by the replacement policy, and subsequent operations are as described earlier.
  • When the scanner 108 finds address boundary case 2:
  • The branch target address and the branch source address are in the same level-one instruction block, i.e., the branch target instruction and the branch source instruction have the same BN1X.
  • The tri-state gates of all ways (such as tri-state gate 541) are forced off, and the branch source BN1X held in the scanner and the calculated offset 303 (i.e., the branch target BN1Y) are merged into BN1, which the scanner 108 writes into the track table 110 over bus 505.
  • Thus, when the branch source instruction is to be executed, the processor 116 can read the branch target instruction directly from the level-one cache.
  • When the scanner 108 finds address boundary case 3, the branch target address and the branch source address are in the same level-two instruction block, i.e., the branch target instruction and the branch source instruction have the same BN2X. In this case, the BN2X of the instruction block containing the branch source instruction, held in the scanner
  • (including the way number and the index bits), is used over bus 507 as an index to read the second storage block (such as 408 or 410) in the corresponding entry of the active table 104, and the calculated block offset 306 selects the content of the corresponding storage domain in that second storage block.
  • If the BN1X value stored in that storage domain is valid, the tri-state gate of the way given by the way number in the branch source BN2X is forced on and the tri-state gates of the other ways are forced off. The BN1X value is sent to the track table 110 over bus 508,
  • and the branch target BN1Y, obtained by removing the block offset 306 from the calculated branch target BN2Y, is sent to the track table 110 over bus 505. The two are merged into the branch target BN1, which the scanner 108 writes into the track-table 110 entry pointed to by the branch source BN1X and BN1Y it holds.
  • If the BN1X value stored in the storage domain is invalid, the tri-state gates of all ways are forced off, and the branch source BN2X held in the scanner 108 and the calculated branch target BN2Y are merged into BN2, which is written over bus 505 into the track-table 110 entry pointed to by the branch source BN1X and BN1Y held in the scanner 108. Subsequent operations are as described earlier.
  • When the scanner 108 finds address boundary case 4:
  • The index value of the branch target instruction differs from that of the branch source instruction by ±1 (the index value of the previous level-two instruction block differs from the branch source index by '-1', and that of the next level-two instruction block by '+1').
  • The branch source BN2X (including the way number and index bits) held in the scanner is used over bus 507 as an index to read the third storage block in the corresponding entry of the active table 104 (such as 501 and 503, or 502 and 504). According to the address boundary determination, when the branch target address is in the level-two instruction block preceding that of the branch source address, the corresponding storage domain P (such as 501 or 502) is selected; when it is in the following level-two instruction block, the corresponding storage domain N (such as 503 or 504) is selected.
  • If the way number stored in the selected storage domain is valid, it is sent to the track table 110 over bus 508, while the new index value obtained by decrementing or incrementing the branch source index bits held in the scanner 108, together with the calculated branch target BN2Y, is sent to the track table 110 over bus 505. These are merged into the branch target BN2, which is written into the track-table 110 entry pointed to by the branch source BN1X and BN1Y held in the scanner 108. If the way number stored in the selected storage domain is invalid, the branch target address calculated by the scanner 108 is sent to the active table over bus 506 for index matching, and subsequent operations follow case 1 described above.
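  • The decisions for cases 2 to 4 can be summarized in one Python function. It is a behavioural sketch under stated assumptions, not the circuit: the function and parameter names are hypothetical, tri-state gates and buses are not modelled, the index width is the 10-bit example value, and the returned tuples merely stand for the BN1 or BN2 value written into the track-table entry of the branch source.

```python
from typing import List, Optional, Tuple

INDEX_BITS = 10  # index 307 width in the FIG. 5 example (assumed)


def resolve_target(case: int, direction: int,
                   src_bn1x: int, src_way: int, src_index: int,
                   l1_bn1x: List[Optional[int]],
                   prev_way: Optional[int], next_way: Optional[int],
                   tgt_block_off: int, tgt_offset: int) -> Optional[Tuple]:
    """Map a branch target to a track-table value for boundary cases 2-4.

    l1_bn1x models the second storage block (408/410): one BN1X per level-one
    block, None when its valid bit is cleared. prev_way/next_way model storage
    domains P and N. Returns ("BN1", bn1x, bn1y) or
    ("BN2", way, index, block_off, offset); None means "handle as case 1"."""
    if case == 2:  # same level-one block: source BN1X + target BN1Y
        return ("BN1", src_bn1x, tgt_offset)
    if case == 3:  # same level-two block
        bn1x = l1_bn1x[tgt_block_off]  # storage domain selected by block offset 306
        if bn1x is not None:
            return ("BN1", bn1x, tgt_offset)
        return ("BN2", src_way, src_index, tgt_block_off, tgt_offset)
    if case == 4:  # direction -1: previous block (P); +1: next block (N)
        way = prev_way if direction < 0 else next_way
        if way is None:  # invalid way number: fall back to the tag match of case 1
            return None
        index = (src_index + direction) & ((1 << INDEX_BITS) - 1)
        return ("BN2", way, index, tgt_block_off, tgt_offset)
    return None  # case 1 always goes through the active-table tag match
```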
  • This reduces how often tags in the active table 104 are read and compared with the address. In cases 3 and 4, however, the entries 408 and 410 in a row of the active table 104 in FIG. 5 must still be accessed directly by way number and index address 307 to obtain a level-one instruction address in the same level-two instruction block, or entries 501 and 502 for the previous level-two address, or entries 503 and 504 for the next level-two address. If, when the scanner 108 scans an instruction block being filled from the lower-level buffer 126 or 128 into the upper-level buffer, the above entries of the active table 104 row corresponding to that instruction block (same way number and same index address) are also loaded into the scanner 108
  • and held there temporarily, the frequency of access to the active table 104 can be reduced further.
  • If the register holding these entries has multiple independent read ports, several branch instructions in the instruction segment being scanned can be handled simultaneously: according to the address boundary case of its own branch target, each instruction accesses the read port assigned to it and independently maps its branch target to a BN1- or BN2-form address for storage in the track table 110.
  • FIG. 6 shows another embodiment 600 of the scanner for the level-two cache structure of the present invention.
  • Each instruction block in the upper-level buffer 112 contains 4 instructions, so the offset 303 (BNY) is two bits; each cache block in the lower-level buffer 126 or 128 contains 4 upper-level cache blocks, so the block offset 306 is also two bits.
  • One row of the active table 104 corresponds to one lower-level cache block and contains four entries holding BN1X addresses (stored in 408), an entry such as 501 holding the way number of the previous lower-level cache block, and an entry such as 503 holding the way number of the next lower-level cache block.
  • The instructions are sent to the upper-level buffer 112.
  • The scanner 608 also includes a micro active block 660. The entire scanner 608 can replace the scanner 108 of FIG. 5, with the rest of the top-level structure the same as in FIG. 5; only the track table 110 is shown in the figure.
  • Block 660 also contains three entries 624, 625 and 626: entry 624 stores the way number of the previous lower-level cache block, taken from entry 501 in the active table 104; entry 625 stores the way number and index address of the current lower-level cache block; and entry 626 stores the way number of the next lower-level cache block, taken from entry 503 in the active table 104.
  • Entry 625 holds the level-two way number and index address of the instruction block being scanned, which are read into the scanner 608 together with the instruction block.
  • Of the selectors 670-674 in the micro active block, the four selectors 670-673 have the same structure: according to the address boundary case produced by the corresponding decoding and judging sub-block, each selects the content of entries 620-626 and, directly or after an arithmetic operation, produces a BN1X or BN2X address, which, together with the offset 303 calculated by an adder such as 607, is stored as the branch target address of the scanned instruction in the track-table entry corresponding to that instruction.
  • The fifth selector, 674, selects the content of entries 620-626 to fill the end track point of the track; its selection control differs from that of selectors 670-673.
  • The instruction decoder in each sub-block decodes the instruction it is responsible for. If the instruction is not a branch instruction, the instruction type produced by decoding is stored in the track-table entry corresponding to that instruction, and the scanner does not calculate a branch address for it. If the instruction is a branch instruction (hereinafter the branch source instruction), the sub-block produces an address boundary determination as in the previous example, which is used to select the branch target address; the branch target address and the decoded instruction type are written into the track-table entry corresponding to the branch source instruction. The following describes the case in which the instruction decoded by a sub-block is a branch instruction.
  • Each sub-block first checks whether the high bits of the branch offset 571 of its instruction are all '0'; if they are not, the address boundary is judged to be case 1.
  • the instruction branch offset is added to the base address of the instruction.
  • The base address is formed by merging values held temporarily in the scanner (taken from entry 408 of the active table 104), the index value (307, held in entry 625), the block offset 306 and the offset 303 (BNY).
  • The 4 instructions of an instruction block have identical base addresses except for BNY.
  • In instruction order, the BNY of the first instruction is '0' and those of the following three instructions are '1', '2' and '3'.
  • The sum is the memory address of the branch target. Using the index portion 307 of this memory address, a row is read from each of the left and right arrays of the active table 104. The block offset 306 of the memory address controls selector 521 to select the BN1X stored in one of the four entries 408 of the row; this passes through selector 531 (which, in address boundary case 1, always selects the output of 521) to the tri-state gate 541.
  • The tag entry 118 of the row is compared with the tag portion 311 of the branch target's memory address in comparator 420. If they are equal, the match enables the tri-state gate 541, and the output of tri-state gate 541 is merged with the offset 303 (BNY) of the memory address and stored in the track-table entry corresponding to the scanned instruction. If instead the tag entry 120 of the right-hand array matches the tag portion 311 in comparator 422, the BN1X sent to the track table comes from entry 410; the principle is the same and is not repeated. The case in which the high bits of the branch offset are all '0' is described next.
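  • The case-1 path through the two-way active table can be modelled in Python as below. This is a sketch with hypothetical names (`WayEntry`, `case1_lookup`) and the FIG. 5 example widths; returning a BN2 value when the selected BN1X is invalid is an assumption carried over from the case-3 handling, since the text does not spell it out for case 1, and the refill on a miss is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OFFSET_BITS, BLOCK_OFF_BITS, INDEX_BITS = 2, 2, 10  # assumed FIG. 5 widths


@dataclass
class WayEntry:
    """One way of an active-table row (left or right array)."""
    tag: Optional[int] = None  # tag 118 or 120; None = empty way
    bn1x: List[Optional[int]] = field(default_factory=lambda: [None] * 4)  # 408 / 410


def case1_lookup(table: List[List[WayEntry]], target: int):
    """Map a branch-target memory address through the active table (case 1)."""
    bny = target & ((1 << OFFSET_BITS) - 1)                             # offset 303
    block_off = (target >> OFFSET_BITS) & ((1 << BLOCK_OFF_BITS) - 1)  # block offset 306
    index = (target >> (OFFSET_BITS + BLOCK_OFF_BITS)) & ((1 << INDEX_BITS) - 1)  # 307
    tag = target >> (OFFSET_BITS + BLOCK_OFF_BITS + INDEX_BITS)        # tag 311
    for way, entry in enumerate(table[index]):  # comparators 420 and 422
        if entry.tag == tag:                     # hit: this way's tri-state gate opens
            bn1x = entry.bn1x[block_off]         # selector 521 (or its right-array twin)
            if bn1x is not None:
                return ("BN1", bn1x, bny)
            return ("BN2", way, index, block_off, bny)  # assumption, see lead-in
    return None  # miss: fill from lower-level memory first (not shown)
```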
  • Each decoding and judging sub-block adds the branch offset 571 of the branch instruction it handles to the block offset 306 and offset 303 of the instruction base address, using an adder in the sub-block such as 607 (the higher bits of the base address, such as index bits 722 and tag 721, are not used).
  • Each sub-block produces an address boundary determination from the carry signals generated by the addition, as described above, and derives from it a control signal that makes its selector choose the value in the appropriate storage entry 620-626 for filling the track table.
  • For the first instruction in the sequence, whose offset 303 is '0', the addition is performed in adder 607. If the sub-block's address boundary is determined to be case 1, the memory address of the branch target calculated by the scanner 608 is, as described above, sent to the active table 104, mapped to a level-one cache address BN1, and stored in the track-table entry corresponding to the branch source instruction.
  • According to the address boundary determination, the block offset 306 of the sum produced by adder 607 is placed on control line 610 to control selector 670. If the block offset 306 is '00', selector 670 selects the content of storage entry 620: if its valid bit is 'valid', selector 670 outputs the BN1X address in entry 620; if the valid bit is 'invalid', selector 670 outputs the way number and index bits 307 stored in entry 625.
  • The output way number and index bits 307 are merged with the block offset 306 and offset 303 (BNY) of the sum produced by adder 607 and sent to the first entry of a track in the track table 110, namely the track corresponding to the level-one cache block into which the instruction block being scanned is stored.
  • When the block offset 306 produced by adder 607 is '01', '10' or '11', selector 670 selects storage entry 621, 622 or 623 respectively; if the content is invalid, entry 625 is selected instead, as above.
  • If the branch target lies in the previous lower-level cache block, control line 610 makes selector 670 read the way number from storage entry 624 and the index 307 from storage entry 625. The previous block's way number from entry 624, the index 307 from entry 625 minus '1', and the block offset 306 and offset 303 produced by adder 607 are merged into a BN2 address, which is stored in the first entry of the above track.
  • If the branch target lies in the next lower-level cache block, control line 610 makes selector 670 read the way number from storage entry 626 and the index 307 from storage entry 625. The next block's way number from entry 626, the index 307 from entry 625 plus '1', and the block offset 306 and offset 303 of the sum produced by adder 607 are merged into a BN2 address, which is stored in the first entry of the above track.
  • The other sub-blocks process their own instructions independently in the same manner: each determines the address boundary case of its instruction and accordingly controls selector 671, 672 or 673 over control line 611, 612 or 613 to select the contents of storage entries 620-626, which, combined with the sum produced by the adder in that sub-block, fill the second, third and fourth entries of the track, respectively.
  • the last entry in the track, the end track point, is filled by the output of selector 674.
  • the selector is directly controlled by the block offset 306 in the base address 614 of the instruction segment.
  • When the block offset 306 is '00', selector 674 selects storage entry 621.
  • When the valid bit in the content of entry 621 is 'valid', selector 674 outputs the BN1X address in storage entry 621; when the valid bit in the content of entry 621 is 'invalid', selector 674 outputs the way number and index bits 307 stored in storage entry 625.
  • This output is merged with the block offset 306 produced by adder 607 plus '1' and the offset 303 (BNY), and sent to the end entry of a track in the track table 110.
  • When the block offset 306 is '01' or '10', selector 674 correspondingly selects the content of storage entry 622 or 623; if the content is invalid, entry 625 is selected, as above.
  • When the block offset 306 is '11', selector 674 reads the way number from storage entry 626 and the index 307 from storage entry 625.
  • That way number, the index 307 from entry 625 incremented by '1', and the block offset 306 and offset 303 produced by adder 607 are merged into a BN2 address, which is stored in the end track point entry of the above track.
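  • The behaviour of selector 674 can be sketched in Python. Argument names are hypothetical, entries 620-623 are modelled as a list with `None` for an 'invalid' valid bit, and two details are assumptions not spelled out in the text: the end track point is taken to point at BNY 0 (the first instruction of the following block), and a missing way number in entry 626 is taken to mean falling back to a full active-table lookup.

```python
from typing import List, Optional, Tuple

INDEX_BITS = 10  # index 307 width (assumed, as in FIG. 5)


def end_track_point(base_block_off: int, bn1x_entries: List[Optional[int]],
                    way: int, index: int,
                    next_way: Optional[int]) -> Optional[Tuple]:
    """Value written into the end track point by selector 674 (FIG. 6).

    base_block_off: block offset 306 of the segment's base address 614.
    bn1x_entries:   storage entries 620-623 (None = valid bit 'invalid').
    way, index:     storage entry 625 (current lower-level block).
    next_way:       storage entry 626 (way number of the next lower-level block)."""
    nxt = base_block_off + 1          # the level-one block that follows in address order
    if nxt < len(bn1x_entries):       # still inside the current lower-level block
        bn1x = bn1x_entries[nxt]      # entry 621, 622 or 623
        if bn1x is not None:
            return ("BN1", bn1x, 0)
        return ("BN2", way, index, nxt, 0)   # fall back to entry 625
    if next_way is None:              # assumption: no link, use the active table instead
        return None
    next_index = (index + 1) & ((1 << INDEX_BITS) - 1)
    return ("BN2", next_way, next_index, 0, 0)   # entry 626 plus index + 1
```

  • For a segment whose base block offset is '11', the sketch returns a BN2 address built from entry 626 and the incremented index, matching the last case described above.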
  • The active table 104 can also be built with multiple read/write ports, so that several branch target addresses can access it simultaneously.
  • FIG. 7 shows the memory and address formats used in a micro active table organized in a fully associative manner.
  • FIG. 7A shows the structure of a memory 820 in a fully associative micro active block.
  • The memory 820 contains six entries and corresponds to one level-two instruction block containing four level-one instruction blocks.
  • Entry 710 holds the level-one block number BN1X, with its valid signal, of the level-one instruction block whose block offset within the level-two instruction block is '00'; entries 711, 712 and 713 hold the same for the level-one instruction blocks whose block offsets are '01', '10' and '11', respectively.
  • Entry 714 holds the way number (Way number) and the index address 307 (Index).
  • Entry 715 stores the way number of the next L2 cache block.
  • module 110 is a track table
  • Module 808 is the scanner; see the scanner 108 of FIG. 5.
  • Function block 801 is similar to the instruction decoding and judging module 601 in the embodiment of FIG. 6: it independently decodes each of the instructions in an instruction block in the scanner and calculates their branch target addresses.
  • For each instruction decoded as a branch, function block 801 adds the instruction's base address (Base Address; as described for FIG. 6, the high-order bits of the base addresses of the several instructions are the same, but here the lowest two bits of each base address reflect the instruction's position in the instruction block) to the instruction's branch offset (Branch Offset, i.e., the branch address offset). The sum is the branch target address, which controls the selection of content in the micro active block 881.
  • Each branch target address can be divided into four parts, from high-order to low-order: the micro-tag (Tag) 721, the micro index (Index) 722, the block offset (Block Offset) 306 and the offset 303.
  • The micro-tag 721 and micro index 722 differ from the tag 311 and index 307 in the other embodiments of this disclosure. The micro index 722 has only two bits, because each micro active block holds only 4 active-table rows, each corresponding to a level-two instruction block that contains 4 level-one instruction blocks; the micro index value equals the lowest two bits of the active-table index 307.
  • The micro-tag 721 consists of the tag 311 and the bits of the active-table index 307 other than the lowest two.
  • The first three parts (721, 722 and 306) are sent over buses 810, 811, 812 and 813 to each micro active block (such as micro active blocks 881 and 883) to control the selectors; the offset 303 is combined with the corresponding selector output BNX into a complete BN address used to fill entries in the track table 110.
  • Micro active block 881 contains memories 820, 821, 822 and 823, which store track-table entries, and multiplexers 870, 871, 872, 873 and 874.
  • Each memory, such as memory 820, has the structure shown in FIG. 7A.
  • Micro active block 881 has a micro-tag register 851, which stores the base address of the consecutive instructions corresponding to the active-table entries held in micro active block 881. It also has 4 comparators 860, 861, 862 and 863; one input of each is connected to the output of register 851, and the other inputs are connected to the four branch target addresses 810, 811, 812 and 813, respectively. The 4 branch target addresses 810, 811, 812 and 813 are sent to micro active blocks 881 and 883 (883 has the same structure as 881) and compared with the micro-tags in their micro-tag registers. In micro active block 881, suppose that
  • comparator 860 finds the micro-tag portion 721 of branch target address 810 equal to the micro-tag in register 851. Comparator 860 then lets the micro index 722 and block offset 306 of branch target address 810 control selector 870.
  • The micro index 722 selects one of the four memories: '00' selects memory 820, and '01', '10' and '11' select memories 821, 822 and 823, respectively.
  • The block offset 306 selects one of the 4 BN1X groups in the selected memory.
  • When the valid bit in the selected group is 'valid', selector 870 outputs the BN1X address in that group; when the valid bit in the selected group is 'invalid', selector 870 outputs memory 820
  • The output is ORed in OR gate 840 with the corresponding output of the other micro active block 883, combined with the offset 303 from adder 607, and written into the first entry of the track in the track table 110 pointed to by address bus 505.
  • Suppose instead that comparator 861 finds the micro-tag portion 721 of branch target address 811 different from the micro-tag in micro-tag register 851. Comparator 861 then makes selector 871 output all '0', so that it does not affect the corresponding outputs of other micro active blocks (such as micro active block 883).
  • If branch target 811 hits in no micro active block, it is sent to the active table; the entry for branch target address 811 is read out and written into the second entry of the track in the track table pointed to by address bus 505.
  • The remaining two branch target instruction addresses, 812 and 813, control selectors 872 and 873 respectively to select one of the 16 BN1X values; or the way number and index bits 307, together with the block offset 306 of the target instruction address; or an all-'0' output.
  • Each output is merged with the corresponding BN1Y, ORed with the corresponding output of micro active block 883, and sent to the track table 110 to be written into the third and fourth entries of the above track. If an instruction is not a branch instruction, instruction decoding stops the comparator corresponding to that instruction from comparing; for example, if the third instruction is not a branch,
  • the non-branch instruction type produced by instruction decoding is stored in the third entry of the above track in the track table 110.
  • The next-block address stored in the end track point of the track is provided through a similar comparison and a decode-and-select function.
  • Selector 874 and memory The connection method of the 820, etc. is different from that of the selector 870-873. Under the same address control, the selector 874 selects the input of the next address in the order of 870-873. If the microindex 722 of the address and the block offset 306 bit are '0000', the selector 870-873 selects the entry 710 in the memory 820, but according to the same address selector 874 The entry 711 in the memory 820 is selected; if the micro index 722 of the address and the block offset 306 are '0011', the selector 870-873 selects the entry in the memory 820.
  • the selector 874 selects the entry 710 in the memory 821. If the micro index of the address and the block offset 306 are '1111', it is special, the selector 870-873 The entry 713 in the memory 823 is selected, but the selector 874 selects the way group number in the entry 715 in the memory 823 and the second instruction block number in the entry 714 plus '1'. Together with the block offset 306 in the address as the lower block address.
  • The micro-tag 721 in block address 814 (i.e., the base address of the instruction block being processed) is sent to every micro active block and compared with the micro-tag stored there.
  • Comparator 864 in micro active block 881 compares the micro-tag on block address 814 with the output of micro-tag register 851. If they match, comparator 864 lets the micro index 722 and block offset 306 on block address 814 control selector 874.
  • The selected entry is then output.
  • For example, selector 874 selects the way group number in entry 724 of memory 823 and outputs it together with the index address 307 and block offset 306 on address 814.
  • Address format 760 is the level-1 cache address format, composed of BN1X 761 and offset BNY 303. Address format 780 is the level-2 cache address format, composed of way group number 781, index address 307, block offset 306, and offset BNY 303.
  • If the micro-tag in block address 814 matches none of the micro active blocks in scanner 800 (such as micro active blocks 881 and 883), then in this embodiment branch target address 811 is sent to the active table.
  • The active table row pointed to by branch target address 811 can then be filled into a micro active block of scanner 800 chosen by the replacement logic (for example, LRU).
  • In the designated micro active block, the original entries of the memory specified by micro index bits 722 of branch target 811 are replaced; for example, when the micro index bits are '10', memory 822 is the one replaced.
  • Specifically, the four BN1Xs and their valid signals in the row of active table 104 pointed to by branch target 811 are written into entries 710, 711, 712 and 713; the way group number and index 307 of that active table row are written into entry 714 as the L2 cache block number of the block; and the way group number held in the next-block entry 503 of the active table row is written into entry 715.
  • The micro-tag in branch target 811 is stored in micro-tag register 851 of micro active block 883, and the valid bits in memories 820, 821 and 823 are set to 'invalid'. Thereafter, the entries in memories 820, 821 and 823 can be updated during cycles in which the active table is not otherwise being accessed.
  • The replacement logic can designate a micro active block as the replacement victim according to a particular algorithm.
  • Taking LRU as an example, each micro active block stores a multi-bit count value whose lowest bit is on the right. Whenever any comparator in that block reports a match, the count value is shifted left by one bit and a '1' is filled into its lowest bit.
  • The replacement logic examines the count values of all the blocks. If the lowest bit of any count value is '0', the micro active block holding that count value is the replacement victim.
  • Otherwise, the replacement logic shifts the count values in all micro active blocks right by one bit, repeating until the lowest bit of some count value is '0'; the micro active block holding that count value is the replacement victim.
  • The present invention can also provide scanner 108 with a micro active block of set-associative structure, so that all instructions in the instruction block being scanned are address-mapped simultaneously.
  • The set-associative micro active block is structured like a reduced active table 104: it has the same number of columns but only 8 rows, and it has 4 read ports, corresponding to at most 4 instructions in an instruction block. Each read port corresponds to one entry in track table 110.
  • Correspondingly, selectors 521 and 531, comparator 420, three-state gate 541 and the related logic in the referenced figure are each replicated as 4 sets.
  • The four branch addresses of the four branch instructions are used to address the set-associative micro active block.
  • Each read port reads out 8 lines of contents: 8 groups of BN1X addresses, from each of which one BN1X is selected by the block offset 306 of the port's branch address; and 8 micro-tags (longer than tag 311, since they also include the bits of index 307 other than its lowest 3 bits), which are compared with the micro-tag of the port's branch address in 8 comparators.
  • The comparison result of the matching way drives a three-state gate, which writes the BN1X selected by block offset 306 in that way of the read port into the track table entry corresponding to that read port.
  • Each of the four read ports writes one entry in the track.
  • The apparatus and method proposed by the present invention can be used in various cache-related applications to improve cache efficiency.
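The selector wiring described above for selectors 870-873 and selector 874 can be sketched as a small lookup function. This is a hypothetical model, not the circuit itself: it assumes that the 2-bit micro index 722 picks one of memories 820-823 and the 2-bit block offset 306 picks one of entries 710-713 in that memory, which is what the '0000', '0011' and '1111' examples imply. The function name `select_entries` and the `"next_block"` marker are illustrative only.

```python
def select_entries(micro_index: int, block_offset: int):
    """Return (current, following) selections for one address (hypothetical model).

    current:   (memory, entry) chosen by selectors 870-873.
    following: (memory, entry) chosen by selector 874, i.e. the next position
               in order, or "next_block" for the special '1111' case, where the
               next-block address is assembled from entries 714/715 of memory 823.
    """
    assert 0 <= micro_index < 4 and 0 <= block_offset < 4
    position = (micro_index << 2) | block_offset  # the 4-bit value, e.g. '0011'
    current = (820 + micro_index, 710 + block_offset)
    if position == 0b1111:
        following = "next_block"  # last entry of memory 823: leave this block
    else:
        nxt = position + 1
        following = (820 + (nxt >> 2), 710 + (nxt & 0b11))
    return current, following


for bits in ("0000", "0011", "1111"):
    value = int(bits, 2)
    print(bits, select_entries(value >> 2, value & 0b11))
```

Running the loop reproduces the three cases in the text: '0000' gives entries 710 and 711 of memory 820, '0011' gives entry 713 of memory 820 and entry 710 of memory 821, and '1111' falls through to the next-block path.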
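Address formats 760 and 780 can be illustrated as packed bit fields. The text names the fields but not their widths, so `BNY_BITS`, `BLOCK_OFFSET_BITS` and `INDEX_BITS` below are assumed values chosen only for the example, and fields are assumed to already fit their widths. The class names are hypothetical.

```python
from dataclasses import dataclass

# Assumed field widths in bits; the description does not specify them.
BNY_BITS = 3
BLOCK_OFFSET_BITS = 2
INDEX_BITS = 6


@dataclass(frozen=True)
class L1Address:
    """Address format 760: BN1X 761 followed by offset BNY 303."""
    bn1x: int
    bny: int

    def pack(self) -> int:
        return (self.bn1x << BNY_BITS) | self.bny


@dataclass(frozen=True)
class L2Address:
    """Address format 780: way group 781, index 307, block offset 306, BNY 303."""
    way_group: int
    index: int
    block_offset: int
    bny: int

    def pack(self) -> int:
        value = self.way_group
        value = (value << INDEX_BITS) | self.index
        value = (value << BLOCK_OFFSET_BITS) | self.block_offset
        return (value << BNY_BITS) | self.bny

    @classmethod
    def unpack(cls, value: int) -> "L2Address":
        bny = value & ((1 << BNY_BITS) - 1)
        value >>= BNY_BITS
        block_offset = value & ((1 << BLOCK_OFFSET_BITS) - 1)
        value >>= BLOCK_OFFSET_BITS
        index = value & ((1 << INDEX_BITS) - 1)
        way_group = value >> INDEX_BITS
        return cls(way_group, index, block_offset, bny)


addr = L2Address(way_group=2, index=5, block_offset=3, bny=6)
assert L2Address.unpack(addr.pack()) == addr
```

Under these assumed widths, way group 2, index 5, block offset 3 and BNY 6 pack to 4286; the point is only that the way group and index 307 together play the role of an L2 block number, with block offset 306 and BNY 303 below them.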
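The hit path and the miss refill of a micro active block described above can be modelled in software as follows. This is a sketch under stated assumptions: `ActiveTableRow`, `MicroMemory` and `MicroActiveBlock` are hypothetical names, each memory 820-823 is reduced to entries 710-713 (BN1X), 714 (way group and index 307) and 715 (next-block way group) plus one valid bit, and a `None` result stands for "consult the active table".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ActiveTableRow:
    """Hypothetical view of one row of active table 104."""
    bn1x: List[Optional[int]]  # four BN1Xs, None where the valid signal is off
    way_group: int             # way group number of this row
    index: int                 # index 307 of this row
    next_way_group: int        # way group number held in next-block entry 503


@dataclass
class MicroMemory:
    """One of memories 820-823 (simplified)."""
    bn1x: List[Optional[int]] = field(default_factory=lambda: [None] * 4)  # entries 710-713
    l2_block: Optional[Tuple[int, int]] = None  # entry 714: (way group, index 307)
    next_way_group: Optional[int] = None        # entry 715
    valid: bool = False


class MicroActiveBlock:
    def __init__(self) -> None:
        self.micro_tag: Optional[int] = None               # micro-tag register 851
        self.memories = [MicroMemory() for _ in range(4)]  # memories 820-823

    def lookup(self, micro_tag: int, micro_index: int, block_offset: int) -> Optional[int]:
        """Return the BN1X on a hit, or None when the active table must be consulted."""
        if micro_tag != self.micro_tag:  # comparator mismatch
            return None
        memory = self.memories[micro_index]
        if not memory.valid:  # memory not yet refilled since the last replacement
            return None
        return memory.bn1x[block_offset]  # may itself be None if that BN1X is invalid

    def fill(self, micro_tag: int, micro_index: int, row: ActiveTableRow) -> None:
        """Refill this block from an active table row after a micro-tag miss."""
        self.micro_tag = micro_tag
        for i, memory in enumerate(self.memories):
            memory.valid = (i == micro_index)  # the other three memories become invalid
        target = self.memories[micro_index]
        target.bn1x = list(row.bn1x)
        target.l2_block = (row.way_group, row.index)
        target.next_way_group = row.next_way_group
```

For example, after `fill(5, 2, row)` (micro index '10'), only the model of memory 822 is valid; lookups that land in the other three memories return `None` until they are updated, mirroring the deferred update of memories 820, 821 and 823.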
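The shift-register LRU replacement described above (shift left and fill '1' on a match; on replacement, pick a block whose count has a '0' in the lowest bit, shifting all counts right until one does) can be sketched as below. The class name, the 4-bit default `width`, and choosing the lowest-numbered block on a tie are assumptions for illustration.

```python
class ShiftCountLRU:
    """Shift-register replacement: one count value per micro active block."""

    def __init__(self, num_blocks: int, width: int = 4) -> None:
        if num_blocks <= 0:
            raise ValueError("need at least one micro active block")
        self.mask = (1 << width) - 1  # assumed width of each count value
        self.counts = [0] * num_blocks

    def on_match(self, block: int) -> None:
        # A comparator in this block matched: shift left and fill '1' at the lowest bit.
        self.counts[block] = ((self.counts[block] << 1) | 1) & self.mask

    def pick_victim(self) -> int:
        # Terminates: after at most `width` right shifts every count is 0.
        while True:
            for block, count in enumerate(self.counts):
                if count & 1 == 0:
                    return block  # lowest index wins on a tie (assumption)
            self.counts = [count >> 1 for count in self.counts]
```

A block matched once, twice and three times holds counts 0b0001, 0b0011 and 0b0111; one right shift turns the first into 0, so that least recently active block is chosen.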
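The set-associative micro active block with 4 read ports can be approximated in software by looping over the ports and, within each port, over the 8 lines. The hardware performs all comparisons in parallel; this sequential version only shows the data flow. `Line`, `map_instruction_block` and the `(micro_tag, block_offset)` input pairs are hypothetical, and the micro-tag is assumed to have already been extracted from each branch address.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Line:
    """One of the 8 lines read out per port (hypothetical layout)."""
    micro_tag: int
    bn1x: List[Optional[int]]  # one BN1X per block offset 306; None = invalid


def map_instruction_block(lines: Sequence[Line],
                          branch_addrs: Sequence[Tuple[int, int]]) -> List[Optional[int]]:
    """Map up to 4 branch addresses at once, one read port each.

    Returns the value each port would write into its track-table entry,
    or None on a miss.
    """
    entries: List[Optional[int]] = []
    for micro_tag, block_offset in branch_addrs:  # one read port per branch
        result = None
        for line in lines:  # the per-port comparators, one per line
            if line.micro_tag == micro_tag:
                # The matching way enables its three-state gate, driving the
                # BN1X selected by block offset 306 onto the port output.
                result = line.bn1x[block_offset]
                break
        entries.append(result)
    return entries
```

With two valid lines tagged 5 and 9, the four branch addresses (5, 2), (9, 0), (7, 1) and (5, 3) yield one track-table value per port, and the unmatched tag 7 yields `None`, which would then be resolved through the active table.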

Abstract

The present invention provides a high-performance instruction caching system and method applicable to the field of processors. Before a processor core executes an instruction, the instruction can be placed in a high-speed memory that the processor core can access directly, so that the processor core obtains the instructions it needs from the high-speed memory almost every time, thereby achieving a high hit rate.
PCT/CN2014/085063 2013-02-08 2014-08-22 Système et procédé destinés à la mise en cache d'instruction de haute performance WO2015024532A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/913,837 US20160217079A1 (en) 2013-02-08 2014-08-22 High-Performance Instruction Cache System and Method
US15/722,814 US10275358B2 (en) 2013-02-08 2017-10-02 High-performance instruction cache system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310379657 2013-08-23
CN201310379657.9 2013-08-23

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US14/766,754 Continuation US20150378935A1 (en) 2013-02-08 2014-01-29 Storage table replacement method
PCT/CN2014/071812 Continuation WO2014121740A1 (fr) 2013-02-08 2014-01-29 Procédé de remplacement de table de stockage

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US14/913,837 A-371-Of-International US20160217079A1 (en) 2013-02-08 2014-08-22 High-Performance Instruction Cache System and Method
US15/722,814 Continuation US10275358B2 (en) 2013-02-08 2017-10-02 High-performance instruction cache system and method

Publications (2)

Publication Number Publication Date
WO2015024532A1 WO2015024532A1 (fr) 2015-02-26
WO2015024532A9 WO2015024532A9 (fr) 2015-04-23

Family

ID=52483095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/085063 WO2015024532A1 (fr) 2013-02-08 2014-08-22 Système et procédé destinés à la mise en cache d'instruction de haute performance

Country Status (2)

Country Link
CN (1) CN104424132B (fr)
WO (1) WO2015024532A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294352B (zh) * 2015-05-13 2019-10-25 姚猛 一种文件处理方法、装置和文件系统
US9431070B1 (en) 2015-08-31 2016-08-30 National Tsing Hua University Memory apparatus
CN105389270B (zh) * 2015-12-22 2019-01-25 上海爱信诺航芯电子科技有限公司 一种提高片上系统指令缓存命中率的系统及其方法
CN112905528A (zh) * 2021-02-09 2021-06-04 深圳市众芯诺科技有限公司 基于物联网的智能家居芯片

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5897655A (en) * 1996-12-10 1999-04-27 International Business Machines Corporation System and method for cache replacement within a cache set based on valid, modified or least recently used status in order of preference
CN101552032B (zh) * 2008-12-12 2012-01-18 深圳市晶凯电子技术有限公司 用较大容量dram参与闪存介质管理构建高速固态存储盘的方法及装置
US8904156B2 (en) * 2009-10-14 2014-12-02 Oracle America, Inc. Perceptron-based branch prediction mechanism for predicting conditional branch instructions on a multithreaded processor
EP2517100B1 (fr) * 2009-12-25 2018-09-26 Shanghai Xinhao Micro-Electronics Co. Ltd. Système et procédé de mémoire cache hautes performances
US8635408B2 (en) * 2011-01-04 2014-01-21 International Business Machines Corporation Controlling power of a cache based on predicting the instruction cache way for high power applications
US8756405B2 (en) * 2011-05-09 2014-06-17 Freescale Semiconductor, Inc. Selective routing of local memory accesses and device thereof
CN102841865B (zh) * 2011-06-24 2016-02-10 上海芯豪微电子有限公司 高性能缓存系统和方法

Also Published As

Publication number Publication date
WO2015024532A1 (fr) 2015-02-26
CN104424132A (zh) 2015-03-18
CN104424132B (zh) 2019-12-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14837748

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 14913837

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 14837748

Country of ref document: EP

Kind code of ref document: A1