WO2015096688A1 - Caching system and method

Info

Publication number: WO2015096688A1
Application number: PCT/CN2014/094603
Authority: WO (WIPO, PCT)
Prior art keywords: instruction, branch, cache, address, read pointer
Other languages: English (en), Chinese (zh)
Inventor: 林正浩
Original assignee: 上海芯豪微电子有限公司
Application filed by: 上海芯豪微电子有限公司

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802: Instruction prefetching
    • G06F9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer; banks

Definitions

  • The invention relates to the fields of computers, communications, and integrated circuits.
  • The role of a cache is to hold a copy of part of the contents of lower-level memory so that a higher-level memory or the processor core can access that content quickly, keeping the pipeline running continuously.
  • Current caches are addressed as follows: the index segment of the address addresses the tag memory, and the tag read out is compared with the tag segment of the address; the index segment, together with the block offset segment of the address, addresses the cache contents to be read.
  • If the tag read from the tag memory matches the tag segment of the address, the content read from the cache is valid; this is called a cache hit. Otherwise, the access is a cache miss and the content read from the cache is invalid.
  • In a set-associative cache, these operations are performed in parallel for every way to detect which way hits. The content read from the hitting way is valid; if all ways miss, everything read out is invalid. After a cache miss, the cache control logic fills the cache with the required contents from the lower-level storage medium.
  • Cache misses fall into three categories: compulsory misses, conflict misses, and capacity misses.
  • Compulsory misses are unavoidable.
  • Existing prefetch operations can be costly.
  • Although a multi-way set-associative cache can reduce conflict misses, it is limited by power consumption and speed (for example, because every way must be addressed by the same index and all tags must be read and compared simultaneously), so the number of ways is difficult to increase beyond a certain point.
  • Modern cache systems typically consist of multiple cache levels, each organized as a multi-way set-associative cache.
  • Newer cache structures, such as the victim cache, the trace cache, and prefetching, are all based on the basic cache structure described above and improve upon it.
  • Under current architectures, cache misses, especially at multiple cache levels, have become the most serious bottleneck limiting the performance of modern processors.
  • The method and system proposed by the present invention can directly address one or more of the above or other difficulties.
  • The present invention provides a caching method in which a cache pushes instructions to a processor core, and the subsequent instructions to push are determined from the feedback information the processor core produces when it executes the instructions; when the executed instruction is a direct branch instruction or a non-branch instruction, the feedback information does not include an instruction address.
  • The cache provides a first read pointer that addresses the cache to read the corresponding instruction for execution by the processor core, and the value of the first read pointer is updated according to the feedback information generated when the processor core executes the instruction.
  • Instructions filled into the level 1 cache of the cache are reviewed and the corresponding instruction information is extracted; from this instruction information, the branch targets of all branch instructions in the level 1 cache are obtained.
  • Instructions are stored in advance in the level 2 cache of the cache, so that when the processor core executes a branch instruction, the instruction to be executed next is already stored at least in the level 2 cache whether or not the branch is taken. If the branch is taken, the first read pointer is updated to the branch target cache address of the branch instruction; if it is not taken, the first read pointer is updated to the cache address of the instruction that follows the branch instruction in sequence.
  • The branch target instruction of a branch instruction that the processor core is about to execute is filled from the level 2 cache into the level 1 cache in advance, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken.
  • Instructions filled into the level 1 cache are reviewed and the corresponding instruction information is extracted; how the first read pointer is updated is determined by this instruction information rather than by the function of the instruction itself.
  • According to the outcome of a conditional branch instruction reported by the processor core: if the branch is taken, the first read pointer is updated to the branch target cache address of the conditional branch instruction; if it is not taken, the first read pointer is updated to the branch target cache address of the unconditional branch instruction that follows it. The processor core therefore does not need to spend a separate clock cycle executing that unconditional branch instruction.
  • The method buffers the value of the first read pointer and addresses the level 1 cache with the buffered value to read the corresponding instruction for execution by the processor core;
  • The first read pointer points to a branch instruction in advance. If the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken.
  • A second read pointer is provided. It points in advance to the branch instruction after the first read pointer; if the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken.
  • Based on branch prediction, either the next sequential instruction or the branch target instruction is selected and speculatively executed as the subsequent instruction, while the cache address of the other is saved. If the branch outcome matches the prediction, execution continues with the subsequent instructions; if it does not, the pipeline is flushed and execution restarts from the instruction at the saved address.
  • The instruction information includes the instruction type and the branch target cache address of each branch instruction; each branch instruction corresponds to one branch target cache address, and each instruction corresponds to one instruction type.
  • The instruction type consists of one or more bits and can be further divided into basic type information and a branch instruction type. The basic type information distinguishes branch instructions from non-branch instructions, and every instruction has one basic type entry; the branch instruction type further classifies branch instructions, and every branch instruction has one branch type entry. Branch instruction types include conditional branch instructions and unconditional branch instructions.
  • The instruction type of an instruction is looked up using the first read pointer, and the branch target cache address of the first branch instruction after that instruction is looked up using the second read pointer.
  • The second read pointer value is obtained by mapping the first read pointer value: it equals the number of branch instructions that precede the instruction pointed to by the first read pointer.
  • When the processor core executes the branch instruction, if the branch is taken, the first read pointer is updated to the branch target cache address pointed to by the second read pointer; if it is not taken, the first read pointer is updated to the cache address of the instruction that follows the branch instruction in sequence.
  • The processor core includes two front-end pipelines and one back-end pipeline. The method further provides a first instruction read buffer that stores the current instruction block, and a third read pointer that addresses the level 1 cache to read the corresponding instruction for execution by the target front-end pipeline of the processor core; the first read pointer addresses the first instruction read buffer to read the corresponding instruction for execution by the sequential front-end pipeline of the processor core.
  • If the branch of the branch instruction is not taken, the third read pointer is updated to the branch target cache address of the next branch instruction after the first read pointer, so that the third read pointer points to the branch target instruction block in the level 1 cache and reads the corresponding instruction for execution by the target front-end pipeline; meanwhile the first read pointer continues to be updated and reads the corresponding instruction from the first instruction read buffer for execution by the sequential front-end pipeline. If the branch is taken, the instruction block pointed to by the third read pointer is filled from the level 1 cache into the first instruction read buffer, and the first read pointer is updated to the value following the third read pointer; the first read pointer then continues to be updated from that value, reading the corresponding instruction from the first instruction read buffer for execution by the sequential front-end pipeline.
  • A second instruction read buffer stores the target instruction block, and the third read pointer addresses the second instruction read buffer to read the corresponding instruction for execution by the target front-end pipeline. A second read pointer points in advance to the branch instruction after the first read pointer; if that branch instruction's target is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken. A fourth read pointer points in advance to the branch instruction after the third read pointer; if that branch instruction's target is not yet stored in the level 1 cache, it is likewise filled from the level 2 cache into the level 1 cache, with the same effect. When the second read pointer points in advance to the first branch instruction after the first read pointer, the third read pointer is updated to the branch target cache address of that branch instruction, so that the third read pointer points to the branch target instruction block in the level 1 cache and reads the corresponding instruction for execution by the target front-end pipeline of the processor core.
  • The cache provides a first read pointer that addresses the level 1 cache of the cache to read several corresponding instructions, sends them to the processor core, and performs dependency (correlation) detection on them. Based on the result of the detection, the processor core is controlled to execute some or all of these instructions, and the first read pointer is updated according to the feedback information generated by the processor core and the result of the dependency detection. The method further provides a second read pointer that points in advance to the branch instruction after the first read pointer; if that branch instruction's target is not yet stored in the level 1 cache, it is filled from the level 2 cache of the cache into the level 1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken.
  • The instruction information includes the instruction type, the branch target cache address of each branch instruction, and the data address and data stride of each data access instruction. Each branch instruction corresponds to one branch target cache address; each data access instruction corresponds to one data address and one data stride; each instruction corresponds to one instruction type. The instruction type consists of one or more bits and can be further divided into basic type information, a branch instruction type, and a data access instruction type. The basic type information distinguishes branch instructions, data access instructions, and other instructions, and every instruction has one basic type entry; the branch instruction type further classifies branch instructions, and every branch instruction has one branch type entry, covering conditional and unconditional branch instructions; the data access instruction type further includes address-valid information, indicating whether the corresponding data address is valid, and stride-valid information, indicating whether the corresponding data stride is valid.
  • The address-valid and stride-valid information of a data access instruction together represent three states. State 1: data is fetched using the data address provided by the processor core, and that address is recorded. State 2: data is fetched using the data address provided by the processor core; the data stride is calculated from this address and the address recorded in state 1 and is recorded, and the next data address is calculated from the data address and the stride and is recorded. State 3: data is pushed to the processor core from the next data address recorded in state 2, and the following data address is calculated from that address and the stride and is recorded.
  • If the data block that precedes or follows, in sequential address order, the data block containing the data is already stored in the level 1 cache, that preceding or following data block is selected.
  • The level 1 cache addresses of the preceding and following data blocks are recorded together with the data block containing the data, so that when the data is accessed, the level 1 cache address of the preceding or following data block can be read out at the same time.
  • The address of the next data, i.e. the data to be accessed the next time the data access instruction executes, is obtained. When that address lies in the data block preceding or following the one containing the current data address, the level 1 cache address of that preceding or following block, read out together with the current data, directly gives the level 1 cache address of the next data, so the cache address obtained is not limited to the block of the current data accessed by the current data access instruction.
  • A second read pointer is provided; it moves from the instruction pointed to by the first read pointer to the first branch instruction. If the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken.
  • For each data access instruction the second read pointer passes during this move, if the data the instruction needs to access is not yet stored in the level 1 cache, the data is filled from the level 2 cache into the level 1 cache, so that the processor core can access the data directly from the level 1 cache when executing the data access instruction.
  • A fifth read pointer is provided; it points in advance to the data access instruction after the first read pointer. If the data that instruction needs to access is not yet stored in the level 1 cache, the data is filled from the level 2 cache into the level 1 cache, so that the processor core can access it directly from the level 1 cache when executing the data access instruction.
  • A fifth read pointer and a temporary register are provided; the fifth read pointer moves ahead to each data access instruction after the first read pointer in turn. If the data a data access instruction needs is not yet stored in the level 1 cache, the data is filled from the level 2 cache into the level 1 cache and stored in the temporary register; otherwise it is read directly from the level 1 cache and stored in the register. The fifth read pointer can then move on to the next data access instruction, and the temporarily stored data is pushed to the processor core in order for use by the data access instructions.
  • The instruction information of each branch instruction is recorded sequentially in an information table in instruction-address order; it includes the location in the cache of the branch instruction's target instruction. A mapping table stores the mapping from each instruction address to the corresponding address in the information table; the entries of the mapping table correspond one-to-one to the instructions in the cache, and with this mapping an instruction address can be converted into the information-table address of the first branch instruction at or after that instruction address.
  • A first read pointer addresses the cache to read the corresponding instruction for execution by the processor core, and the first read pointer value is converted through the mapping into the corresponding information-table address, which serves as the pending update value. The mapping-table information pointed to by the first read pointer determines whether the second read pointer is updated to this pending value, and the feedback information determines whether the first read pointer is updated to its corresponding pending value.
  • The present invention also provides a cache system comprising: a processor core that executes instructions; a cache that stores at least the instructions to be executed by the processor core; and a cache controller that addresses the cache according to the feedback information generated by the processor core when executing instructions, and controls the cache to push subsequent instructions to the processor core. When the instruction executed by the processor core is a direct branch instruction or a non-branch instruction, the feedback information does not include an instruction address.
  • The cache controller includes at least a tracker. The tracker outputs a first read pointer that addresses the cache to read the corresponding instruction for execution by the processor core, and updates the value of the first read pointer according to the feedback information generated by the processor core when executing the instruction.
  • The level 1 cache stores at least the instructions to be executed by the processor core;
  • The level 2 cache stores all instructions in the level 1 cache as well as the branch target instructions of all branch instructions in the level 1 cache;
  • The cache controller further includes an active table, a block address mapping module, a track table, and a scanner. The active table has entries corresponding to the instruction blocks of the level 2 cache and stores the address information of those instruction blocks;
  • The block address mapping module stores the correspondence between level 1 cache and level 2 cache instruction addresses;
  • The track table has track points corresponding to the instructions in the level 1 cache and stores the instruction information of the level 1 cache;
  • The instruction information includes the instruction type and, for branch instructions, the position information of the branch target instruction;
  • The scanner reviews the instructions filled into the level 1 cache, extracts the corresponding instruction information, calculates the branch target instruction address of each branch instruction, and matches the calculated address against the active table and the block address mapping module. If the match fails, at least one instruction including the branch target instruction is filled from lower-level memory into the level 2 cache, and the corresponding branch target position information is stored in the track table; if the match succeeds, the branch target position information is stored directly in the track table. The tracker outputs a first read pointer that addresses the level 1 cache to read the corresponding instruction for execution by the processor core, and reads the instruction information from the track table; it then updates the first read pointer according to this instruction information and the feedback information generated by the processor core. If the branch is taken, the first read pointer is updated to the branch target cache address of the branch instruction; if the branch is not taken, the first read pointer is updated to the cache address of the instruction that follows the branch instruction in sequence.
  • How the first read pointer is updated is determined by the instruction information rather than by the function of the instruction itself.
  • The instruction information stored in the track point pointed to by the first read pointer and in the following track point is read from the track table simultaneously.
  • According to the outcome of a conditional branch instruction reported by the processor core: if the branch is taken, the first read pointer is updated to the branch target cache address of the conditional branch instruction; if it is not taken, the first read pointer is updated to the branch target cache address of the unconditional branch instruction that follows it. The processor core therefore does not need to spend a separate clock cycle executing that unconditional branch instruction.
  • A buffer is further included to store the value of the first read pointer; the buffered first read pointer value addresses the level 1 cache to read the corresponding instruction for execution by the processor core. The tracker's first read pointer points to a branch instruction in advance; if the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken. When the processor core executes the branch instruction, if the branch is taken, the tracker's first read pointer is updated to the branch target cache address; if it is not taken, it is updated to the cache address of the instruction that follows the branch instruction in sequence.
  • A slave tracker is further included. It outputs a second read pointer that points in advance to the branch instruction after the first read pointer; if the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instruction to be executed next is already stored in the level 1 cache whether or not the branch is taken.
  • The tracker further includes a register for storing the cache addresses of the next sequential instruction and of the branch target instruction. When the processor core executes a branch instruction, one of these two instructions is selected according to branch prediction and speculatively executed as the subsequent instruction, and the cache address of the other is stored in the register. If the branch outcome matches the prediction, execution continues with the subsequent instructions; otherwise the pipeline is flushed and execution restarts from the instruction at the address saved in the register.
  • An end track point is added after the last track point of each track in the track table;
  • The instruction type of the end track point is unconditional branch, and its branch target cache address is the cache address of the first track point of the next track in sequence. When the first read pointer points to the end track point, the level 1 cache outputs an empty instruction; when the track point preceding the end track point is not a branch point, the end track point's instruction type and branch target cache address are used as that track point's instruction type and branch target cache address.
  • The track table further includes an instruction type table, which stores the basic type information of each instruction and distinguishes branch instructions from non-branch instructions; each instruction corresponds to one entry of the instruction type table, whose content is one or more bits. It also includes a target address table, which stores the branch instruction type and branch target cache address of each branch instruction; each branch instruction corresponds to one entry of the target address table. Branch instruction types include conditional and unconditional branch instructions.
  • The basic type information of an instruction is looked up in the instruction type table using the first read pointer and sent to the tracker; the branch instruction type and branch target cache address of the first branch instruction after that instruction are looked up in the target address table using the second read pointer, with the branch instruction type sent to the tracker and the branch target cache address sent to the slave tracker.
  • An offset address mapping module is further included to map the first read pointer value to the corresponding column number of the target address table, which is sent to the slave tracker. If the processor core executes the branch instruction and the branch is taken, the slave tracker's second read pointer is updated to the column number sent by the offset address mapping module; if the branch is not taken, the second read pointer value is incremented by one to point to the next entry of the target address table. The offset address mapping module includes: a decoder, which generates mask values from the first read pointer value such that the mask values for the instruction pointed to by the first read pointer and all following instructions are '0' and all other mask values are '1'; a masker, which ANDs the mask values generated by the decoder with the basic type information in the instruction type table to obtain a control word; and a selector array, in which each column of selectors selects according …
  • The processor core includes two front-end pipelines and one back-end pipeline. The system further includes a first instruction read buffer that stores the current instruction block, and a second tracker that outputs a third read pointer, which addresses the level 1 cache to read the corresponding instruction for execution by the target front-end pipeline of the processor core; the first read pointer addresses the first instruction read buffer to read the corresponding instruction for execution by the sequential front-end pipeline of the processor core.
  • If the branch of the branch instruction is not taken, the third read pointer is updated to the branch target cache address of the next branch instruction after the first read pointer, so that the third read pointer points to the branch target instruction block in the level 1 cache and reads the corresponding instruction for execution by the target front-end pipeline; meanwhile the first read pointer continues to be updated and reads the corresponding instruction from the first instruction read buffer for execution by the sequential front-end pipeline. If the branch is taken, the instruction block pointed to by the third read pointer is filled from the level 1 cache into the first instruction read buffer, and the first read pointer is updated to the value following the third read pointer; the first read pointer then continues to be updated from that value, reading the corresponding instruction from the first instruction read buffer for execution by the sequential front-end pipeline.
  • Further included are: a second instruction read buffer for storing the target instruction block, with the third read pointer addressing the second instruction read buffer to read the corresponding instruction for execution by the target front-end pipeline of the processor core; a secondary tracker, which outputs a second read pointer that points in advance to a branch instruction after the first read pointer, so that if the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, and when the processor core executes the branch instruction the instructions to be executed next are already stored in the level 1 cache whether or not the branch is taken; and a second tracker, which outputs a fourth read pointer that points in advance to a branch instruction after the third read pointer, so that if the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, and when the processor core executes the branch instruction the instructions to be executed next are already stored in the level 1 cache whether or not the branch is taken.
  • The third read pointer value is updated to the branch target address of the branch instruction pointed to by the first read pointer, so that the third read pointer points to the branch target instruction block in the level 1 cache and reads the corresponding instruction for execution by the target front-end pipeline of the processor core.
  • Further included are: a tracker, which outputs the first read pointer to address the level 1 cache and read multiple corresponding instructions, which are pushed to the processor core; and a dependency check module, which checks the multiple instructions for dependencies and, based on the result, controls the processor core to execute some or all of them. The tracker updates the first read pointer based on the feedback information generated when the processor core executes the instructions and on the result of the dependency check. The system further comprises a sub-tracker, which outputs a second read pointer that points in advance to a branch instruction after the first read pointer; if the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instructions to be executed next are already stored in the level 1 cache whether or not the branch is taken.
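Dependency checking on a group of pushed instructions, and the resulting update of the first read pointer, can be illustrated as follows. This is a hedged sketch, not the patent's rule: it assumes register-based instructions, checks only read-after-write dependences inside the group, and issues the longest dependence-free prefix. The tuple encoding and the names `issue_count` and `next_read_pointer` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction encoding: (destination register or None, source registers)
Instr = Tuple[Optional[int], Tuple[int, ...]]


def issue_count(group: List[Instr]) -> int:
    """Number of leading instructions in `group` that can be issued together.

    Only read-after-write dependences inside the group are checked.
    """
    written = set()
    for n, (dest, srcs) in enumerate(group):
        if any(s in written for s in srcs):
            return n  # depends on an earlier instruction in the same group
        if dest is not None:
            written.add(dest)
    return len(group)


def next_read_pointer(ptr: int, group: List[Instr]) -> int:
    # The tracker advances the first read pointer past the issued instructions.
    return ptr + issue_count(group)
```

Because the first instruction of a group never depends on anything inside the group, at least one instruction is issued per non-empty group.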
  • The instruction information in the track table further includes the data address and data stride of each data access instruction, one pair per data access instruction. Using the data address stored in the entry that the fifth read pointer reads from the track table, the corresponding data can be found directly in the level 1 cache. The instruction type information distinguishes branch instructions, data access instructions and other instructions; the data access instruction type further indicates whether the corresponding data address is valid and whether the data stride is valid.
  • An adjacent data block address memory is further included, whose rows correspond one-to-one with the data blocks in the level 1 cache: when the previous or next data block of a data block in the level 1 cache is also stored in the level 1 cache, the level 1 cache address of that previous or next data block is stored in the corresponding row of the adjacent data block address memory.
  • The system further includes a data engine containing at least one adder, which adds the data stride to the address of the current data accessed by the current data access instruction to obtain the address of the data for the next execution of that instruction. When this next address falls in the data block before or after the one holding the current data, the corresponding data is read from the level 1 cache using the data address while the level 1 cache address of the previous or next data block is read from the adjacent data block address memory, directly yielding the level 1 cache address of the next data.
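The data engine's stride addition and the adjacent-block shortcut can be sketched as below. This is an illustrative model under stated assumptions: a 64-byte data block (`BLOCK_SIZE`, assumed), and a dictionary `adjacent` standing in for the adjacent data block address memory, mapping a level 1 block number to its (previous, next) level 1 block numbers. The function name is hypothetical.

```python
BLOCK_SIZE = 64  # assumed data block size in bytes


def next_data_location(addr, stride, current_bn1x, adjacent):
    """Predict where the next execution of a strided data access instruction lands.

    adjacent: {bn1x: (previous_bn1x, next_bn1x)}, an entry being None when that
    neighbouring block is not in the level 1 cache (stand-in for the adjacent
    data block address memory).
    Returns (next_addr, level 1 block number or None, offset within the block).
    """
    next_addr = addr + stride            # the data engine's adder
    offset = next_addr % BLOCK_SIZE
    delta = next_addr // BLOCK_SIZE - addr // BLOCK_SIZE
    if delta == 0:
        return next_addr, current_bn1x, offset   # same block as the current data
    prev_bn1x, next_bn1x = adjacent.get(current_bn1x, (None, None))
    if delta == -1:
        return next_addr, prev_bn1x, offset      # previous adjacent block
    if delta == 1:
        return next_addr, next_bn1x, offset      # next adjacent block
    return next_addr, None, offset               # farther away: needs a normal lookup
```

Only a one-block step is resolved without a lookup, which matches the text's restriction to the previous or next data block.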
  • A sub-tracker is further included, which outputs a second read pointer that starts from the instruction pointed to by the first read pointer and moves ahead to the first branch instruction after it. If the branch target instruction of that branch instruction is not yet stored in the level 1 cache, it is filled from the level 2 cache into the level 1 cache, so that when the processor core executes the branch instruction, the instructions to be executed next are already stored in the level 1 cache whether or not the branch is taken. For each data access instruction the second read pointer passes while moving, if the data that the instruction will access is not yet stored in the level 1 cache, the data is filled from the level 2 cache into the level 1 cache, so that the processor core can access it directly from the level 1 cache when executing that instruction.
  • The system further includes a data tracker that outputs a fifth read pointer, which points in advance to a data access instruction after the first read pointer. If the data that the data access instruction will access is not yet stored in the level 1 cache, the data is filled from the level 2 cache into the level 1 cache, so that the processor core can access it directly from the level 1 cache when executing that instruction.
  • A fifth read pointer and temporary registers are provided. The fifth read pointer moves ahead to each data access instruction after the first read pointer in turn. If the data that a data access instruction will access is not stored in the level 1 cache, the data is filled from the level 2 cache into the level 1 cache and also stored in the register; otherwise, the data is read directly from the level 1 cache and stored in the register. The fifth read pointer can then move on to the next data access instruction. The temporarily stored data is pushed to the processor core in order, for use by the corresponding data access instructions.
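The pre-read behaviour of the fifth read pointer can be modeled as a loop over upcoming data accesses that fills a first-in, first-out queue. This sketch assumes the temporary registers behave as a FIFO and models both cache levels as dictionaries keyed by data address; these simplifications and the name `preload` are hypothetical.

```python
from collections import deque


def preload(data_access_addrs, l1, l2):
    """Walk the upcoming data access instructions (the fifth read pointer's path).

    On a level 1 miss the datum is filled from level 2 into level 1; in both
    cases it is queued in program order for later push to the processor core.
    """
    queue = deque()
    for addr in data_access_addrs:
        if addr not in l1:
            l1[addr] = l2[addr]   # fill level 1 from level 2
        queue.append(l1[addr])    # temporarily store the datum
    return queue
```

The processor core would then pop data from the front of the queue as each data access instruction executes.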
  • An information table stores the instruction information of each branch instruction in instruction address order; this information includes the cache address of the branch target instruction of the branch instruction. A mapping table stores the mapping from instruction addresses to the corresponding information table addresses; its entries correspond one-to-one with the instructions in the cache, and by this mapping an instruction address can be converted into the information table address of the first branch instruction at or after that instruction address. A tracker provides a first read pointer that addresses the cache to read the corresponding instruction for execution by the processor core; the mapping table receives the first read pointer value sent by the tracker and converts it into the corresponding information table address according to the mapping. A sub-tracker provides a second read pointer that addresses the information table to read the instruction information of the corresponding branch instruction, which includes the branch target instruction address information. When a branch is taken, the branch target
  • The tracker further includes: a register for storing the first read pointer value; an incrementer that adds one to the first read pointer value stored in the register to obtain an incremented value; and a selector, which selects the branch target instruction address information as the candidate first read pointer when a branch is taken, and otherwise selects the incremented first read pointer value. Based on the feedback information, the tracker determines whether to update the first read pointer to the candidate value. The secondary tracker further includes: a register for storing the second read pointer value; an incrementer that adds one to the second read pointer value stored in the register to obtain an incremented value; and a selector, which selects the information table address obtained by the mapping conversion as the candidate second read pointer when a branch is taken, and otherwise selects the incremented second read pointer value. First reading
  • The invention also proposes a method for organizing instruction block information in a cache, comprising a compression method: instructions are classified, only the instruction information of instructions of one or more specific types is stored in an information table, and a mapping table is generated from the classification result.
  • A decompression method is further included, which converts an instruction address into the address, in the information table, of that instruction's instruction information according to the contents of the mapping table.
  • At least one type of instruction information is recorded sequentially in the information table, in instruction address order.
  • The instruction address is converted, according to the contents of the mapping table, into the address in the information table of the corresponding instruction information.
  • Starting from '0', the classification information in the mapping table is scanned in instruction address order: for each instruction whose type corresponds to instruction information stored in the information table, the count is incremented by one (or shifted by one bit), until the given instruction address is reached. The count value (or shift value) obtained at that point is the address, in the information table, of the instruction information corresponding to that instruction address.
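The compression and decompression methods above can be sketched in software. In this hedged model, the mapping table holds one bit per instruction (1 if the instruction's information is kept), the information table holds only the kept entries in address order, and decompression counts the '1's before the given offset. The function names, the `keep="branch"` default and the payload strings are hypothetical.

```python
def compress(instr_types, payloads, keep="branch"):
    """Keep only the information of instructions of type `keep`.

    Returns (mapping table, information table); the mapping table has one bit
    per instruction, in instruction address order.
    """
    mapping = [1 if t == keep else 0 for t in instr_types]
    info = [p for t, p in zip(instr_types, payloads) if t == keep]
    return mapping, info


def info_address(mapping, offset):
    """Decompression: count kept entries from '0' up to (not including) `offset`.

    The result addresses the first kept instruction at or after `offset`.
    """
    return sum(mapping[:offset])
```

For example, with types `["alu", "branch", "load", "branch", "alu"]`, offsets 2 and 3 both map to information table address 1, the entry of the branch at offset 3, which matches the mapping described for the information table.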
  • The system and method of the present invention can provide a basic solution for the cache structures used in digital systems. Unlike conventional cache systems, which fill the cache only after a cache miss, the system and method of the present invention fill the instruction cache before the processor executes an instruction, which can avoid cache misses or sufficiently hide their latency.
  • Because the invention follows the program execution flow and the feedback from the processor core as it executes instructions, it does not require the processor core to provide instruction addresses; instructions are provided to the processor core directly from the higher-level cache. This reduces the pipeline depth and improves pipeline efficiency, and in particular reduces the pipeline cycles wasted on branch mispredictions.
  • Figure 1A is an embodiment of a cache structure of the present invention.
  • Figure 1B is another embodiment of the cache structure of the present invention.
  • Figure 1C is an embodiment of the present invention supporting speculative execution.
  • Figure 5 is an embodiment of the instruction type table and the target address table of the present invention.
  • Figure 7 is another embodiment of the cache system of the present invention.
  • Figure 8 is another embodiment of the tracker of the present invention.
  • Figure 10 is an embodiment of the contents of the track table of the present invention.
  • Figure 11 is another embodiment of a cache system for avoiding branch penalties according to the present invention.
  • Figure 12 is another embodiment of the cache system of the present invention.
  • Figure 13 is an embodiment of a track table entry format.
  • Figure 14 is another embodiment of an instruction type table.
  • Figure 15A is an embodiment of the format of an instruction type table and a branch target address table.
  • Figure 15B is an embodiment of the format of an instruction type table, a branch target table and a data address table.
  • Figure 16 is another embodiment of the cache system of the present invention.
  • Figure 17 is an embodiment of a push cache system that supports multiple issue of instructions.
  • Figure 4 shows a preferred embodiment of the invention.
  • Instruction address refers to the memory address of an instruction in main memory; that is, the instruction can be found in main memory using this address.
  • For simplicity, the virtual address is assumed to equal the physical address; the method of the present invention is also applicable where address mapping is required.
  • The current instruction may refer to the instruction currently being executed or fetched by the processor core; the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
  • A branch instruction or branch point refers to any appropriate instruction form that can cause the processor core to change the execution flow (e.g., to execute instructions or micro-ops out of sequence).
  • The branch instruction address refers to the instruction address of the branch instruction itself, composed of the instruction block address and the instruction offset address.
  • The branch target instruction refers to the instruction to which the branch instruction transfers execution, and the branch target instruction address refers to the instruction address of the branch target instruction.
  • FIG. 1A is an embodiment of a cache structure according to the present invention.
  • The processor system consists of a processor core 102, a level 1 cache 104, a scanner 106, a level 2 cache 108, a track table 110, a replacement module 124, a tracker 120, an active table 130, a block address mapping module 134, and selectors 132, 140, 142, 146, 148 and 150.
  • Also not shown in Figure 1A is a controller that receives the outputs of the processor core 102, the block address mapping module 134, the scanner 106, the active table 130, the track table 110 and the replacement module 124, and controls the operation of each functional module.
  • A first address (BNX) and a second address (BNY) can be used to indicate the location of an instruction in the level 1 cache or the level 2 cache.
  • The first and second addresses may be level 1 cache addresses or level 2 cache addresses.
  • BN1X can be used to denote the level one block number of the instruction block containing the instruction (i.e., it points to the corresponding level one instruction block in the level 1 cache), and BN1Y to denote the intra-block offset of the instruction (i.e., its relative position within the level one instruction block).
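  • Likewise, BN2X can be used to denote the level two block number and BN2Y the intra-block offset within the level two instruction block.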
  • BN1 can be used to denote BN1X and BN1Y together, and BN2 to denote BN2X and BN2Y together. Since in the present invention the instructions in the level 1 cache are also stored in the level 2 cache, an instruction stored in the level 1 cache can be represented by either BN1 or BN2.
  • The first and second addresses may also be used to represent the location of a track point in the track table 110.
  • The instruction type of a branch point may also include information on whether the branch target address is expressed as a BN1 (i.e., a direct branch instruction whose branch target is a BN1) or as a BN2 (i.e., a direct branch instruction whose branch target is a BN2).
  • A level two instruction block may contain several level one instruction blocks, so that the intra-block offset (BN1Y) of an instruction can be obtained directly from its intra-block offset (BN2Y) in the level two instruction block.
  • For example, if a level two instruction block contains two level one instruction blocks, the BN2Y of an instruction in the level two instruction block has one more bit than its BN1Y in the level one instruction block. The most significant bit (MSB) of BN2Y indicates which of the two level one instruction blocks corresponding to the level two instruction block contains the instruction, and the remaining bits equal the instruction's BN1Y in that level one instruction block. The same applies when a level two instruction block contains more than two level one instruction blocks.
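The bit relationship between BN2Y and BN1Y can be written as a shift and a mask. This sketch assumes a level two block holds a power-of-two number of level one blocks, so the extra high bits of BN2Y select the level one sub-block; the function name and the `bn1y_bits` parameter are hypothetical.

```python
def split_bn2y(bn2y: int, bn1y_bits: int):
    """Split a level two intra-block offset into (level one sub-block index, BN1Y).

    bn1y_bits is the width of BN1Y; the bits of BN2Y above it choose the sub-block.
    """
    sub_block = bn2y >> bn1y_bits              # high bit(s), e.g. the MSB for two sub-blocks
    bn1y = bn2y & ((1 << bn1y_bits) - 1)       # remaining low bits equal BN1Y
    return sub_block, bn1y
```

With 16 instructions per level one block (`bn1y_bits=4`), BN2Y = 19 (binary 10011) lies in sub-block 1 at BN1Y = 3.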
  • The entries in the active table 130 correspond one-to-one with the memory blocks in the level 2 cache 108. Each entry stores a matching pair of a level two instruction block address and a level two block number BN2X, indicating which memory block of the level 2 cache 108 holds the level two instruction block at that address. A level two instruction block address can be matched against the active table 130, yielding a BN2X if the match succeeds; conversely, the active table 130 can be addressed by a BN2X to read out the corresponding level two instruction block address.
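The two access modes of the active table, matching by block address and reading by BN2X, can be modeled with a simple list. This is a hypothetical software stand-in: the linear search plays the role of an associative (content-addressed) match, and the class and method names are illustrative only.

```python
class ActiveTable:
    """One entry per level 2 memory block; entry BN2X holds the block address stored there."""

    def __init__(self, n_blocks: int):
        self.addr = [None] * n_blocks

    def match(self, block_addr):
        """Content match: return the BN2X on success, None on failure."""
        for bn2x, a in enumerate(self.addr):
            if a == block_addr:
                return bn2x
        return None

    def read(self, bn2x: int):
        """Address the table by BN2X to read the stored block address."""
        return self.addr[bn2x]

    def install(self, bn2x: int, block_addr) -> None:
        """Record a new matching pair after a block is filled into level 2."""
        self.addr[bn2x] = block_addr
```

A hardware table would perform the match in parallel in one cycle; the loop here only reproduces the result.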
  • The scanner 106 examines the instruction block sent from the level 2 cache 108 over the bus 107, extracts track point information and fills it into the corresponding entries of the track table 110, thereby building the track of at least one level one instruction block corresponding to the level two instruction block; the scanner 106 also outputs the instruction block over the bus 105 to fill it into the level 1 cache 104.
  • The replacement module 124 first generates a replacement BN1X that points to an available track. The replacement module 124 can determine the available track with a replacement algorithm such as LRU.
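Since the text names LRU only as one possible replacement algorithm, the following is a generic least-recently-used sketch rather than the replacement module's actual design. It assumes block numbers 0 to n-1 and treats a newly allocated block as most recently used; the class and method names are hypothetical.

```python
from collections import OrderedDict


class LRUReplacer:
    """Choose a replacement block number in least-recently-used order."""

    def __init__(self, n: int):
        # Front of the ordered dict is the least recently used block number.
        self.order = OrderedDict((i, None) for i in range(n))

    def touch(self, bn: int) -> None:
        """Mark block `bn` as just used."""
        self.order.move_to_end(bn)

    def victim(self) -> int:
        """Return the LRU block number; it becomes most recently used once allocated."""
        bn = next(iter(self.order))
        self.order.move_to_end(bn)
        return bn
```

For four blocks, touching 0 and then 2 makes the next victims 1, 3 and then 0.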
  • The scanner 106 calculates the branch target address of each branch instruction contained in the instruction block. The calculated branch target address is sent to the active table 130 and matched against the instruction block addresses stored there to determine whether the branch target is already stored in the level 2 cache 108. If the match fails, the instruction block containing the branch target instruction has not yet been filled into the level 2 cache 108; that instruction block is then also filled from the lower-level memory into the level 2 cache 108, and a matching pair of the corresponding level two instruction block address and level two block number is established in the active table 130.
  • The scanner 106 examines each instruction on the bus 107 and extracts certain information, such as the instruction type, the instruction source address and the branch offset of a branch instruction, and calculates the branch target address from this information.
  • The branch target address can be obtained by adding together the block address of the instruction block containing the instruction, the offset of the instruction within that block, and the branch offset.
  • The instruction block address may be read from the active table 130 and sent directly to the adder in the scanner 106.
  • Alternatively, a register for storing the current instruction block address can be added to the scanner 106, so that the active table 130 need not send the instruction block address in real time.
  • the branch target address of the direct branch instruction is generated by the scanner 106
  • the branch target address of the indirect branch instruction is generated by the processor core 102.
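The scanner's direct branch target computation is a three-term addition, after which the result is split into the block address matched in the active table and the intra-block offset. This sketch assumes all three quantities are in bytes and assumes a 128-byte level two block (`L2_BLOCK_BYTES`); in the patent BN2Y may count instructions rather than bytes, so treat the split as illustrative.

```python
L2_BLOCK_BYTES = 128  # assumed level two block size in bytes


def branch_target(block_addr: int, offset: int, branch_offset: int):
    """Direct branch target = block address + offset in block + branch offset.

    Returns (target address, target block address for the active table match,
    intra-block offset that becomes BN2Y, here in bytes).
    """
    target = block_addr + offset + branch_offset
    intra = target % L2_BLOCK_BYTES
    return target, target - intra, intra
```

A backward branch of -0x20 from offset 0x10 in block 0x1000 reaches 0xFF0, which lies in block 0xF80 at offset 0x70.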
  • The block address mapping module 134 has one row for each level 2 cache block, each row holding several entries; each entry stores the level one block number (BN1X) of the level 1 cache block corresponding to the respective part of the level 2 cache block. Thus, for a given BN2, a row of the block address mapping module 134 is found by its BN2X, and the corresponding entry in that row is found by the high bits of its BN2Y. For example, when each row of the block address mapping module 134 corresponds to two BN1Xs, the entry holding the BN1X corresponding to a BN2 is found by the most significant bit of that BN2's BN2Y. The same applies when a level two instruction block contains more than two level one instruction blocks.
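The row-and-entry lookup of the block address mapping module can be sketched as follows. In this hypothetical model, rows are indexed by BN2X, each row has one slot per level one sub-block holding its BN1X (or `None` if that part is not in the level 1 cache), and BN1Y is assumed to be 4 bits wide by default; the class and method names are illustrative.

```python
from typing import List, Optional, Tuple


class BlockAddressMap:
    """One row per level 2 block (BN2X); one BN1X slot per level one sub-block."""

    def __init__(self, n_l2_blocks: int, subblocks: int = 2, bn1y_bits: int = 4):
        self.rows: List[List[Optional[int]]] = [[None] * subblocks for _ in range(n_l2_blocks)]
        self.bn1y_bits = bn1y_bits

    def record(self, bn2x: int, sub: int, bn1x: int) -> None:
        # Called when part `sub` of level 2 block bn2x is filled into level 1 block bn1x.
        self.rows[bn2x][sub] = bn1x

    def to_bn1(self, bn2x: int, bn2y: int) -> Optional[Tuple[int, int]]:
        """Convert a BN2 to a BN1, or return None if that part is not in level 1."""
        sub = bn2y >> self.bn1y_bits               # high bit(s) of BN2Y pick the entry
        bn1x = self.rows[bn2x][sub]
        if bn1x is None:
            return None
        return bn1x, bn2y & ((1 << self.bn1y_bits) - 1)  # low bits become BN1Y
```

A `None` result corresponds to the case where no valid BN1X exists and the track point keeps the BN2.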
  • The track table 110 contains a plurality of track points. A track point is an entry in the track table corresponding to at least one instruction, and the addresses corresponding to the track points in a track increase from left to right. In addition, an extra entry (the end track point) can be added to each row (each track) of the track table 110 to store the position of the next track in sequence.
  • The content of a track point in the track table 110 can include the format (TYPE), the level two block number (BN2X) and the level two block offset (BN2Y); alternatively, it can include the format (TYPE), the level one block number (BN1X) and the level one block offset (BN1Y).
  • the format contains instruction types such as branch instructions, data access instructions, and other instructions.
  • the branch instruction can be further subdivided, such as an unconditional direct branch instruction, a conditional direct branch instruction, an unconditional indirect branch instruction, and a conditional indirect branch instruction, etc., and the corresponding track point is a branch point.
  • Data access instructions can also be further subdivided, for example into data load instructions and data store instructions, and their corresponding track points are data points.
  • The track point address of a track point is related to the instruction address of the instruction it represents.
  • the branch instruction track point contains the track point address of the branch target, and the track point address is related to the branch target instruction address.
  • The consecutive track points corresponding to a level one instruction block, formed by a series of consecutive instructions in the level 1 cache 104, are referred to as a track.
  • the first level instruction block and the corresponding track are indicated by the same level one block number BN1X.
  • The track table 110 contains at least one track.
  • The total number of track points in a track can equal the total number of entries in a row of the track table 110 (optionally plus an end track point).
  • The track table 110 thus becomes a table that represents each branch instruction by the branch source address corresponding to the entry's address and the branch target address held in the entry's content.
  • Each row may additionally have a level two block number entry recording the BN2X corresponding to the BN1X of that row's track. Then the branch target BN1X in the branch points of other track table rows that take this row as their branch target can be converted into the corresponding BN2X, and the BN1Y into the corresponding BN2Y, so that this row can be overwritten without causing errors.
  • The track table 110 records the possible paths of program execution, i.e., the possible directions of the execution flow, so the tracker 120 can follow the program flow according to the track table 110 and the feedback from the processor core 102. Since the instructions corresponding to the track table entries are stored in the level 1 cache 104, the output bus 115 of the tracker 120 serves as the read address of the level 1 cache 104 as it follows the program flow, and the level 1 cache 104 sends instructions over the bus 103 for execution by the processor core 102.
  • Some branch targets in the track table 110 are recorded as level 2 cache addresses (BN2). The purpose is to store in the level 1 cache 104 only instructions that are likely to be used, so that the level 1 cache 104 can have a smaller capacity and higher speed than the level 2 cache 108.
  • The BN2 is sent to the block address mapping module 134 and other modules to obtain a BN1 address, or the instructions in the level 2 cache are filled into the level 1 cache 104 at a newly allocated BN1 address; the obtained or allocated BN1 address is also written back into the corresponding entry of the track table 110, and the tracker 120 controls the level 1 cache 104 to output instructions to the processor core 102 for execution.
  • The write address of the track table write port has two sources: the row address selector 140 (BN1X) and the column address selector 142 (BN1Y). The selector 140 selects the row address BN1X output by the replacement module 124, and the selector 142 selects the column address BN1Y output by the scanner 106.
  • If the track point content read by the tracker 120 contains a BN2, the BN2 is sent to the block address mapping module 134 to be converted to a BN1.
  • The indirect branch target address generated by the processor core 102 is sent over the bus 155, selected by the selector 148, and sent over the bus 149 to the active table 130, where it is matched to obtain a BN1, or matched or allocated to obtain a BN2X that the block address mapping module 134 converts to a BN1 as described above. This BN1 also needs to be written back into the track point. In both cases, the row address selector 140 and the column address selector 142 select the BN1X and BN1Y on the current read pointer 115 as the write address of the track table 110.
  • The content written through the write port of the track table 110 also has two sources, the buses 125 and 123, selected by the selector 146. The value on the bus 125 is the BN1 output by the block address mapping module 134, and the value on the bus 123 is a branch target address in the form of a level 2 cache address (BN2).
  • While an instruction is being filled into the level 1 cache 104, the scanner 106 examines it and extracts the corresponding information. Specifically, if the instruction is a branch instruction, the scanner 106 calculates its branch target address. The block address in the branch target address is sent over the bus 129, selected by the selector 148, and sent over the bus 149 to the active table 130 for matching. If the match succeeds, the BN2X of the matching entry is selected by the selector 132 and sent over the bus 133 to the block address mapping module 134, where the corresponding BN1X is looked up in the row pointed to by the BN2X according to the BN2Y in the branch target address. If a valid BN1X is found, it is output over the bus 125, and the BN2Y is converted to the corresponding BN1Y.
  • The selector 140 selects the BN1X 127 generated by the replacement module 124 for the branch instruction as the first address of the track table 110 write address, and the selector 142 selects the intra-block offset 119 of the branch instruction, output by the scanner 106, as the second address of the write address. The BN1 on the bus 125 is selected by the selector 146 and written, together with the extracted instruction type, as the track point content into the track table 110 over the bus 147. This completes the track point, which now contains a BN1.
  • If no valid BN1X corresponding to the BN2 exists in the block address mapping module 134, the selector 140 selects the BN1X 127 generated by the replacement module 124 for the branch instruction as the first address of the track table 110 write address, and the selector 142 selects the intra-block offset 119 of the branch instruction, output by the scanner 106, as the second address. The BN2X on the bus 133 and the BN2Y of the branch target address calculated by the scanner 106 are concatenated into a BN2, placed on the bus 123, selected by the selector 146, and written together with the extracted instruction type as the track point content into the track table 110 over the bus 147. This completes the track point, which now contains a BN2.
  • If the match in the active table 130 fails, the instruction corresponding to the branch target address is not yet stored in the level 2 cache 108. A level two block number BN2X is then allocated according to a replacement algorithm (such as LRU), and the branch target address is sent to the lower-level memory to fetch the corresponding instruction block, which is stored over the bus 109 into the memory block of the level 2 cache 108 pointed to by the BN2X.
  • The selector 140 selects the BN1X 127 generated by the replacement module 124 for the branch instruction as the first address of the track table 110 write address, and the selector 142 selects the intra-block offset 119 of the branch instruction, output by the scanner 106, as the second address. The allocated BN2X and the intra-block offset (BN2Y) of the branch target address are concatenated directly into a BN2, placed on the bus 123, selected by the selector 146, and written together with the extracted instruction type into the track table 110 over the bus 147. This completes the track point, which now contains a BN2.
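The three cases above (active table hit with the target already in level 1, hit without it, and miss) decide whether the new track point holds a BN1 or a BN2. The sketch below condenses that decision into one function. It is a hypothetical software model: `active` and `l1_map` are dictionaries standing in for the active table 130 and the block address mapping module 134, and `allocate_bn2x` and `fill_l2` are illustrative callbacks for the replacement algorithm and the lower-level fetch.

```python
def target_entry(block_addr, bn2y, active, l1_map, bn1y_bits, allocate_bn2x, fill_l2):
    """Return the branch target content written into the track point.

    active: {block address: BN2X}
    l1_map: {(BN2X, sub-block): BN1X}
    """
    bn2x = active.get(block_addr)
    if bn2x is None:
        # Miss: allocate a level two block, fetch from lower-level memory, record the pair.
        bn2x = allocate_bn2x()
        fill_l2(bn2x, block_addr)
        active[block_addr] = bn2x
        return ("BN2", bn2x, bn2y)
    bn1x = l1_map.get((bn2x, bn2y >> bn1y_bits))
    if bn1x is not None:
        # Target already in level 1: store BN1X with the converted BN1Y.
        return ("BN1", bn1x, bn2y & ((1 << bn1y_bits) - 1))
    # In level 2 only: keep the BN2 formed from BN2X and BN2Y.
    return ("BN2", bn2x, bn2y)
```

A track point holding a BN2 is later converted to a BN1 when the tracker reaches it, as described for the block address mapping module.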
  • In this way, instructions can be filled from the level 2 cache 108 into the level 1 cache 104 and the corresponding tracks built.
  • The level 1 cache block address BN1X 127 generated by the replacement module 124 is also sent to the level 1 cache 104 as the write address for writing the level one memory block corresponding to the track. If the memory block is written in several passes, the BN1Y address generated by the scanner 106 as it scans the instruction block is also provided to the level 1 cache 104 over the bus 119, so that each instruction is written to the correct address.
  • The tracker 120 consists of a register 112, an incrementer 114 and a selector 118. Its read pointer 115 (the output of the register 112) points to the track point in the track table 110 corresponding to the instruction about to be executed by the processor core 102 (the current instruction); the content of that track point is read out and sent over the bus 117 to the selector 118.
  • The read pointer 115 also addresses the level 1 cache 104, which sends the current instruction over the bus 103 to the processor core 102 for execution.
  • The read pointer 115 has the format BN1X, BN1Y: BN1X selects a row of the track table 110 and the corresponding memory block of the level 1 instruction cache 104, and BN1Y selects an entry in that row and the corresponding instruction in that memory block.
  • The register 112 is controlled by the step signal 111 from the processor core 102. The step signal 111 is the feedback signal the processor core provides to the tracker. It is always '1' while the processor core operates normally, so the register 112 in the tracker 120 is updated every clock cycle and the read pointer 115 points to a new entry in the track table and a new instruction in the level 1 cache 104 for the processor core to execute.
  • When the step signal is '0', the register 112 stops updating: the tracker 120 and the pointer 115 stay unchanged, and the level 1 cache 104 pauses supplying new instructions to the processor core 102.
  • The read pointer 115 points to an entry (track point) in the track table 110, whose content is read out on the bus 117. If the controller decodes the instruction type in this content and finds that the instruction is not a branch instruction, it controls the selector 118 to select the BN1X from the register 112 together with the BN1Y incremented by the incrementer 114, and sends this new BN1 back to the input of the register 112. When the step signal 111 is valid, the register 112 is updated, so the read pointer 115 moves to the next track point to the right on the same track, and the next instruction is read out of the level 1 cache 104 and sent via the bus 103 to the processor core 102 for execution.
  • If the instruction is an unconditional branch whose track point contains BN1, the controller controls the selector 118 to select this BN1 as the output sent to the register 112, and the register 112 is updated when the step signal 111 is valid. In the next cycle the register 112 holds this BN1, i.e., the read pointer 115 points to the track point of the branch target instruction, and the branch target instruction is read from the level 1 cache 104 via the bus 103 for execution by the processor core 102. If the step signal 111 is invalid, the register 112 keeps its value until the step signal 111 becomes valid.
  • For a conditional branch, the controller controls the selector 118 according to the TAKEN signal 113, which the processor core generates when it executes the branch instruction and which indicates whether the branch is taken. If the TAKEN signal 113 is '1', the branch is taken: the BN1 output by the track table is sent back to the register 112, and when the step signal 111 is valid ('1') the register 112 is updated to this BN1 in the next cycle. The read pointer 115 then points to the track point of the branch target instruction, and the branch target instruction is read from the level 1 cache 104 for execution by the processor core 102.
  • If the TAKEN signal 113 is '0', the branch is not taken: the BN1X from the register 112 and the BN1Y incremented by the incrementer 114 are selected and sent back to the register 112, which is updated when the step signal 111 is valid ('1'). The value of the register 112 in the next cycle is thus incremented by one, i.e., the read pointer 115 points to the next sequential track point, and the corresponding instruction is read from the level 1 cache 104 via the bus 103 for execution by the processor core 102. If the step signal 111 is invalid, the register 112 keeps its value until the step signal 111 becomes valid.
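The tracker behaviour described above can be sketched as a next-state function for the register 112. This is a minimal behavioural model, not the patented circuit: it assumes every branch track point already holds a BN1 target (the BN2 case goes through the block address mapping module), represents BN1 as a (BN1X, BN1Y) tuple, and uses the hypothetical type names `NON_BRANCH`, `UNCOND`, and `COND`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical names for the instruction-type field of a track point.
NON_BRANCH, UNCOND, COND = "non_branch", "uncond", "cond"

BN1 = Tuple[int, int]  # (BN1X, BN1Y)


@dataclass(frozen=True)
class TrackPoint:
    kind: str                      # instruction type stored in the track point
    target: Optional[BN1] = None   # branch target, assumed already in BN1 form


def next_read_pointer(ptr: BN1, point: TrackPoint, taken: bool, step: bool) -> BN1:
    """Value held by register 112 in the next cycle (Figure 1A tracker)."""
    if not step:
        return ptr                       # step signal 111 is '0': hold
    bn1x, bn1y = ptr
    sequential = (bn1x, bn1y + 1)        # BN1X from register 112, BN1Y from incrementer 114
    if point.kind == UNCOND:
        return point.target              # selector 118 takes BN1 from bus 117
    if point.kind == COND:
        # TAKEN signal 113 chooses between the target and the next track point
        return point.target if taken else sequential
    return sequential                    # non-branch: next track point to the right
```

For example, a non-branch track point read while the pointer is at (3, 5) yields (3, 6) when the step signal is '1', and the pointer stays at (3, 5) when it is '0'.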
  • If the track point contains BN2, the BN2 is sent via the bus 117 to the block address mapping module 134 for matching and conversion as previously described.
  • If the block address mapping module 134 holds a valid BN1X corresponding to the BN2, this BN1X and the corresponding BN1Y are merged into BN1, which is written back via the bus 125 into the branch point, replacing the BN2 stored there. If there is no valid BN1X corresponding to the BN2, the replacement module 124 generates a BN1X as described above and sends it via the bus 127 to the track table 110 (and the level 1 cache 104).
  • The scanner 106 reviews the filled instructions, extracts the track point information, and fills it into the track in the track table 110 pointed to by the BN1X, while the correspondence between the generated BN1X and the BN2X is stored in the block address mapping module 134. At the same time, the instructions corresponding to the track are stored in the level 1 cache 104.
  • The BN1X produced by the replacement module 124 is merged with the BN1Y obtained by removing the upper bits of BN2Y (these upper bits are the level 2 cache sub-block number; each level 2 sub-block has the same capacity as a level 1 cache block) to form BN1, which is placed on the bus 125.
  • The selectors 140 and 142 select the value of the read pointer 115 (i.e., the address of the branch point of the branch instruction itself) as the write address, and the selector 146 selects the BN1 on the bus 125 to be written there. The track point content then output by the track table 110 contains BN1.
  • Subsequent operations are the same as for the direct branch instruction whose branch target is BN1, described above, and are not repeated here.
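The BN2-to-BN1 conversion can be illustrated with a small model of the block address mapping module 134. The sketch assumes, hypothetically, four level 1-sized sub-blocks per level 2 block and eight instructions per level 1 block, so that the upper bits of BN2Y select the sub-block and the lower bits give BN1Y; the class and function names are illustrative, and filling the level 1 block and building its track are left out.

```python
SUB_BLOCKS_PER_L2_BLOCK = 4   # assumption: an L2 block holds 4 L1-sized sub-blocks
L1_BLOCK_ENTRIES = 8          # assumption: 8 instructions per L1 block


class BlockAddressMap:
    """Toy block address mapping module 134: one row per BN2X, one slot per
    level 2 sub-block holding the BN1X that caches it (None if absent)."""

    def __init__(self, num_bn2x):
        self.rows = [[None] * SUB_BLOCKS_PER_L2_BLOCK for _ in range(num_bn2x)]

    def lookup(self, bn2x, bn2y):
        # Upper bits of BN2Y = sub-block number, lower bits = BN1Y.
        sub_block, bn1y = divmod(bn2y, L1_BLOCK_ENTRIES)
        bn1x = self.rows[bn2x][sub_block]
        return None if bn1x is None else (bn1x, bn1y)

    def record(self, bn2x, sub_block, bn1x):
        self.rows[bn2x][sub_block] = bn1x


def resolve(bmap, bn2, allocate_bn1x):
    """Map BN2 = (BN2X, BN2Y) to BN1 = (BN1X, BN1Y), allocating on a miss."""
    bn2x, bn2y = bn2
    hit = bmap.lookup(bn2x, bn2y)
    if hit is not None:
        return hit
    bn1x = allocate_bn1x()              # stands in for replacement module 124
    sub_block, bn1y = divmod(bn2y, L1_BLOCK_ENTRIES)
    bmap.record(bn2x, sub_block, bn1x)  # remember the BN2X/BN1X pairing
    return (bn1x, bn1y)
```

`divmod` by a power of two is the software equivalent of splitting BN2Y into its upper and lower bit fields.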
  • For an indirect branch instruction, the tracker pauses its update and waits for the processor core 102; the branch address is generated when the branch instruction is executed, or is calculated by a dedicated module.
  • When the controller sees the signal (e.g., from the processor core 102) that the branch address has been generated, it sends the block address part of the branch target address to the active table 130 via the bus 155, the selector 148, and the bus 149. If the match in the active table 130 succeeds, the BN2X of the matching entry is obtained, and the intra-block offset of the branch target address is taken as BN2Y.
  • The BN2X and BN2Y values are sent to the block address mapping module 134 for matching. If it hits and the corresponding BN1 is obtained, the subsequent operations are the same as for the direct branch instruction whose target is BN1; if it misses, they are the same as for the direct branch instruction whose target is BN2, and are not repeated here.
  • If the match in the active table 130 fails, the instruction at the branch target address has not yet been stored in the secondary cache 108. The active table 130 then allocates a level 2 block number BN2X according to a replacement algorithm (such as LRU), and the branch target address is sent to the lower-level memory to fetch the corresponding instruction block, which is stored into the level 2 memory block pointed to by BN2X.
  • The instruction block is then filled into the level 1 cache 104, the corresponding track is built as described above, and the BN2 is converted into BN1, which is filled back into the branch point. (The BN2 generated in this process is not written into the track table 110; the corresponding BN1 is written directly.) The track point content output by the track table 110 therefore contains BN1, and the subsequent operations are the same as for the direct branch instruction whose target is BN1, and are not repeated here.
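A minimal sketch of the active table 130 lookup for an indirect branch target follows. It assumes a 5-bit block offset and uses LRU as the replacement algorithm, the example the text mentions; the class name and layout are hypothetical, and the actual fetch from lower-level memory is only indicated by a comment.

```python
from collections import OrderedDict

OFFSET_BITS = 5  # assumption: low 5 address bits are the offset within an L2 block


class ActiveTable:
    """Toy active table 130: maps level 2 block addresses (the high part of an
    instruction address) to block numbers BN2X, with LRU replacement."""

    def __init__(self, num_blocks):
        self.lru = OrderedDict()            # block address -> BN2X, oldest first
        self.free = list(range(num_blocks))

    def match(self, address):
        """Return (BN2X, BN2Y, hit) for an instruction address."""
        block_addr = address >> OFFSET_BITS
        bn2y = address & ((1 << OFFSET_BITS) - 1)
        if block_addr in self.lru:
            self.lru.move_to_end(block_addr)   # mark as most recently used
            return self.lru[block_addr], bn2y, True
        # Miss: take a free block number, or evict the least recently used one.
        if self.free:
            bn2x = self.free.pop(0)
        else:
            _, bn2x = self.lru.popitem(last=False)
        self.lru[block_addr] = bn2x
        # Here the block would be fetched from lower-level memory into BN2X.
        return bn2x, bn2y, False
```

A hit returns the existing BN2X immediately; a miss both allocates a BN2X and signals that the instruction block must be fetched before the BN2 can be mapped to BN1.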
  • If the tracker later reads this entry again and finds that its instruction type is indirect branch but its address type is BN1, the controller determines that the indirect branch instruction has been executed before. When the instruction type is an unconditional branch, or a conditional branch for which the branch decision 113 fed back by the processor core 102 is 'taken', the controller can speculatively execute from this BN1 address, but the guessed address BN1 must be verified.
  • One verification method derives the instruction address corresponding to BN1: the BN2X stored in the track selected by BN1X addresses the active table 130 to read out the instruction block address; BN1Y is converted into BN2Y according to the position at which the BN1X is stored in the row of the block address mapping module 134 for that BN2X; and the instruction block address is concatenated with BN2Y to form the complete instruction address. The tracker then waits for the processor core 102 to produce the branch target address.
  • The branch target address is compared with the instruction address obtained by this reverse mapping. If they are equal, execution continues. If not, the instructions after the branch point are discarded without saving their results, execution restarts from the branch target address provided by the processor core 102, and that address is mapped to BN1 as in the previous example and stored into the branch point.
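The verification of a guessed BN1 can be sketched as a reverse mapping followed by a comparison. The table shapes used here (dictionaries keyed by BN1X and BN2X, and a mapping row listing the BN1X held for each sub-block) and the field widths are assumptions made for illustration; the function names are hypothetical.

```python
L1_BLOCK_ENTRIES = 8   # assumption: instructions per level 1 block
OFFSET_BITS = 5        # assumption: 4 sub-blocks x 8 = 32 entries per L2 block


def bn1_to_address(bn1, track_bn2x, mapping_rows, block_address_of):
    """Reverse-map BN1 = (BN1X, BN1Y) to a full instruction address.

    track_bn2x[bn1x]       -- BN2X recorded in the track selected by BN1X
    mapping_rows[bn2x]     -- row of module 134: BN1X held for each sub-block
    block_address_of[bn2x] -- block address read from active table 130
    """
    bn1x, bn1y = bn1
    bn2x = track_bn2x[bn1x]
    sub_block = mapping_rows[bn2x].index(bn1x)   # position of BN1X in the row
    bn2y = sub_block * L1_BLOCK_ENTRIES + bn1y   # BN1Y converted to BN2Y
    return (block_address_of[bn2x] << OFFSET_BITS) | bn2y


def verify_guess(guessed_bn1, actual_target, track_bn2x, mapping_rows, block_address_of):
    """True: keep the speculatively executed path; False: flush and refetch."""
    guessed = bn1_to_address(guessed_bn1, track_bn2x, mapping_rows, block_address_of)
    return guessed == actual_target
```

With BN2X 7 at block address 0x12 and BN1X 3 caching its sub-block 2, the BN1 (3, 4) maps back to (0x12 << 5) | 20 = 0x254.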
  • The end track point is treated as an unconditional branch point. When the read pointer 115 of the tracker 120 points to the track point immediately before the end track point (i.e., the last instruction in the instruction block), and that track point is not a branch point or is a branch point whose branch is not taken, the read pointer continues to update, moves to the end track point, and outputs its BN1 to the level 1 cache 104. Because the end track point does not correspond to an actual instruction, and the read pointer 115 is not updated to the first track point of the next track until the following clock cycle, the level 1 cache 104 must supply the processor core with an empty instruction during this cycle, i.e., an instruction that does not change the internal state of the processor core, such as NOP.
  • One option is to check the address sent to the level 1 cache 104: once it is found to correspond to an end track point, the level 1 cache 104 need not be accessed, and the empty instruction is output directly for execution by the processor core 102.
  • Another option is to add to each row a memory cell that stores the empty instruction and is addressed by the BN1 of the end track point itself, output by the read pointer 115, so that the empty instruction is output for execution by the processor core 102.
  • The drawback of this approach is that the processor core 102 spends one additional clock cycle per instruction block executing a useless empty instruction.
  • Figure 1A can therefore be modified so that, when the read pointer 115 of the tracker 120 points to the track point before the end track point, it points directly in the next clock cycle to the branch target track point or to the first track point of the next track, according to the instruction type of that track point and the execution feedback from the processor core 102.
  • FIG. 1B is another embodiment of the cache structure of the present invention.
  • The processor core 102, level 1 cache 104, scanner 106, secondary cache 108, replacement module 124, active table 130, block address mapping module 134, and selectors 132, 140, 142, 146, 148, and 150 in this embodiment are the same as in the embodiment of Fig. 1A.
  • The difference is that the track table 110 outputs the contents of two track points at a time (the content 182 of the track point pointed to by the read pointer 115 of the tracker 120, and the content 183 of the following track point), while the tracker 120 adds a type decoder 152, a controller 154, and a selector 116. In addition, the processor core 102 sends a signal 161 to the controller 154 in the tracker 120.
  • The controller 154 performs functions similar to those of the controller of Figure 1A, which is not shown there; it is shown here to illustrate more complex functions and operations.
  • The read port of the track table 110 is addressed by the read pointer 115 of the tracker 120, and two entries are read out: the current entry 182 and the next (right-hand) entry 183. The controller 154 detects the instruction type on the bus 117, and the type decoder 152 detects the instruction type on the bus 121.
  • The content of the current entry 182 is sent via the bus 117 to an input of the selector 118 and to the controller 154; when it is in BN2 format, it is also sent to the block address mapping module 134 so that the BN2 can be mapped to BN1.
  • The content of the next entry 183 is sent via the bus 121 to the type decoder 152, whose decoding result controls the selector 116.
  • One input of the selector 116 comes from the bus 121; the other is the BN1X of the read pointer 115 together with the BN1Y output by the incrementer 114 (i.e., the BN1Y of the read pointer 115 plus one).
  • The type decoder 152 decodes only the unconditional branch instruction type. If the type on the bus 121 is an unconditional branch, it controls the selector 116 to select the content of the bus 121; for any other type, the selector 116 selects the BN1X from the bus 115 and the incremented BN1Y output by the incrementer 114.
  • The step signal 111 from the processor core 102 controls storing this input into the register 112, so that the tracker moves right to the next address (i.e., BN1X unchanged and BN1Y incremented by one).
  • If the current entry is an unconditional branch, the controller 154 controls the selector 118 to select the branch target address on the bus 117, so that the read pointer 115 jumps to the track point corresponding to that target.
  • If the current entry is a conditional branch, the controller 154 makes the tracker 120 pause its update and wait until the processor core 102 generates the TAKEN signal 113 indicating whether the branch is taken. The register 112 is controlled not only by the step signal 111 but also by the signal 161 from the processor core, which indicates whether the TAKEN signal 113 is valid; the register 112 is updated only when the signal 161 shows that the TAKEN signal 113 is valid and the step signal 111 is also valid.
  • If the branch is not taken (TAKEN signal 113 is '0'), the selector 118 selects the output of the selector 116 and operation continues as for a non-branch instruction. If the branch is taken (TAKEN signal 113 is '1'), the selector 118 selects the branch target address on the bus 117 and stores it into the register 112, so that the read pointer 115 points to the branch target entry in the track table and the branch target instruction in the level 1 cache 104 is read out for execution by the processor core 102.
  • If the entry content is BN2, the controller 154 makes the register 112 of the tracker 120 pause its update and wait, while the BN2 selects a row of the block address mapping module 134 via the bus 117, the selector 132, and the bus 133 and is converted into a BN1 address. The BN1 is written via the bus 125, the selector 146, and the bus 147 into the original entry in the track table. The entry is then read out via the bus 117, and subsequent processing is the same as in the previous example.
  • The tracker 120 then follows the BN1 and, according to the execution results fed back by the processor core 102 (e.g., the outcome of a branch instruction), controls the level 1 cache 104 to output instructions to the processor core 102 for execution. If the branch is not taken, operation continues as for a non-branch instruction; if it is taken, operation continues as for an unconditional branch instruction.
  • For an indirect branch, the controller 154 makes the register 112 of the tracker 120 pause its update and wait for the processor core 102 to send the branch target address via the bus 155.
  • The address is sent through the selector 148 to the active table 130. If a match is found in the active table 130 and the corresponding BN2 is produced, this BN2 selects a row of the block address mapping module 134 via the selector 132 and the bus 133 to obtain the BN1 address, and operation is the same as in the previous example. If there is no match in the active table 130, the branch target address is sent to the lower-level memory to fetch the corresponding instruction block, which is filled into the secondary cache 108, and the required level 1 instruction block is filled into the level 1 cache 104. The level 1 cache block number BN1 of the filled block is recorded in the block address mapping module 134 and sent out via the bus 125; operation is the same as in the previous example.
  • When entry 183 is of the unconditional branch type (e.g., an end track point), the type decoder 152 decodes the instruction type on the bus 121 so that the selector 116 selects the branch target on the bus 121 instead of the BN1 provided by the incrementer 114 (BN1X, BN1Y + 1). Consequently, after the processor core 102 executes the instruction corresponding to entry 182, it does not execute an instruction corresponding to entry 183 (entry 183 may be an end track point, which has no corresponding instruction in the level 1 cache 104), but directly executes the instruction at the branch target address contained in entry 183.
  • If entry 182 is a non-branch instruction, the next instruction executed after it is the one given by entry 183, as described above.
  • If entry 182 is an unconditional branch instruction, the next instruction executed is the one pointed to by the branch target in entry 182; entry 183 has no effect on the process.
  • If entry 182 is a conditional branch instruction, the next instruction executed depends on the TAKEN signal 113 generated by the processor core 102.
  • If the branch is taken, the selector 118 selects the branch target on the bus 117, and the signal 161 indicating that the TAKEN signal 113 is valid controls storing this target into the register 112, so that the pointer 115 points to the branch target and the next instruction executed is the branch target of entry 182.
  • If the branch is not taken, the selector 118 selects the output of the selector 116, i.e., the unconditional branch target from entry 183 on the bus 121; controlled by the signal 161 and the step signal 111, it is stored into the register 112, so that the pointer 115 points to that target and the next instruction executed is the one pointed to by the unconditional branch target address in entry 183.
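The selection among entry 182, entry 183, and the incrementer in the Figure 1B tracker can be summarised as a next-state function. This is a behavioural sketch assuming that all targets are already BN1 and that the register 112 simply holds its value when it may not update; the type constants and helper names are hypothetical.

```python
NON_BRANCH, UNCOND, COND = "non_branch", "uncond", "cond"  # hypothetical type names


def selector_116(ptr, entry183):
    """Type decoder 152 + selector 116: take entry 183's target only if it is
    an unconditional branch (e.g. an end track point), else BN1Y + 1."""
    kind, target = entry183
    if kind == UNCOND:
        return target
    bn1x, bn1y = ptr
    return (bn1x, bn1y + 1)


def next_pointer_1b(ptr, entry182, entry183, taken=False, taken_valid=False, step=True):
    """Next value of register 112 in the Figure 1B tracker.

    entry182 / entry183 -- (kind, BN1 target) read on buses 117 and 121
    taken, taken_valid  -- TAKEN signal 113 and its valid signal 161
    The register keeps its value (ptr is returned) when it may not update.
    """
    if not step:
        return ptr
    kind, target = entry182
    if kind == UNCOND:
        return target                          # selector 118 picks bus 117
    if kind == COND:
        if not taken_valid:
            return ptr                         # paused until signal 161 is valid
        return target if taken else selector_116(ptr, entry183)
    return selector_116(ptr, entry183)         # non-branch
```

Because the end track point is reached through `selector_116` in the same cycle as the last instruction, no cycle is spent on an empty instruction.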
  • The unconditional branch target in the end track point can also be a level 2 cache address BN2. If the type decoder 152 finds that the address type of the entry read on the bus 121 is BN2, the BN2 from the bus 121 can also be placed on the bus 117, mapped to BN1 in the block address mapping module 134, and stored back into the entry. For clarity, this path is not shown in Figure 1B.
  • The type of unconditional branch instructions can be determined in four ways.
  • In the first way, there is only one unconditional branch type: the unconditional branch instructions in the program are not distinguished from the unconditional jumps that the invention adds in the end track points to jump to the first entry of the next track.
  • In this way, the original unconditional branch instructions in the program are skipped and not executed by the processor core 102, but under the control of the track table 110 and the tracker 120 the program flow still correctly executes the branch target instruction and its subsequent instructions. This saves the clock cycles otherwise occupied by the original unconditional branch instructions.
  • The cache system of the present invention does not need a PC to supply the processor core 102 correctly and without interruption with the instructions it will execute. If the PC value is needed at some point (e.g., for debugging), each track in the track table records the level 2 cache block number BN2X and the level 2 sub-block number of its level 1 instruction block. The BN2X then reads the corresponding tag from the active table 130, and concatenating the level 2 cache block address (tag), the sub-block number, and the BNY value of the read pointer gives the PC of the instruction being executed.
  • In the second way, there are two unconditional branch types. One is the end-point unconditional branch type, corresponding to the end track point of each track.
  • The type decoder 152 treats the end point as not corresponding to any instruction in the program, and therefore controls the selector 116 to select the branch target on the bus 121; after the instruction on the bus 117 is executed, execution jumps directly to the branch target address on the bus 121.
  • The other is the unconditional branch type corresponding to unconditional branch instructions in the program. The type decoder 152 does not treat this type as a branch when decoding it and controls the selector 116 to select the output of the incrementer 114.
  • The next instruction executed is then the next sequential instruction, i.e., the original unconditional branch instruction in the program. In this way the PC in the processor core keeps its correct value.
  • The third way improves the scanner 106 of the embodiment of Figure 1B.
  • When the last instruction of a block is not a branch, the scanner merges the end track point into the track point of that last instruction: the instruction type of the last instruction is marked as an unconditional branch, and the BN1 or BN2 of the first instruction of the next instruction block (a BN2 is converted by the tracker into BN1 as in the previous example) is stored as the content of the track point of the last instruction.
  • Besides reading the instruction from the level 1 cache 104 for the processor core 102 to execute normally, the controller 154 decodes the instruction type on the bus 117, finds the unconditional branch type, and controls the selector 118 to select the bus 117, updating the read pointer 115 in the next clock cycle to the target BN1 of the unconditional branch (i.e., the BN1 of the first instruction of the next instruction block).
  • Thus the processor core 102 does not waste a clock cycle executing an empty instruction.
  • If, while reviewing an instruction block, the scanner 106 finds that the last instruction of the level 1 instruction block (corresponding to the last track point of a track) is a branch instruction, it does not merge the end track point into that instruction's track point; instead, the end track point content is placed in the track point to the right of the last instruction's track point.
  • If the last instruction is an unconditional branch, the controller 154, on seeing the unconditional branch type on the bus 117, controls the selector 118 to select the branch target on the bus 117 into the pointer 115, jumping to the target; the end track point is not executed.
  • If the last instruction is a conditional branch, the controller 154, on seeing the conditional branch type on the bus 117, pauses the tracker 120 and waits for the branch decision signal 113 generated by the processor core 102. Meanwhile the type decoder 152 decodes the instruction type on the bus 121 as an unconditional branch and controls the selector 116 to select the bus 121.
  • If the branch is taken, the controller 154 controls the selector 118 to select the conditional branch target on the bus 117 into the pointer 115.
  • If the branch is not taken, the controller 154 controls the selector 118 to select the output of the selector 116, placing the unconditional branch target on the bus 121 into the pointer 115.
  • The level 1 cache 104 then sends the instruction addressed by the pointer 115 to the processor core 102 for execution.
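The scanner's placement of the end track point under the third way can be sketched as follows. The function name, the (type, target) tuple representation, and the type constants are assumptions for illustration; BN2 targets and any limit on track width are not modelled.

```python
NON_BRANCH, UNCOND, COND = "non_branch", "uncond", "cond"  # hypothetical type names


def build_track(types, targets, next_block_bn1):
    """Hypothetical scanner 106 output for one instruction block (third way).

    types          -- instruction type of each instruction in the block
    targets        -- branch target (BN1) of each instruction, or None
    next_block_bn1 -- BN1 of the first instruction of the next block
    Returns the track as a list of (type, target) track points.
    """
    track = list(zip(types, targets))
    last_type, _ = track[-1]
    if last_type == NON_BRANCH:
        # Merge the end track point into the last instruction's track point:
        # the instruction still executes, then the tracker jumps to the next block.
        track[-1] = (UNCOND, next_block_bn1)
    else:
        # The last instruction is a branch: keep its own target and append a
        # separate end track point to its right.
        track.append((UNCOND, next_block_bn1))
    return track
```

The merged form needs no extra entry, while the appended form keeps the branch's own target intact at the cost of one more track point.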
  • All three of the above ways apply to both fixed-length and variable-length instructions, since none of them requires the end track point to be at a fixed position in the track. In addition, if the position of the end track point in the track is fixed, whether the last instruction has been reached can be determined from the BN1Y value of the read pointer 115.
  • In the fourth way, there is only one unconditional branch type in the track table, but the tracker distinguishes two cases according to where the entry lies in the track. The BN1Y of the pointer 115 is sent to the type decoder 152, and the instruction type on the bus 121 need not be decoded. When BN1Y points to the last entry of a track, the type decoder 152 controls the selector 116 to select the branch target on the bus 121, so that after the instruction on the bus 117 is executed, execution jumps directly to the branch target address on the bus 121.
  • When BN1Y points to any entry other than the last entry of a track, the type decoder 152 controls the selector 116 to select the output of the incrementer 114.
  • After the instruction corresponding to the entry on the bus 117 is executed, the next instruction executed is the next sequential instruction. In this way the PC in the processor core always keeps its correct value. This way suits fixed-length instructions.
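The fourth way replaces type decoding with a position check on BN1Y. In this sketch, `last_bn1y` is assumed to be the BN1Y of the track's last instruction entry, i.e., the entry whose right-hand neighbour on the bus 121 is the end track point; the function name is hypothetical.

```python
def selector_116_by_position(ptr, bus121_target, last_bn1y):
    """Fourth way (hypothetical helper): select bus 121 or incrementer 114
    from the BN1Y of read pointer 115 alone, without decoding bus 121.

    last_bn1y -- assumed BN1Y of the track's last instruction entry, whose
                 right-hand neighbour is the end track point
    """
    bn1x, bn1y = ptr
    if bn1y == last_bn1y:
        return bus121_target       # jump to the first entry of the next track
    return (bn1x, bn1y + 1)        # any other position: sequential
```

Because an unconditional branch instruction in the middle of a track is never at `last_bn1y`, it is followed sequentially, which is what keeps the processor core's PC correct.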
  • The present invention can also control the processor core 102 to execute speculatively along a branch (speculative execution) to improve processor execution efficiency.
  • FIG. 1C is an embodiment of the present invention that supports speculative execution.
  • Compared with the tracker of Fig. 1B, a selector 162 and a register 164 are added to select and temporarily store the path not chosen for speculative execution, in case the guess is wrong.
  • The guessed direction can come from existing static prediction or dynamic branch prediction techniques, or from a branch prediction field stored in the track table entry of the corresponding branch instruction.
  • The two-input selector 118 is replaced by a three-input selector 218, and the output of the register 164 is connected to the third input of the selector 218.
  • When the controller 154 decodes a conditional branch type on the bus 117 and the prediction is 'not taken', it controls the selector 162 and the register 164 to store the branch target address on the bus 117 into the register 164.
  • At the same time, the controller 154 controls the selector 218 to select the output of the selector 116 (the next sequential instruction after the branch) into the register 112, so that the pointer 115 makes the level 1 cache 104 provide that instruction to the processor core 102 for execution, with the instruction flagged to the processor core as speculatively executed.
  • The pointer 115 also points to the first entry after the branch instruction in the track table 110, placing it on the bus 117. The controller 154 then decides the tracker's subsequent path according to the instruction type on the bus 117 and continues to provide instructions to the processor core; all of these instructions are flagged as speculatively executed.
  • When the branch is resolved, the controller 154 compares the predicted branch direction with the branch direction on the signal 113. If they agree, execution continues along the guessed direction. If they differ, the controller 154 sends a 'guess error' signal to the processor core 102, causing it to discard all instructions carrying the speculative flag and their intermediate results.
  • When the controller 154 decodes a conditional branch type on the bus 117 and the prediction is 'taken', it controls the selector 162 and the register 164 to store the output of the selector 116 (the next sequential instruction after the branch) into the register 164.
  • The controller 154 controls the selector 218 to select the branch target address on the bus 117 into the register 112, so that the pointer 115 makes the level 1 cache 104 provide the branch target instruction to the processor core 102 for execution, flagged as speculatively executed. The pointer 115 also points to the track table entry of the branch target address, which is placed on the bus 117.
  • The controller 154 then decides the tracker's subsequent path according to the instruction type on the bus 117 and continues to provide instructions to the processor core; all of these instructions are flagged as speculatively executed.
  • As in the previous case, the controller 154 compares the predicted branch direction with the branch direction on the signal 113: if they agree, execution continues along the guessed direction; if not, it sends a 'guess error' signal to the processor core 102, which discards all speculatively flagged instructions and their intermediate results.
  • After a guess error, the controller 154 controls the selector 218 to select the output of the register 164, i.e., the branch path that was not chosen, which then controls the level 1 cache 104 to provide instructions to the processor core 102, and execution continues from there.
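The role of the register 164 and the selector 218 in speculative execution can be modelled as below. It is a simplified sketch: the class and method names are hypothetical, the sequential path is assumed to be simply BN1Y + 1 (ignoring the case where the selector 116 would pick an end track point from the bus 121), and flushing flagged instructions in the processor core is represented only by the returned flag.

```python
class SpeculativeTracker:
    """Toy model of the Figure 1C additions: register 164 keeps the path not
    chosen by the prediction so that it can be restored on a mispredict."""

    def __init__(self, ptr):
        self.ptr = ptr            # register 112 / read pointer 115
        self.alternate = None     # register 164: path not chosen by the guess
        self.predicted = None     # predicted direction of the pending branch

    def on_conditional_branch(self, target, predict_taken):
        bn1x, bn1y = self.ptr
        fall_through = (bn1x, bn1y + 1)   # assumed selector 116 output
        self.predicted = predict_taken
        if predict_taken:
            # selector 162 stores the sequential path in register 164
            self.alternate, self.ptr = fall_through, target
        else:
            # selector 162 stores the branch target in register 164
            self.alternate, self.ptr = target, fall_through
        # Instructions fetched from here on carry the speculation flag.

    def on_branch_resolved(self, taken):
        """Return True if flagged instructions must be discarded."""
        mispredicted = taken != self.predicted
        if mispredicted:
            self.ptr = self.alternate     # selector 218 picks register 164
        self.alternate = self.predicted = None
        return mispredicted
```

The tracker may advance several entries along the guessed path before the branch resolves; on a mispredict the saved pointer in `alternate` replaces wherever it has reached.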
  • The instruction blocks containing the branch target instructions of all direct branch instructions (and of most branch instructions) in the level 1 cache 104 have been prefetched into the secondary cache 108, so processor performance is not degraded by level 2 cache misses.
  • If the branch point of a branch instruction in the level 1 cache 104 contains BN1, the instruction block containing the branch target instruction is already in the level 1 cache 104, and performance is not degraded by a level 1 cache miss; if the branch point contains BN2, however, a level 1 cache miss can still occur.
  • Figure 1A can be modified so that the read pointer 115 of the tracker 120 reaches the branch point earlier, filling the instruction block from the secondary cache 108 into the level 1 cache 104 in advance and converting BN2 to BN1.
  • Similar improvements can be made to FIG. 1B and FIG. 1C; they are not described again in this specification.
  • FIG. 2 is another embodiment of the cache system of the present invention.
  • The processor core 102, level 1 cache 104, scanner 106, secondary cache 108, track table 110, replacement module 124, active table 130, block address mapping module 134, selectors 132, 140, 142, 146, 148, and 150, and the corresponding controller are the same as in the embodiment of Fig. 1A.
  • The difference is that the register 112 is no longer controlled by the step signal 111, but jointly by the branch valid signal 161 and by the controller according to the type information on the bus 117.
  • A first-in first-out buffer (FIFO) 226 is added to store the level 1 cache addresses BN1 output by the tracker.
  • Writes to the buffer 226 are controlled by the control signal of the register 112: each time a new value is stored into the register 112, it is also written into the buffer 226.
  • Reads from the buffer 226 are controlled by the step signal 111: the stored BN1 values are output in first-in first-out order and used to address the level 1 cache 104, obtaining the corresponding instructions for execution by the processor core 102.
  • When the controller decodes the type information on the bus 117 as a non-branch type, it makes the selector 118 select the output of the incrementer 114 and stores it into the register 112, so that the pointer 115 points to and reads out the next entry in the same track of the track table 110.
  • When the controller decodes an unconditional branch type on the bus 117, it makes the selector 118 select the bus 117 and stores the address on the bus 117 into the register 112, so that the pointer 115 points to and reads out the branch target entry. When the controller decodes a conditional branch type on the bus 117, both the selector 118 and the register 112 are controlled by the processor core 102.
  • When the branch decision signal 113 is '0', the selector 118 selects the output of the incrementer 114; when it is '1', the selector 118 selects the bus 117.
  • The register 112 stores the output of the selector 118, so that, depending on the decision of the processor core 102, the pointer 115 points to and reads out either the entry of the next sequential instruction after the branch instruction in the track table 110 or the entry of the branch target.
  • under step signal 111, selector 118 selects the output of incrementer 114 and register 112 stores it, so pointer 115 points to and reads the next entry in the same track. Alternatively, the function of the branch valid signal 161 can be folded into step signal 111: when processor core 102 has executed a branch instruction but has not yet reached a branch decision, step signal 111 is held at '0' so that register 112 pauses; once branch determination signal 113 becomes valid, step signal 111 is set to '1' and register 112 resumes, which has the same effect as using the branch valid signal 161.
  • when the controller decodes the type information on bus 117 as the BN2 type, register 112 is not updated, so tracker pointer 115 keeps pointing to that entry. As in the previous example, the BN2 is sent via bus 117, selector 132 and bus 133 to block address mapping module 134, which maps it to the corresponding BN1; that BN1 is written back via bus 123, selector 146 and bus 147 into the same entry of track table 110 pointed to by pointer 115. The BN1 is then read out on bus 117, and the controller controls selector 118 and register 112 according to the instruction type, as in the previous example.
  • when the controller decodes the type information on bus 117 as the indirect branch type, register 112 is likewise not updated, so tracker pointer 115 keeps pointing to that entry. As in the example of FIG. 1A, the indirect branch target instruction address 155 generated by processor core 102 is sent through selector 148 to active table 130 for matching; the resulting BN2 is sent via selector 132 and bus 133 to block address mapping module 134 and mapped to BN1, which is written back via bus 123, selector 146 and bus 147 into the same entry of track table 110 pointed to by pointer 115. The BN1 is then read out via bus 117, and the controller controls selector 118 and register 112 according to its instruction type, as in the previous example.
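  • the selection rules for the next value of register 112 can be summarised in a short sketch. It is a simplified model, not the patented circuit: the `TrackEntry` fields, the string encodings of branch type and address format, and the callbacks `map_bn2` (standing in for block address mapping module 134) and `indirect_bn1` (standing in for the core address 155 matched through active table 130) are all hypothetical. Treating a resolved indirect entry like a conditional branch is an assumption based on 'I' denoting an indirect conditional branch later in this description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

BN1 = Tuple[int, int]  # hypothetical (BN1X, BN1Y)


@dataclass
class TrackEntry:
    branch: Optional[str] = None        # None, "unconditional", "conditional" or "indirect"
    fmt: Optional[str] = None           # "BN1", "BN2", or None (indirect target not yet known)
    target: Optional[Tuple[int, int]] = None


def tracker_step(
    table: Dict[BN1, TrackEntry],
    ptr: BN1,
    taken: Optional[bool] = None,                           # None: no branch decision yet
    map_bn2: Optional[Callable[[Tuple[int, int]], BN1]] = None,
    indirect_bn1: Optional[Callable[[], BN1]] = None,
) -> BN1:
    """Return the next value of register 112 (read pointer 115)."""
    entry = table[ptr]
    x, y = ptr
    if entry.branch is None:                 # non-branch: selector 118 picks incrementer 114
        return (x, y + 1)
    if entry.fmt == "BN2":                   # hold pointer, map BN2 -> BN1, write it back
        entry.target, entry.fmt = map_bn2(entry.target), "BN1"
        return ptr
    if entry.fmt is None:                    # indirect: core target address -> BN1, write back
        entry.target, entry.fmt = indirect_bn1(), "BN1"
        return ptr
    if entry.branch == "unconditional":      # selector 118 picks bus 117
        return entry.target
    if taken is None:                        # conditional: register 112 waits for the core
        return ptr
    return entry.target if taken else (x, y + 1)


# Usage with a hypothetical BN2 -> BN1 mapping that keeps the block offset.
table = {
    (0, 0): TrackEntry(),
    (0, 1): TrackEntry("conditional", "BN2", (8, 3)),
    (0, 2): TrackEntry("indirect"),
}
p = tracker_step(table, (0, 0))                                # moves to (0, 1)
p = tracker_step(table, p, map_bn2=lambda bn2: (2, bn2[1]))    # BN2 resolved, pointer held
p = tracker_step(table, p, taken=True)                         # jumps to (2, 3)
```

  • returning the unchanged pointer models the "register 112 is not updated" cases: the same entry is simply read again on the next cycle, now holding a BN1.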
  • in this way, read pointer 115 of tracker 120 can run ahead of the instruction being executed by processor core 102.
  • when read pointer 115 passes the branch point corresponding to a branch instruction, the BN2 stored there can be read out via bus 117. During the interval between the moment read pointer 115 of tracker 120 points to the branch point and the moment buffer 226 outputs the BN1 of that branch point to level 1 cache 104 to fetch instructions for processor core 102, the instruction block is filled from L2 cache 108 into L1 cache 104 as described in the previous embodiment, and the BN2 is converted to BN1 and written back to the branch point. Thus, by the time the processor core executes the branch instruction, the instruction block containing the branch target instruction is already stored in level 1 cache 104, and no cache miss occurs.
  • alternatively, a secondary tracker (slave tracker) can be added to the embodiment of Fig. 1A.
  • the tracker then provides addressing addresses to level one cache 104, as in the embodiment of FIG. 1A, to fetch instructions for execution by processor core 102, while the secondary tracker, like the tracker of the embodiment with buffer 226 described above, runs several track points ahead of the instruction being executed by processor core 102 and fills the instruction blocks containing the branch target instructions of those instructions from second level cache 108 into level 1 cache 104.
  • FIG. 1B shows another embodiment of the cache system of the present invention.
  • processor core 102, first level cache 104, scanner 106, second level cache 108, track table 110, replacement module 124, active table 130, block address mapping module 134, selectors 132, 140, 142, 146, 148 and 150, and the corresponding controller are the same as in the embodiment of Fig. 1A.
  • the difference is that track table 110 in this embodiment has two read ports, so it can output the track point contents addressed by two read pointers at the same time. In addition, a secondary tracker 320 is added in this embodiment.
  • secondary tracker 320 is structured like tracker 120: it consists of register 312, incrementer 314 and selector 318, and its output read pointer 315 can address track table 110 independently.
  • track table 110 outputs on bus 117 the track point content addressed by read pointer 115 of tracker 120, and outputs on bus 317 the track point content addressed by read pointer 315 of secondary tracker 320.
  • read pointer 115 of tracker 120 contains BN1X and BN1Y and operates in the same way as the tracker in the Fig. 1A embodiment, so it is not described again here.
  • the read pointer of secondary tracker 320 contains only BN1Y, and it operates similarly to the tracker in the embodiment of Fig. 2. Specifically, register 312 in secondary tracker 320 is updated every clock cycle. When TAKEN signal 113 sent by processor core 102 is '0', indicating that processor core 102 has not taken a branch, selector 318 selects BN1X from register 312 together with the incremented BN1Y from incrementer 314.
  • read pointer 315 thus points to the next track point of the current track in track table 110, and this repeats until it reaches the last track point of the track. Along the way, the content of each track point passed by read pointer 315 is read out, and if a branch point is found to contain a BN2, that BN2 is sent via bus 317 to block address mapping module 134 and second level cache 108 and handled as in the previous embodiment: it is converted to BN1 if a valid BN1X exists, and otherwise the instruction block is first filled into level 1 cache 104.
  • meanwhile, under the control of step signal 111, read pointer 115 of tracker 120 provides addressing addresses to level 1 cache 104 to fetch instructions for execution by processor core 102.
  • when read pointer 115 of tracker 120 reaches a branch instruction, its value is the addressing address BN1 of the corresponding branch point; it addresses track table 110, and the branch point content is read out and sent via bus 117 to selector 118 in tracker 120.
  • selector 318 in secondary tracker 320 is likewise controlled by step signal 111 and TAKEN signal 113.
  • if the branch is taken, selector 118 selects the BN1 on bus 117.
  • register 112 is updated so that read pointer 115 points to the branch target track point and supplies that track point's BN1 to level 1 cache 104 to fetch the branch target instruction for the processor core.
  • selector 318 likewise selects the BN1 on bus 117 to update register 312, so that read pointer 315 also points to the branch target track point.
  • tracker 120 then continues, as described above, to control L1 cache 104 to supply the instructions following the branch target instruction to processor core 102, while read pointer 315 of secondary tracker 320 moves in the manner described above along the new track (the track containing the branch target track point), ensuring that the instruction blocks containing the branch target instructions of the branch points it passes are filled into level one cache 104.
  • if the branch is not taken, selector 118 selects BN1X from register 112 and BN1Y from incrementer 114 as the new BN1 to update register 112, so read pointer 115 points to the track point following the branch point and supplies that track point's BN1 to first level cache 104 to fetch the corresponding instruction for execution by processor core 102. Selector 318 likewise selects BN1X from register 312 and BN1Y from incrementer 314 as the new BN1 to update register 312, so read pointer 315 keeps moving along the current track, ensuring that the instruction blocks containing the branch target instructions of the branch points it passes are filled into level one cache 104.
  • this embodiment therefore provides the same function as the embodiment of FIG. 2: by the time processor core 102 executes a branch instruction, the instruction block containing its branch target instruction is already stored in level 1 cache 104, so no cache miss occurs.
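  • the scan performed by read pointer 315 can be sketched as a loop over one track. This is a hypothetical model: the `TrackPoint` fields and the `fill_and_map` callback (standing in for the fill from second level cache 108 plus block address mapping module 134) are illustrative names, and the lambda used below, which keeps BN2Y as BN1Y, is an assumption that only holds if level 1 and level 2 blocks have the same size.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TrackPoint:
    is_branch: bool = False
    fmt: Optional[str] = None                 # "BN1" or "BN2" on branch points
    target: Optional[Tuple[int, int]] = None


def slave_scan(
    track: List[TrackPoint],
    start_y: int,
    fill_and_map: Callable[[Tuple[int, int]], Tuple[int, int]],
) -> List[int]:
    """Advance read pointer 315 one track point per clock from start_y to the
    last track point. Each branch point still holding a BN2 has its target
    block filled into L1 cache 104 and its entry rewritten with the resulting
    BN1. Returns the BN1Y values of the branch points that were filled."""
    filled = []
    for y in range(start_y, len(track)):      # incrementer 314 -> register 312
        point = track[y]
        if point.is_branch and point.fmt == "BN2":
            point.target = fill_and_map(point.target)
            point.fmt = "BN1"
            filled.append(y)
    return filled


# Usage: a 9-point track with BN2 targets at entry 3 and at the end track point.
track = [TrackPoint() for _ in range(9)]
track[3] = TrackPoint(True, "BN2", (8, 3))
track[6] = TrackPoint(True, "BN1", (2, 0))
track[8] = TrackPoint(True, "BN2", (9, 0))
# Hypothetical fill: block BN2X is placed in L1 row BN2X % 4.
filled = slave_scan(track, 1, lambda bn2: (bn2[0] % 4, bn2[1]))
```

  • on a taken branch, the same scan would simply be restarted on the target track from the target's BN1Y, mirroring how selector 318 loads the BN1 from bus 117.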
  • in the embodiments above, each row of the track table has as many entries as there are instructions in the corresponding level one instruction block. Only branch instructions need to store a branch target addressing address (BN1 or BN2), while the track points of non-branch instructions need only the instruction type, so much of the track table holds useless data. Track table 110 can therefore be compressed to save storage space.
  • Figure 4 shows another embodiment of the cache system of the present invention. It is based on the embodiment of Fig. 1A and describes the internal structure of track table 110 in detail. For ease of description, only some of the modules are shown in Figure 4.
  • the L1 cache 104, the processor core 102, and the tracker 120 are the same as the corresponding components in the embodiment of Fig. 1A.
  • secondary tracker 420 is structured like tracker 120 of the Figure 1A embodiment, but its output read pointer is column pointer 425, which contains only a column address (column number, MBNY), and the control signals received by selector 418 and register 432 in secondary tracker 420 differ from those of tracker 120.
  • the track table 110 is composed of an instruction type table 410 and a target address table 412.
  • the rows of both tables correspond one-to-one with the rows of first level cache 104 and are addressed by the same BN1X sent on bus 411.
  • the last column of instruction type table 410 is the end entry; the remaining columns equal in number, and correspond one-to-one with, the instructions in a first level instruction block.
  • each entry other than the end entry stores the basic type of the corresponding instruction in the level 1 cache block, and the end entry stores a branch instruction type.
  • instruction type table 410 thus holds, for track table 110, the basic type of every instruction in each corresponding instruction block.
  • the number of columns of target address table 412 is greater than or equal to the maximum number of branch instructions (including the unconditional branch of the end track point) that a level one instruction block can contain. In each row, the entries from left to right store, in the order in which the branch instructions appear in the corresponding level one instruction block, the BN (BN1 or BN2) of each branch instruction's branch target instruction.
  • an entry of target address table 412 in which a correct target instruction BN has been stored is called a valid entry; the remaining entries are invalid entries.
  • the last valid entry of each row stores the BN of the target track point of the end entry in the corresponding row of instruction type table 410.
  • target address table 412 thus holds the addressing addresses of all branch target track points in track table 110, and the row of target address table 412 corresponding to any track contains at least one valid entry (at minimum, the target instruction BN of the end track point).
  • FIG. 5 shows an embodiment of the instruction type table and the target address table according to the present invention.
  • in this example, level 1 cache 104 holds 4 instruction blocks (instruction blocks 0 to 3), each containing 8 instructions (instructions 0 to 7).
  • track table 110 therefore also contains 4 rows (4 tracks). Instruction type table 410 in track table 110 contains 4 rows (rows 0 to 3), one per instruction block, each with 9 entries; the first 8 entries (entries 0 to 7) correspond to the 8 instructions in the block and store the instruction types of those instructions.
  • '1' denotes a branch instruction type and '0' a non-branch instruction type.
  • the last entry (column 8) is the end entry and always holds the branch instruction type.
  • target address table 412 in track table 110 also contains 4 rows (rows 0 to 3), each with 3 entries that store the BN of a branch target instruction together with more detailed type information (such as direct or indirect branch, conditional or unconditional branch, and whether the branch target is a BN1 or a BN2).
  • here it is assumed that each instruction block contains at most 2 branch instructions; together with the unconditional branch of the end entry, each row needs at most 3 entries to store branch target BNs.
  • if an instruction block may contain more branch instructions, the number of columns of target address table 412 can be increased accordingly; details are not repeated here.
  • the more detailed type information could also be stored in instruction type table 410, but because that table has more entries, doing so would enlarge it considerably; storing the information in target address table 412 instead further saves storage space.
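  • the storage argument can be checked with simple arithmetic. The sketch below uses the FIG. 5 geometry (4 rows, 8 instructions plus an end entry, 3 target entries per row) but the field widths are assumptions, not figures from the patent: a 16-bit BN field and 3 bits of detailed type information (one BN1/BN2 flag plus a 2-bit branch kind). It compares an uncompressed track table, a split layout with detailed types kept in instruction type table 410, and the split layout with detailed types in target address table 412.

```python
from typing import Tuple


def track_table_bits(rows: int, instrs: int, targets_per_row: int,
                     bn_bits: int, detail_bits: int) -> Tuple[int, int, int]:
    """Return (uncompressed, detail_in_410, detail_in_412) storage in bits.
    bn_bits and detail_bits are hypothetical field widths."""
    entries_per_row = instrs + 1                      # plus the end entry
    # Uncompressed: every entry carries a basic-type bit, detailed type and a BN.
    full = rows * entries_per_row * (1 + detail_bits + bn_bits)
    # Split, detailed type kept per instruction in table 410, BNs in table 412.
    detail_in_410 = rows * (entries_per_row * (1 + detail_bits)
                            + targets_per_row * bn_bits)
    # Split, 1 basic-type bit per entry in table 410; detailed type and BN
    # only in the few entries of table 412.
    detail_in_412 = rows * (entries_per_row
                            + targets_per_row * (detail_bits + bn_bits))
    return full, detail_in_410, detail_in_412


sizes = track_table_bits(4, 8, 3, bn_bits=16, detail_bits=3)
```

  • under these assumed widths the three layouts need 720, 336 and 264 bits respectively, which illustrates why the detailed type information is placed in target address table 412.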
  • when a new track is built, the extracted instruction types are stored in the instruction type table.
  • the branch target instruction BN and branch type information are stored in the target address table.
  • they are written into the first invalid entry of the row of target address table 412 pointed to by BN1X, making it a valid entry.
  • this function can be implemented with a counter: the counter is cleared when a new track is created and is incremented each time a branch instruction is examined and its branch target instruction is matched or allocated, so it always points to the next invalid entry.
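  • the counter-based fill of one track can be sketched as follows. This is an illustrative model, not the patent's circuit: the function name `build_track`, the use of `None` for non-branch instructions, the tuple layout of an entry (which follows the four-part FIG. 5 notation) and the end-track-point target used in the usage example are all hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, int, int]   # (BN1/BN2 flag, branch type, BNX, BNY)


def build_track(block: List[Optional[Entry]], end_target: Entry,
                width: int = 3) -> Tuple[List[int], List[Optional[Entry]]]:
    """block holds one item per instruction: None for a non-branch instruction,
    otherwise that branch's target entry. Returns the new row of instruction
    type table 410 and the new row of target address table 412."""
    type_row: List[int] = []
    target_row: List[Optional[Entry]] = [None] * width   # all entries start invalid
    counter = 0                                          # cleared for a new track
    for instr in block:
        if instr is None:
            type_row.append(0)
        else:
            type_row.append(1)
            # The first invalid entry becomes valid (IndexError if the block
            # holds more branches than the row has entries).
            target_row[counter] = instr
            counter += 1
    type_row.append(1)                 # end entry is always a branch type
    target_row[counter] = end_target   # end track point, treated as 'U'
    return type_row, target_row


# Usage: block 0 of FIG. 5 has one branch, at instruction 3, with target '2C83'.
block0: List[Optional[Entry]] = [None] * 8
block0[3] = ("2", "C", 8, 3)
types, targets = build_track(block0, ("1", "U", 1, 0))   # end target is hypothetical
```

  • because entries are filled strictly in program order, the k-th valid entry always belongs to the k-th branch of the block, which is what later lets a column number be derived from the instruction type row alone.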
  • once the track is built, instruction type table 410 stores the basic types of all track points on the track, and target address table 412 stores the branch target instructions and branch type information of all branch points and of the end track point on the track.
  • each entry of target address table 412 consists of four parts: the first indicates whether the target instruction addressing address is a BN1 or a BN2 ('1' means BN1, '2' means BN2); the second indicates the branch type ('C' a direct conditional branch, 'U' a direct unconditional branch, 'I' an indirect conditional branch); the third and fourth give BNX and BNY of the target instruction BN.
  • for example, in entry 0 of row 0 of target address table 412, '2C83' indicates that the corresponding branch instruction is a direct conditional branch and that the target instruction addressing address is a BN2, with BN2X '8' and BN2Y '3'.
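  • decoding the four-part shorthand is straightforward; the sketch below turns strings such as '2C83' into named fields. The names `TargetEntry` and `parse_entry` are hypothetical, and one decimal digit per BNX/BNY field is an assumption that only fits this small figure: real entries would hold binary fields of whatever width the cache geometry needs.

```python
from typing import NamedTuple


class TargetEntry(NamedTuple):
    fmt: str     # "BN1" or "BN2"
    kind: str    # branch type
    bnx: int
    bny: int


FORMATS = {"1": "BN1", "2": "BN2"}
KINDS = {"C": "direct conditional", "U": "direct unconditional",
         "I": "indirect conditional"}


def parse_entry(text: str) -> TargetEntry:
    """Decode the FIG. 5 shorthand, e.g. '2C83' (one digit per BNX/BNY field)."""
    if len(text) != 4:
        raise ValueError(f"expected 4 characters, got {text!r}")
    return TargetEntry(FORMATS[text[0]], KINDS[text[1]], int(text[2]), int(text[3]))


example = parse_entry("2C83")   # BN2 target, block 8, offset 3
```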
  • for an indirect branch, tracker 120 waits for processor core 102 to send the branch target address via bus 155 to active table 130 for matching; as in the previous example, a BN1 address is obtained and used to read the instruction from primary cache 104 for execution by processor core 102. If an entry's addressing type is BN1 and its instruction type is 'I' (an indirect conditional branch), tracker 120 can use that BN1 directly.
  • the BN1 is placed on read pointer 115 to address level one instruction cache 104, from which the corresponding instruction and its subsequent instructions are read for speculative execution by processor core 102. However, the indirect target address that processor core 102 sends via bus 155 must still be processed.
  • that address is matched through active table 130 and block address mapping module 134, and the resulting BN1 is compared with the speculatively used BN1. If they are equal, the processor core continues execution. If they differ, processor core 102 flushes the speculatively executed instructions and their intermediate results from the pipeline, and the controller writes the matched BN1 into the track table entry.
  • the matched BN1 replaces the BN1 originally used for the guess.
  • tracker 120 then reads this BN1 and proceeds as in the previous example.
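  • the check-and-repair step for a guessed indirect target can be sketched as below. It is a simplified model under stated assumptions: `verify_indirect_guess` is a hypothetical name, the track table is reduced to a dictionary from branch point to stored BN1, and the `match` callback stands in for active table 130 plus block address mapping module 134. The addresses in the usage example are invented for illustration.

```python
from typing import Callable, Dict, Tuple

BN1 = Tuple[int, int]


def verify_indirect_guess(
    table: Dict[BN1, BN1],
    branch_point: BN1,
    target_address: int,
    match: Callable[[int], BN1],
) -> bool:
    """Compare the BN1 used for speculation with the BN1 matched from the
    indirect target address on bus 155. Returns True if execution continues."""
    guessed = table[branch_point]
    actual = match(target_address)
    if actual == guessed:
        return True              # guess confirmed: the core keeps executing
    # Mismatch: the core flushes the speculated instructions and results,
    # and the controller replaces the guessed BN1 with the matched one.
    table[branch_point] = actual
    return False


# Usage with a hypothetical address-to-BN1 mapping.
table = {(0, 5): (3, 2)}
addr_map = {0x1000: (3, 2), 0x2000: (1, 4)}
ok = verify_indirect_guess(table, (0, 5), 0x1000, addr_map.__getitem__)
bad = verify_indirect_guess(table, (0, 5), 0x2000, addr_map.__getitem__)
```

  • after a mismatch the corrected BN1 stays in the entry, so the next execution of the same indirect branch speculates on the most recently observed target.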
  • in instruction type table 410, entry 8 of each of tracks 0 to 3 serves as the end entry and has the value '1'.
  • in addition, entry 3 of track 0, entries 2 and 6 of track 1, and entry 4 of track 3 have the value '1', indicating that the corresponding track points are branch points; the BNs of their branch target instructions are stored in the corresponding entries of target address table 412.
  • entry 3 of track 0 is the first (0th) branch point in that track, and entry 0 of row 0 of target address table 412 corresponds to it.
  • the next branch point of track 0 is the end track point, to which entry 1 of row 0 of target address table 412 corresponds. Since the end track point is treated as an unconditional branch in this embodiment, the branch type part of that entry is 'U'.
  • read pointer 115 of tracker 120 only needs to move along a row of instruction type table 410 in track table 110, reading in turn the basic types of all track points in a track, while the BN1 in read pointer 115 is sent to first level cache 104 to fetch the instruction via bus 103 for execution by processor core 102.
  • according to the technical solution of the present invention, the track point content of a branch instruction is read out before processor core 102 executes that branch instruction, and if it contains a BN2, the corresponding instruction block is filled from second level cache 108 into the level 1 cache via bus 105.
  • when read pointer 115 points to a branch point, or to any of the consecutive non-branch points preceding it, the branch target BN of that branch point is read from target address table 412 via bus 423 and checked to determine whether it is a BN1 or a BN2 for subsequent operations (bus 423 plays the same role as the corresponding bus in FIG. 1A).
  • when the value of register 422 is '0' (no branch taken), selector 418 selects the output of incrementer 414, that is, the column number of target address table 412 held in register 432 plus one.
  • the BN1Y of read pointer 115 is sent via bus 413 to memory 426, and the basic type information it reads each time is stored in register 424.
  • if the value of register 424 is '1' in the next clock cycle, register 432 writes the output of selector 418, so column pointer 425 is advanced by one from the original column number and points to the next entry in target address table 412.
  • when TAKEN signal 113 is '1' and the basic type information is '1', read pointer 115 points to the branch target track point.
  • offset address mapping module 416 converts the BN1Y of that track point into the column number of the entry in target address table 412 corresponding to the first branch point starting from that track point, and puts it on bus 419.
  • the value of register 422 is then '1', so selector 418 selects the column number on bus 419; since the value of register 424 is also '1', register 432 writes the output of selector 418, updating column pointer 425 to the column number on bus 419 so that it points to the entry of target address table 412 corresponding to the first branch point starting from that track point. Because the rows of instruction type table 410 and target address table 412 correspond one-to-one, only the columns need to be mapped; in the embodiment of Figure 4 this mapping is performed by offset address mapping module 416.
  • in selector array 601, the number of selector columns equals the number of instructions in a level one instruction block, i.e. 8 columns, and the number of selector rows equals the maximum number of entries in a row of target address table 412.
  • Figure 4 shows 4 rows and 4 columns of the array: the first 4 rows from bottom to top and the first 4 columns from left to right.
  • the bottom row is row 0, and the row numbers increase upward.
  • the leftmost column is column 0, the column numbers increase to the right, and each column corresponds to a column of the instruction type table.
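  • column 0 takes its inputs from a fixed input column whose only '1' is in row 0: for the selector in row 0 of column 0, input A is '1' and input B is '0'.
  • for selectors of column 0 that receive no '1', inputs A and B are both '0'.
  • input B of every selector in row 0 is '0'.
  • for the selectors of the other columns, input A comes from the output of the selector in the same row of the previous column, and input B comes from the output of the selector one row below in the previous column.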
  • decoder 605 decodes the BN1Y of read pointer 115 of tracker 120, received on bus 413, and sends the resulting mask value to masker 607.
  • the mask value is also 8 bits wide: the mask bits before the bit corresponding to BN1Y are '1', and the bit corresponding to BN1Y and all later bits are '0'. The mask value is then combined with the row of instruction type table 410 addressed by the BN1X of read pointer 115.
  • a bitwise AND is performed between the mask and the 8 instruction type bits of that row (excluding the type of the end track point), which keeps the instruction types located before the position of BN1Y in the row
  • and clears the rest, producing an 8-bit control word that is sent to selector array 601.
  • each bit of the control word controls one column of selectors in selector array 601.
  • when the bit is '0', the selectors of the corresponding column select input A; when the bit is '1', they select input B. Thus, in every column whose control bit is '1', each selector takes the output of the previous column one row below, shifting the previous column's output up by one row and filling the bottom row with '0'.
  • the output of the array is encoded by encoder 603, and the resulting column address MBNY of target address table 412 is sent out on bus 419, completing the conversion of column addresses between instruction type table 410 and target address table 412 (that is, from BNY to MBNY).
  • for example, suppose the current read pointer 115 has BN1X '1' and BN1Y '4'.
  • the entry to find in target address table 412 is the one for the first branch point after track point (BN1X '1', BN1Y '4'), namely track point (BN1X '1', BN1Y '6'), which is entry 1 of row 1.
  • the mask value is '11110000', and it is combined with row 1 as sent from instruction type table 410.
  • the row value '00100010' is ANDed bitwise with the mask to give '00100000', so the control word contains one '1'.
  • the '1' at the input of selector array 601 is therefore shifted up by one row, so the output of selector array 601, read from bottom to top, is '01000000'; encoder 603 encodes this to '1' and outputs it on bus 419. In this way, for instruction type table 410,
  • the column address (BN1Y) '4' of row 1 is converted to the column address (MBNY) '1' of row 1 of target address table 412.
  • similarly, suppose the current read pointer 115 has BN1X '0' and BN1Y '3', which is itself a branch point; the corresponding entry of target address table 412 (for BN1X '0', BN1Y '3') is entry 0 of row 0.
  • the mask value is '11100000'; ANDing it with the row 0 value '00010000' sent from instruction type table 410 gives '00000000', so the control word contains no '1'.
  • the '1' at the input of selector array 601 is not shifted, so the output of selector array 601, read from bottom to top, is '10000000'; encoder 603 encodes it to '0' on bus 419, converting the column address (BN1Y) '3' of row 0 of instruction type table 410 to the column address (MBNY) '0' of row 0 of target address table 412.
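  • the decoder, masker, selector array and encoder chain can be modelled bit by bit to reproduce the two examples above. The function names mirror the reference numerals but are hypothetical, index 0 of the selector column stands for the bottom row, and `rows=3` follows the three entries per row of FIG. 5. Note that the whole chain amounts to counting the '1' bits of the instruction type row before BN1Y; the hardware does this with shifts instead of an adder.

```python
from typing import List


def decoder_605(bn1y: int, width: int = 8) -> List[int]:
    # '1' for every position before BN1Y, '0' at BN1Y and after.
    return [1 if i < bn1y else 0 for i in range(width)]


def masker_607(mask: List[int], type_bits: List[int]) -> List[int]:
    # Bitwise AND keeps only the branch types located before BN1Y.
    return [m & t for m, t in zip(mask, type_bits)]


def selector_array_601(control: List[int], rows: int) -> List[int]:
    col = [1] + [0] * (rows - 1)      # fixed input column: '1' in row 0 (bottom)
    for bit in control:               # one column of selectors per control bit
        if bit:                       # input B: value from one row below, row 0 gets '0'
            col = [0] + col[:-1]
        # bit == 0 -> input A: same row of the previous column, value unchanged
    return col


def encoder_603(one_hot: List[int]) -> int:
    # Row index of the single '1' (raises ValueError if it was shifted out).
    return one_hot.index(1)


def offset_map_416(type_row: str, bn1y: int, rows: int = 3) -> int:
    type_bits = [int(c) for c in type_row[:8]]   # 8 instruction types, end entry excluded
    control = masker_607(decoder_605(bn1y), type_bits)
    return encoder_603(selector_array_601(control, rows))


mbny_row1 = offset_map_416("00100010", 4)   # first example in the text
mbny_row0 = offset_map_416("00010000", 3)   # second example in the text
```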
  • a one-row memory 426 is also added to hold the row of instruction type table 410 that is currently pointed to by
  • read pointer 115 of tracker 120.
  • memory 426 is optional. When it is present, whenever read pointer 115 of tracker 120 moves to a new track, i.e. the BN1X value of read pointer 115 changes, the row of instruction type table 410 pointed to by the new BN1X is read and stored in memory 426. From then on, read pointer 115 only needs to access memory 426 to read instruction types, which reduces the number of accesses to instruction type table 410 and hence power consumption.
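  • the saving from memory 426 can be illustrated by counting table accesses. In this hypothetical model (`TypeRowBuffer` and its fields are illustrative names), instruction type table 410 is read only when BN1X changes, and every per-instruction lookup is served from the buffered row.

```python
from typing import List, Optional, Sequence


class TypeRowBuffer:
    """Model of the one-row memory 426 placed in front of instruction type table 410."""

    def __init__(self, type_table: Sequence[Sequence[int]]) -> None:
        self.type_table = type_table
        self.bn1x: Optional[int] = None
        self.row: List[int] = []
        self.table_reads = 0            # accesses to instruction type table 410

    def basic_type(self, bn1x: int, bn1y: int) -> int:
        if bn1x != self.bn1x:           # read pointer 115 moved to a new track
            self.row = list(self.type_table[bn1x])
            self.bn1x = bn1x
            self.table_reads += 1
        return self.row[bn1y]           # served from memory 426


# Usage: walk all of track 0, then the first four track points of track 1.
table_410 = [[0, 0, 0, 1, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0, 1, 0, 1]]
buf = TypeRowBuffer(table_410)
types = [buf.basic_type(0, y) for y in range(9)] + [buf.basic_type(1, y) for y in range(4)]
```

  • here 13 instruction type lookups cost only 2 reads of instruction type table 410.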
  • register 112 in tracker 120 is controlled by step signal 111 as described in the previous embodiment.
  • the BN1X of read pointer 115 is also sent to target address table 412 via bus 411, and its BN1Y is also sent to offset address mapping module 416 via bus 413.
  • using the row held in memory 426, offset address mapping module 416 maps BN1Y to a column number of target address table 412 and sends it via bus 419 to selector 418 in secondary tracker 420 as one of its inputs.
  • register 432 holds the column number, in target address table 412, of the first branch point after the instruction currently being executed by processor core 102.
  • this column number is sent to target address table 412 on column pointer 425 of secondary tracker 420, where, together with the BN1X on bus 411,
  • it selects the corresponding entry in the row pointed to by BN1X; the entry's contents are read out and sent via bus 423 to selector 118.
  • incrementer 414 in secondary tracker 420 adds one to the column number in register 432, giving the column number of the next branch point under sequential execution, which is sent to selector 418 as its other input.
  • register 422 receives the TAKEN signal from processor core 102.
  • if a branch is taken, in the next clock cycle selector 418 selects the converted column number output on bus 419 and sends it to register 432; otherwise it selects the incremented column number output by incrementer 414 and sends that to register 432.
  • the BN1Y of read pointer 115, on bus 413, reads the corresponding instruction type from memory 426, and this type is sent via bus 421 to register 424 in secondary tracker 420, delayed by one clock cycle, and used as the write enable signal of register 432. If the write enable is '0', the instruction being executed by the processor core is not a branch instruction and the value of register 432 (the column number) is unchanged; if it is '1', the instruction being executed is a branch instruction and register 432 is updated to the output of selector 418.
  • thus, whenever the processor core executes a branch instruction, register 432 is updated according to the branch outcome, either to the column number corresponding to the BN1Y of the branch target instruction or to the next column number on the current track.
  • if the branch is not taken, selector 418 selects the output of incrementer 414 and stores it in register 432, so column pointer 425 points to the entry to the right of the original entry in branch target table 412.
  • if the branch is taken, selector 418 selects bus 419.
  • the content of bus 419 is stored in register 432, so column pointer 425 points to the entry in target address table 412, as mapped by offset address mapping module 416, of the first branch instruction starting from the branch target.
  • Both tracker 120 and secondary tracker 420 operate in accordance with the basic types of instructions in memory 426.
  • when output bus 421 of memory 426 is '0' (a non-branch instruction type), tracker 120 updates the read pointer with the output of incrementer 114, moving it to the next track point in sequence and making level 1 cache 104 output the next sequential instruction for execution by processor core 102; the register in secondary tracker 420 is not updated, and column pointer 425 does not move.
  • when output bus 421 of memory 426 is '1' (a branch instruction type), tracker 120 updates read pointer 115 according to the branch type and the branch decision. If the branch is not taken, read pointer 115 moves to the next track point and makes level 1 cache 104 output the next sequential instruction for execution by processor core 102; if the branch is taken, read pointer 115 jumps to the branch target output by branch target table 412 on bus 423 and makes level 1 cache 104 output the branch target instruction for execution by processor core 102.
  • the register in secondary tracker 420 is updated accordingly, moving column pointer 425. If the branch is not taken, column pointer 425 moves to the next column of branch target table 412, and the entry in that column of the row pointed to by BN1X address 411 of read pointer 115 is read and sent via bus 423 to selector 118 in tracker 120 for later use. If the branch is taken, column pointer 425 jumps to the column of branch target table 412 obtained by offset address mapping module 416 from the updated read pointer 115, and the entry in that column of the row pointed to by the new BN1X address 411 of read pointer 115 is sent via bus 423 to selector 118 in tracker 120 for later use.
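  • the update rule for register 432 (column pointer 425) reduces to three cases, sketched below together with a short trace on the FIG. 5 rows. The function names are hypothetical, and `mapped_column` computes the result of offset address mapping module 416 as the number of branch points before BN1Y, which is equivalent to the selector array description. The trace itself (row 1 executed from BN1Y 0, its branch at entry 2 not taken, its branch at entry 6 taken to BN1X 0, BN1Y 2) is invented for illustration.

```python
from typing import List


def next_column(col: int, is_branch: int, taken: int, mapped_col: int) -> int:
    """Next value of register 432.
    is_branch: register 424 (write enable, basic type of the executed instruction)
    taken:     register 422 (TAKEN signal 113)
    mapped_col: value on bus 419 from offset address mapping module 416."""
    if not is_branch:
        return col               # write enable '0': register 432 keeps its value
    if taken:
        return mapped_col        # selector 418 picks bus 419
    return col + 1               # selector 418 picks incrementer 414


def mapped_column(type_row: List[int], bn1y: int) -> int:
    # Column of the first branch point at or after BN1Y in target address table 412.
    return sum(type_row[:bn1y])


rows = {0: [0, 0, 0, 1, 0, 0, 0, 0], 1: [0, 0, 1, 0, 0, 0, 1, 0]}
col = mapped_column(rows[1], 0)
history = []
trace = [(0, 0, None), (1, 0, None), (2, 0, None), (3, 0, None),
         (4, 0, None), (5, 0, None), (6, 1, (0, 2))]
for bn1y, taken, target in trace:                    # instructions executed in row 1
    is_branch = rows[1][bn1y]
    mapped = mapped_column(rows[target[0]], target[1]) if target else 0
    col = next_column(col, is_branch, taken, mapped)
    history.append(col)
```

  • the column pointer only changes on branch instructions, which is why register 424 (the delayed basic type) serves directly as the write enable.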
  • in the following, the target address stored in each entry of branch target table 412 is in the track table address format, BN1X and BN1Y.
  • the selected track table address is placed on read pointer 115; its row address BN1X, on bus 411, points to an instruction block of level 1 cache 104 and to a row of instruction type table 410.
  • offset address mapping module 416 maps the BN1Y on bus 413 to a column address MBNY, which is passed to secondary tracker 420 and appears on column pointer 425. Column pointer 425 together with the BN1X of read pointer 115 on bus 411 selects one entry, whose contents (including branch target addresses BN1X and BN1Y) are sent via bus 423 to tracker 120 for use.
  • when a branch is taken, TAKEN signal 113 makes selector 118 select the branch target on bus 423
  • for writing into register 112.
  • read pointer 115 of tracker 120 is thereby updated to the BN1 of the branch target instruction, and its BN1X, via bus 411,
  • reads the row of instruction type table 410 containing the branch target, which is sent to offset address mapping module 416 and to memory 426.
  • the BN1Y of read pointer 115, sent via bus 413 to offset address mapping module 416, is converted into a column number of branch target table 412 using the data of the row containing the branch target; at this time registers 422 and 424 both hold '1'.
  • the column number is sent to the selector 418 via the bus 419, and is selected and written to the register 432, so that the column pointer 425 is updated to the column number corresponding to the branch target instruction BN1Y.
  • BN1X of the read pointer 115 and the column number on the column pointer 425 together point, in the target address table 412, to the entry of the first branch point at or after the branch target instruction.
  • The contents of this entry are read out: the branch target is sent via bus 423 to the selector 118, and the branch type is sent to the controller, both for standby use.
  • When the read pointer 115 reaches the next '1' in the memory 426 (the branch point whose target has already been sent to the selector 118), it controls the level 1 cache 104 to output the corresponding instruction via bus 103 for the processor core 102 to execute. Once the branch decision is made, the read pointer 115 of the tracker 120 is updated as described in the previous embodiment; the details are not repeated here.
  • If the branch is not taken, the value of register 422 is '0' and the value of register 424 is '1'. In the next clock cycle, BN1X of the read pointer 115 of the tracker 120 remains unchanged, and the selector 418 selects the output of the incrementer 414 for writing into register 432, so the column pointer 425 is updated to the original column number plus one. BN1X of the read pointer 115 and the column number on the column pointer 425 then locate, in the target address table 412, the entry of the first branch instruction after the current branch instruction (the entry immediately to the right of the original entry).
  • The contents of that entry are read and sent via bus 423 to the selector 118 of the tracker 120 for standby use. When the read pointer 115 reaches that branch point, it controls the level 1 cache 104 to output the branch instruction; once the branch outcome is produced, the read pointer 115 of the tracker 120 is updated as described in the previous embodiment. The details are not repeated here.
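The column-pointer rules above can be summarised in a short Python sketch. It assumes, as an illustration rather than the stated implementation, that a row of the instruction type table is a string of '0'/'1' flags and that the offset address mapping module 416 counts the branch points lying before BN1Y in that row; this assumption reproduces the mapping used in the worked example below (BN1Y '5' in row 3 maps to MBNY '1'). The function names are hypothetical.

```python
def map_offset(type_row: str, bn1y: int) -> int:
    """Assumed behaviour of offset address mapping module 416: the number of
    branch points ('1' flags) strictly before column bn1y. This is the column
    MBNY of the first branch point at or after bn1y in branch target table 412."""
    return type_row[:bn1y].count("1")


def next_column_pointer(col_ptr: int, reg_424: bool, reg_422: bool,
                        type_row: str, bn1y: int) -> int:
    """Next value of register 432 (column pointer 425).

    reg_424: the previous instruction was a branch (delayed type bit).
    reg_422: that branch was taken (delayed TAKEN signal).
    type_row / bn1y: row and column of the (possibly new) read pointer 115.
    """
    if not reg_424:
        return col_ptr                        # register 432 not enabled
    if reg_422:
        return map_offset(type_row, bn1y)     # selector 418 picks bus 419
    return col_ptr + 1                        # selector 418 picks incrementer 414
```

For example, with row 3 flagged as `"0010000010"` (hypothetical contents with branch points at columns 2 and 8), `map_offset("0010000010", 5)` returns 1.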
  • The BN1 is sent via bus 423 to the selector 118 in the tracker 120.
  • Addressed by the BN1Y value ('0') of the read pointer 115 on bus 413, the memory 426 outputs the instruction type ('0'), which is sent via bus 421 to register 424 and held for the next cycle.
  • Because the instruction type ('0') on bus 421 indicates a non-branch instruction, the controller causes the selector 118 to select the output of the incrementer 114 and send it to register 112. Since no branch is taken, the TAKEN signal ('0') is also sent to register 422 and held for the next cycle.
  • The read pointer 115 of the tracker 120 is incremented to '11' and, via buses 411 and 413, addresses the level 1 cache 104 to read the corresponding instruction, which is sent over bus 103 to the processor core 102 for execution. Since register 422 holds '0' in this clock cycle, the offset address mapping module 416 is idle; register 424 holds '0', so register 432 in the secondary tracker 420 is not updated, i.e., the column pointer 425 remains unchanged.
  • Addressed by the BN1Y value ('1') of the read pointer 115 on bus 413, the memory 426 outputs the instruction type ('0') via bus 421 to register 424, where it is held for the next cycle. Because the instruction type ('0') on bus 421 indicates a non-branch instruction, the controller causes the selector 118 to select the output of the incrementer 114 and send it to register 112. Since no branch is taken, the TAKEN signal ('0') is also sent to register 422 and held for the next cycle.
  • The read pointer 115 of the tracker 120 is incremented to '12' and, via buses 411 and 413, addresses the level 1 cache 104 to read the corresponding instruction, which is sent over bus 103 to the processor core 102; that is, the processor core 102 executes the corresponding branch instruction.
  • Because the instruction type on bus 421 indicates a branch instruction, and the branch type on bus 423 is a conditional branch with address format BN1, the controller pauses the tracker to wait for the branch decision of the processor core 102.
  • Addressed by the BN1Y value ('2') of the read pointer 115 on bus 413, the memory 426 outputs the instruction type ('1') via bus 421 to register 424, where it is held for the next cycle. Assume that the branch is not taken, i.e., the TAKEN signal 113 is '0'. The TAKEN signal ('0') controls the selector 118 of the tracker 120 to select the incremented BN1Y output by the incrementer 114, so that BN1 on the read pointer 115 is updated to '13' in the next clock cycle.
  • Since register 422 holds '0' in this clock cycle, the offset address mapping module 416 is idle; register 424 holds '0', so register 432 in the secondary tracker 420 is not updated, i.e., the column pointer 425 remains unchanged. Since the branch is not taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle.
  • The read pointer 115 of the tracker 120 is incremented to '13' and, via buses 411 and 413, addresses the level 1 cache 104 to read the corresponding instruction, which is sent over bus 103 to the processor core 102 for execution.
  • Because the instruction type ('0') on bus 421 indicates a non-branch instruction, the controller causes the selector 118 to select the output of the incrementer 114 and send it to register 112.
  • In this cycle register 424 holds '1' and register 422 holds '0', so the column pointer 425 is incremented as described above, and the BN1 in the entry of the next branch point ('35') is sent via bus 423 to the selector 118 in the tracker 120 for standby use.
  • Addressed by the BN1Y value ('3') of the read pointer 115 on bus 413, the memory 426 outputs the instruction type ('0') via bus 421 to register 424, where it is held for the next cycle. Since no branch is taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle.
  • The read pointer 115 of the tracker 120 is incremented by one each clock cycle, becoming '14' and then '15', and addresses the level 1 cache 104 via buses 411 and 413 in turn, reading the corresponding instructions over bus 103 for execution by the processor core 102.
  • Since register 422 holds '0', the offset address mapping module 416 is idle; register 424 holds '0', so register 432 in the secondary tracker 420 is not updated, i.e., the column pointer 425 remains unchanged.
  • Addressed in turn by the BN1Y values ('4', '5') of the read pointer 115 on bus 413, the memory 426 outputs the instruction type ('0') via bus 421 to register 424, where it is held for the next cycle.
  • Since no branch is taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle.
  • The read pointer 115 of the tracker 120 is incremented to '16' and, via buses 411 and 413, addresses the level 1 cache 104 to read the corresponding instruction, which is sent over bus 103 to the processor core 102; that is, the processor core 102 executes the corresponding branch instruction.
  • Addressed by the BN1Y value ('6') of the read pointer 115 on bus 413, the memory 426 outputs the instruction type ('1') via bus 421 to register 424, where it is held for the next cycle.
  • Because the instruction type ('1') on bus 421 indicates a branch instruction, and the branch type on bus 423 is a conditional branch whose target is in BN1 format, the controller pauses the tracker to wait for the branch decision of the processor core 102. Assume that this time the branch is taken, i.e., the TAKEN signal 113 is '1'.
  • The TAKEN signal ('1') controls the selector 118 of the tracker 120 to select the branch target BN1 ('35') from bus 423, so that BN1 on the read pointer 115 is updated to '35' in the next clock cycle.
  • The read pointer 115 of the tracker 120 is updated to '35', i.e., row 3, entry 5 of the instruction type table.
  • The value of the read pointer 115 addresses the level 1 cache 104 via buses 411 and 413 to read the corresponding instruction (the branch target instruction), which is sent over bus 103 to the processor core 102 for execution. Since register 422 holds '1' in this clock cycle, BN1Y ('5') of the read pointer 115 is sent via bus 413 to the offset address mapping module 416.
  • The conversion yields the column number MBNY '1', which is sent via bus 419 to the selector 418 of the secondary tracker 420. The value '1' in register 422 controls the selector 418 to select the column number '1' from bus 419 as its output, and the value '1' in register 424 enables register 432 to be updated to that column number. The column pointer 425 therefore points to entry 1 in the row of the target address table pointed to by bus 411 (row 3), whose contents '1U00' are read out: the corresponding instruction is an unconditional branch instruction, its branch target instruction is already stored in the level 1 cache 104, and the corresponding target address BN1 is '00'.
  • The BN1 is sent via bus 423 to the selector 118 in the tracker 120.
  • Addressed by the BN1Y value ('5') of the read pointer 115 on bus 413, the memory 426 outputs the instruction type ('0') via bus 421 to register 424, where it is held for the next cycle. Since no branch is taken, the TAKEN signal ('0') is sent to register 422 and held for the next cycle.
  • The instruction types read out on bus 421 for three cycles ('35', '36' and '37') are all non-branch, so the controller increments BN1Y of the read pointer 115 by '1' per cycle, while the secondary tracker 420 is not updated.
  • When the read pointer reaches the branch point at row 3, column 8, the controller determines that the instruction is a branch instruction; since the branch type on bus 423 is unconditional and the address format is BN1, it controls the selector 118 to select bus 423.
  • The branch target '00' is stored in register 112.
  • The '1' at row 3, column 8 of the instruction type table is also sent via bus 421 to register 424 for temporary storage.
  • The unconditional branch type also produces a '1' that is sent to register 422 for temporary storage.
  • In the next cycle, the tracker read pointer points to row 0, entry 0 of the instruction type table 410, from which execution continues, and controls the level 1 cache 104 to send the corresponding instruction to the processor core 102.
  • As in the previous example, the selector 418 selects the mapping result on bus 419 (here '0') and stores it in register 432, pointing the column pointer 425 to column 0. Subsequent operations follow by analogy, so that instructions are executed correctly and the functionality described herein is implemented.
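The walk-through above can be replayed with a small, hypothetical simulation. Only the branch points at '12', '16' and '38', the target '35' and the entry '1U00' come from the example; the row length of 10, the contents of row 0, the branch point at '32' (implied only by MBNY '1' for BN1Y '5') and the targets of the branches at '12' and '32' are made up. The one-cycle delay through registers 422 and 424 and the pause while waiting for the branch decision are collapsed into a single step per instruction.

```python
from typing import Dict, List, Tuple

BN = Tuple[int, int]  # (BN1X, BN1Y)

# Hypothetical table contents: rows of 10 entries, '1' marks a branch point.
TYPES: Dict[int, str] = {
    0: "0000000000",
    1: "0010001000",   # branch points at '12' and '16'
    3: "0010000010",   # branch points at '32' (implied) and '38'
}
# Branch target table 412: one (kind, target) entry per branch point, in
# column order. 'C' = conditional, 'U' = unconditional.
TARGETS: Dict[int, List[Tuple[str, BN]]] = {
    1: [("C", (2, 0)), ("C", (3, 5))],   # target of '12' is made up
    3: [("C", (0, 5)), ("U", (0, 0))],   # '38' holds 1U00; target of '32' is made up
}


def map_offset(type_row: str, bn1y: int) -> int:
    """Assumed offset address mapping: branch points before column bn1y."""
    return type_row[:bn1y].count("1")


def run(start: BN, taken: Dict[BN, bool], steps: int) -> List[str]:
    """Return the read-pointer values issued to the level 1 cache."""
    x, y = start
    col = map_offset(TYPES[x], y)          # column pointer 425
    trace: List[str] = []
    for _ in range(steps):
        trace.append(f"{x}{y}")
        if TYPES[x][y] == "1":             # memory 426 reports a branch point
            kind, target = TARGETS[x][col]
            if kind == "U" or taken.get((x, y), False):
                x, y = target              # selector 118 picks bus 423
                col = map_offset(TYPES[x], y)
                continue
            col += 1                       # not taken: incrementer 414
        y += 1                             # incrementer 114
    return trace


print(run((1, 0), {(1, 2): False, (1, 6): True}, 12))
# ['10', '11', '12', '13', '14', '15', '16', '35', '36', '37', '38', '00']
```

The printed sequence matches the read-pointer values in the walk-through: sequential stepping to '16', the taken branch to '35', and the unconditional branch at '38' back to '00'.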
  • The embodiment of FIG. 4 can be modified as shown in FIG. 1B so that it does not need to spend an extra cycle on the end track point after reaching the track point of the last instruction of each track.
  • This uses the third of the methods for distinguishing the end track point described above: when the last instruction of the instruction block is a non-branch instruction, the end track point is merged into the track point corresponding to that instruction. Please refer to FIG. 7, which shows another embodiment of the cache system of the present invention.
  • This embodiment is similar to that of FIG. 4, and only some of the modules are shown.
  • The processor core 102, the level 1 cache 104, the offset address mapping module 416, the memory 426, and the secondary tracker 420 are the same as the corresponding components in FIG. 4. The difference is that the track table 110 outputs the contents of two track points at a time: the instruction type table 410 outputs, via buses 421 and 429 respectively, the contents of the entry pointed to by the tracker 120 and of the entry following it, and the target address table 412 outputs, via buses 423 and 427 respectively, the contents of the entry pointed to by the MBNY that the offset address mapping module 416 obtains from BN1Y of the read pointer 115, and of the entry following it.
  • A type decoder 752, a controller 754, and a selector 116 are added to the tracker 120 accordingly.
  • The processor core 102 also sends a signal 161 to the controller 754 in the tracker 120.
  • The output of the incrementer 114 is no longer sent directly to the selector 118, but to the selector 116.
  • Addressed by the column pointer 425 output by the secondary tracker 420, the read port of the target address table 412 outputs the contents of two entries (the current entry and the next entry) on buses 423 and 427.
  • The target address BN in the current entry, output by the target address table 412 on bus 423, is sent to the selector 118, and its branch type information is sent to the controller 754.
  • The target address BN in the next entry, output by the target address table on bus 427, is sent to the selector 116, and its branch type information is sent to the type decoder 752.
  • Addressed by the column address 413 of the read pointer 115 output by the tracker 120, the memory 426 outputs the contents of two entries (the current entry and the next entry) on buses 421 and 429, respectively.
  • The basic type information of the current entry, output by the memory 426 on bus 421, is sent to register 424 as in the FIG. 4 embodiment and is also sent to the type decoder 752 and the controller 754.
  • The basic type information of the next entry, output by the memory 426 on bus 429, is also sent to the type decoder 752 and the controller 754.
  • The type decoder 752 decodes the basic type information of the current entry and of the next entry sent from the instruction type table 410, and the result controls the selector 116 to select either the output of the incrementer 114 or bus 427 and send it to the selector 118.
  • A BN2 in the target address table 412 is converted into BN1 when it is used, as described in the previous embodiment.
  • Therefore, for convenience of explanation, the target addresses on buses 423 and 427 output by the target address table 412 can both be regarded as BN1 in this embodiment.
  • If the current instruction is a non-branch instruction, the type decoder 752 controls the selector 116 to select the output of the incrementer 114, and the controller 754 controls the selector 118 to select the output of the selector 116. In the next clock cycle, BN1X in register 112 therefore remains unchanged and BN1Y is incremented by one, so the read pointer points to the next sequentially executed entry in the instruction type table 410, and the next instruction is read from the level 1 cache 104 for execution by the processor core 102. For ease of description, this embodiment assumes that the valid signal 111 from the processor core is always active.
  • If the current instruction is a conditional branch and the next instruction is an unconditional branch, the type decoder 752 controls the selector 116 to select the unconditional branch target BN1 on bus 427, and the controller 754 controls the selector 118 according to the TAKEN signal 113.
  • If the branch is taken, the selector 118 selects the branch target BN1 on bus 423, so that in the next clock cycle the read pointer 115 of the tracker 120 is updated to the branch target BN1 of the current branch instruction, and the branch target instruction is read from the level 1 cache 104 for execution by the processor core 102.
  • If the branch is not taken, the selector 118 selects the output of the selector 116, i.e., the branch target BN1 of the unconditional branch instruction, so that the branch target instruction of the unconditional branch is read from the level 1 cache 104 for execution by the processor core 102.
  • In other cases, the type decoder 752 controls the selector 116 to select the output of the incrementer 114.
  • The controller 754 controls the direction in which the read pointer of the tracker 120 moves, based on the basic type information of the current instruction on bus 421 and the branch type information and address format on bus 423. If the current instruction is a non-branch instruction, the selectors 116 and 118 select the output of the incrementer 114 as described above, so that BN1Y is incremented by one and the next instruction is read sequentially from the level 1 cache 104 for execution by the processor core 102.
  • If the current instruction is a conditional branch, the controller 754 controls the selector 118 according to the TAKEN signal 113. If the TAKEN signal 113 is '1', indicating that the branch of the branch instruction corresponding to the current entry is taken, the selector 118 selects the branch target BN1 on bus 423, so that in the next clock cycle the read pointer 115 of the tracker 120 is updated to the branch target BN1 of that branch instruction and the branch target instruction is read from the level 1 cache 104 for execution by the processor core 102.
  • Otherwise, the selector 118 selects the output originating from the incrementer 114, so that in the next clock cycle BN1X in register 112 remains unchanged, BN1Y is incremented by one, and the next instruction is read from the level 1 cache 104 for execution by the processor core 102.
  • If the branch type information on bus 423 indicates an unconditional branch instruction, the controller 754 controls the selector 118 to select the branch target BN1 on bus 423, so that in the next clock cycle the read pointer 115 of the tracker 120 is updated to the branch target BN1 of that branch instruction and the branch target instruction is read from the level 1 cache 104 for execution by the processor core 102.
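The selection logic of the FIG. 7 tracker can be sketched as a pure function. This is an illustration under assumptions: the instruction type and branch type are folded into one hypothetical code ('N' non-branch, 'C' conditional, 'U' unconditional), the case in which the type decoder 752 selects bus 427 is taken to be a conditional branch immediately followed by an unconditional branch, as the text above reads, and the valid signal 111 is assumed always active.

```python
from typing import Optional, Tuple

BN = Tuple[int, int]  # (BN1X, BN1Y)


def next_read_pointer(ptr: BN, cur_kind: str, next_kind: str,
                      cur_target: Optional[BN], next_target: Optional[BN],
                      taken: bool) -> Optional[BN]:
    """One step of selectors 116 and 118 in the FIG. 7 tracker (a sketch).

    cur_kind / next_kind: 'N', 'C' or 'U' for the current entry
    (buses 421/423) and the next entry (buses 429/427).
    """
    incremented = (ptr[0], ptr[1] + 1)          # incrementer 114: BN1Y + 1
    # Type decoder 752 drives selector 116 (assumed condition for bus 427).
    if cur_kind == "C" and next_kind == "U":
        sel_116 = next_target                   # skip over the unconditional branch
    else:
        sel_116 = incremented
    # Controller 754 drives selector 118.
    if cur_kind == "U" or (cur_kind == "C" and taken):
        return cur_target                       # bus 423
    return sel_116
```

In this sketch, a not-taken conditional branch followed by an unconditional branch moves the pointer directly to the unconditional branch's target, without stepping onto the unconditional branch itself.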
  • FIG. 8 shows another embodiment of the tracker of the present invention.
  • As in the tracker of FIG. 1C, the tracker 120 of this embodiment adds a selector 162.
  • When a guess proves wrong, the controller 854 clears the instructions and intermediate results of the speculative execution and controls the selector 166 to place the track table address stored in register 164 on the pointer 115, so that the level 1 cache 104 provides the first instruction of the other path of the branch for execution by the processor core 102.
  • BN1X of the pointer 115 also points, via bus 411, to the instruction type table 410, reading the track of the other path of the branch into the offset address mapping module 416. The BN1Y value of the pointer 115 on bus 413 is converted into the column address MBNY, in the target address table 412, of the first branch point at or after the first instruction of that path, which is sent via bus 419 to the secondary tracker 420.
  • Similarly, the controller 854 sends a control signal selecting this path to the selector in the secondary tracker 420; after a one-cycle register delay, this signal controls the selector 418 to select bus 419. The controller 854 also sends a clock enable signal through OR logic 824, and after a one-cycle register delay the MBNY value on bus 419 is stored in register 432. Bus 411 and the pointer 425 select the corresponding entry in the target address table 412, whose branch type and track table address are sent via bus 423 to the controller 854 and the selector 818 for standby use; the next entry is sent via bus 427 to the selector 116. Thereafter, the controller 854 controls the direction of the tracker according to the instruction type read from the memory 426 and the branch type on bus 423, so that the cache system provides the appropriate instructions to the processor core 102 for execution.
  • In another implementation, a selector and a register are added to the secondary tracker 420 in the same way as in the tracker 120, to store the MBNY value of the first branch target on the other track (the direction not guessed), and the two-input selector 418 is replaced by a three-input selector.
  • The output of this register is connected to an input of the three-input selector.
  • The three-input selector in the secondary tracker 420 can be controlled by the same control signal generated by the controller 854, after a one-cycle delay; however, the controller 854 still needs to generate a clock enable signal to the OR logic 824.
  • This signal enables register 432 when a guess proves wrong, so that register 432 stores the column address held in the newly added register.
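The recovery path of FIG. 8, in the variant where the secondary tracker keeps an extra register for the other direction's MBNY, can be modelled as follows. This is a hypothetical sketch: the class and method names are made up, clearing of speculative instructions and intermediate results is not modelled, and the example values reuse the branch at '16', guessing '35' (MBNY 1) with '17' as the other direction (MBNY 2 if row 1 has branch points at columns 2 and 6).

```python
from dataclasses import dataclass
from typing import Tuple

BN = Tuple[int, int]  # (BN1X, BN1Y)


@dataclass
class GuessingTracker:
    """Hypothetical model of the FIG. 8 tracker's recovery state."""
    read_ptr: BN              # pointer 115
    col_ptr: int              # column pointer 425 (register 432)
    saved_ptr: BN = (0, 0)    # register 164: other direction's track table address
    saved_col: int = 0        # added register in secondary tracker 420: its MBNY

    def guess(self, guessed: BN, guessed_col: int, other: BN, other_col: int) -> None:
        """Follow the guessed direction while remembering the other one."""
        self.saved_ptr, self.saved_col = other, other_col
        self.read_ptr, self.col_ptr = guessed, guessed_col

    def resolve(self, guess_correct: bool) -> None:
        """On a wrong guess, switch both pointers to the saved direction."""
        if not guess_correct:
            self.read_ptr, self.col_ptr = self.saved_ptr, self.saved_col


t = GuessingTracker((1, 6), 1)
t.guess((3, 5), 1, (1, 7), 2)   # guess "taken" at '16'
t.resolve(False)                # guess was wrong
print(t.read_ptr, t.col_ptr)    # (1, 7) 2
```

Keeping the saved MBNY lets the column pointer be restored in the same step as the read pointer, instead of being recomputed through the offset address mapping module.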
  • According to the position of the pipeline stage that makes the branch decision, the pipeline in the processor core is divided into a front-end pipeline and a back-end pipeline.
  • The back-end pipeline comprises the stages from the stage that makes the branch decision to the end of the pipeline, and the front-end pipeline comprises the stages from the first stage up to the branch decision.
  • If the processor core includes two front-end pipelines and one back-end pipeline, selected by a selector, there is no need to wait for the branch decision: the two front-end pipelines respectively execute the instructions following the branch instruction in sequential address order, and the branch target instruction and its subsequent instructions.
  • When the branch decision is produced, it controls the selection of one of the two front-end pipelines, whose intermediate results are passed to the back-end pipeline to continue execution, thereby avoiding the performance loss caused by branch mispredictions.
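A toy model illustrates why the two-front-end arrangement incurs no branch penalty. It assumes a simplified cost model that is not taken from the patent: a single front end that followed the wrong path must refill its `front_depth` stages, whereas with two front ends the branch decision merely selects which front end's intermediate results enter the back end. The function names are hypothetical.

```python
from typing import List, Tuple


def resolve_dual_front_end(taken: bool, seq_front: List[str],
                           tgt_front: List[str]) -> Tuple[List[str], List[str]]:
    """Both front ends have already processed their paths; the branch decision
    only picks which intermediate results go to the back end.
    Returns (passed to the back end, discarded)."""
    if taken:
        return tgt_front, seq_front
    return seq_front, tgt_front


def lost_cycles(front_depth: int, predicted_taken: bool, taken: bool,
                dual_front_end: bool) -> int:
    """Simplified penalty model: a single front end on the wrong path has to
    refill front_depth stages; with two front ends nothing is refilled."""
    if dual_front_end or predicted_taken == taken:
        return 0
    return front_depth
```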
  • FIG. 9 shows an embodiment of a cache and processor system without branch penalty according to the present invention.
  • The processor core 1102 in FIG. 9 consists of two front-end pipelines and one back-end pipeline.
  • The two front-end pipelines correspond respectively to the fall-through (sequential) path and the branch target path of a branch point.
  • An instruction read buffer 1104, a track table 1110 holding one row, and a selector 1108 are added in FIG. 9.
  • The instruction read buffer 1104 can hold one instruction block of the level 1 cache 104 and stores the current instruction block being executed by the processor core 1102; the track table 1110 stores the track corresponding to the instruction read buffer 1104. For ease of explanation, all embodiments of the present invention assume that the instruction read buffer has a delay of '0', i.e., it can be read within the same cycle. In addition to the tracker 120, another tracker 1120 is added. For convenience of description, in this embodiment the tracker 120 is called the current tracker and the tracker 1120 the target tracker. A controller 1140 is also added to coordinate the operation of the two trackers and to control writes to the instruction read buffer 1104 and the track table 1110.
  • The trackers 120 and 1120 have the same structure and are similar to the tracker in the FIG. 7 embodiment, except that the corresponding two-input selectors are each replaced by three-input selectors; internal details are omitted for ease of understanding.
  • The added input of the selector 118 in the tracker 120 is connected to the output 1123 of the incrementer 1114 in the tracker 1120; the added input of the selector 1118 in the tracker 1120 is connected to the output bus of the track table 1110.
  • Bus 1117 can provide an entire track to fill the track table 1110 (addressed by BNX only); it can also provide a single track point (one entry) of the track for use by the tracker 1120.
  • Bus 103 can provide an entire instruction block to fill the instruction read buffer 1104 (also addressed by BNX only); it can also provide a single instruction of the block for execution by the processor core 1102.
  • The step signal 111 provided by the processor core 1102 controls the stepping of the current tracker 120 and is also sent to the controller 1140.
  • The controller 1140 monitors the branch decision 113 provided by the processor core 1102, the target front-end step signal 1111, and the instruction types and address formats on buses 117 and 1117 from the track tables, to coordinate and control the operation of the whole system. When the processor core 1102 executes an unconditional branch instruction, or a conditional branch instruction for which the branch decision signal 113 indicates that the branch is taken, the controller 1140 controls the selector 118 in the current tracker 120 to select bus 1123, the output of the incrementer 1114 in the target tracker 1120, for storing in register 112 to update the read pointer 115.
  • When the controller 1140, by monitoring the read pointer 115, determines that the processor core 1102 is executing the last instruction of an instruction block, it controls the selector 118 in the current tracker 120 to select bus 117, which carries the address of the next sequential instruction block provided by the last entry of the track table 1110 (actually the end track point of that row; the specific operation is as in the FIG. 7 embodiment and is not repeated here), for storing in register 112 to update the read pointer 115.
  • Otherwise, the controller 1140 controls the selector 118 to select the output of the incrementer 114, so that the read pointer 115 steps under the control of the step signal 111.
  • When the controller 1140 detects an entry representing a branch instruction on bus 117, it controls the selector 1118 in the target tracker 1120 to select the branch target on bus 117 for storing in register 1112, updating the read pointer 1115.
  • When the controller 1140, by monitoring the read pointer 1115, finds that the last instruction of an instruction block is being sent to the target front-end pipeline in the processor core, it controls the selector 1118 to select the contents of bus 1117 for storing in register 1112 (actually the end track point of that row; the specific operation is as in the FIG. 7 embodiment and is not repeated here). Otherwise, the controller 1140 controls the selector 1118 to select the output of the incrementer 1114, so that the read pointer 1115 steps under the control of the target front-end step signal 1111.
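The controller 1140 rules for selectors 118 and 1118 can be condensed into two small functions. This is a sketch only: the function names and string labels are hypothetical, and the priority order used when more than one condition holds (a taken branch before an end-of-block transition; a new branch entry on bus 117 before the target path's end-of-block transition) is an assumption that the text does not state.

```python
def select_118(branch_taken: bool, current_at_block_end: bool) -> str:
    """Source chosen by selector 118 of the current tracker 120."""
    if branch_taken:              # unconditional, or conditional resolved taken
        return "bus 1123"         # BNX of pointer 1115 with BNY from incrementer 1114
    if current_at_block_end:      # last instruction of the current block
        return "bus 117"          # next-block address from the end track point
    return "incrementer 114"      # step under step signal 111


def select_1118(branch_entry_on_117: bool, target_at_block_end: bool) -> str:
    """Source chosen by selector 1118 of the target tracker 1120."""
    if branch_entry_on_117:       # a branch point appears on bus 117
        return "bus 117"          # load its branch target into register 1112
    if target_at_block_end:       # target path reaches the end of its block
        return "bus 1117"
    return "incrementer 1114"     # step under target front-end step signal 1111
```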
  • When the instruction read buffer 1104 and the track table 1110 are to be filled, the controller 1140 controls the selector 1108 to select BNX (bus 411) of the current read pointer 115; at all other times the selector 1108 selects BNX of the target read pointer 1115.
  • While stepping, the pointer 115 controls the instruction read buffer 1104 to provide instructions to the sequential front-end pipeline in the processor core 1102 for execution.
  • While stepping, the pointer 1115 controls the level 1 instruction cache 104 to provide instructions to the target front-end pipeline in the processor core 1102 for execution.
  • BNY of the read pointer 115 of the tracker 120 addresses the track table 1110 and the instruction read buffer 1104 via bus 413, reading the corresponding track point contents and instruction; the instruction is provided via bus 1103 to the sequential front-end pipeline in the processor core 1102.
  • BNX of the read pointer 1115 of the tracker 1120 is sent via bus 1141 to the selector 1108 and, once selected, addresses the track table 110 and the level 1 cache 104 via bus 1109 to find the corresponding track and instruction block. At the same time, BNY of the read pointer 1115 addresses that track and instruction block via bus 1143, reading the corresponding track point contents and instruction; the instruction is provided via bus 103 to the target front-end pipeline in the processor core 1102.
  • The track table shown in FIG. 10 consists of a single table rather than being divided into an instruction type table and a target address table, but its contents are essentially the same as those of the instruction type table 410 and the target address table 412 in the previous embodiments.
  • As with the tracker in the previous example, the current tracker 120 updates the read pointer 115 under the control of the controller 1140, based on the step signal 111 sent from the processor core 1102, the instruction type, and the branch decision 113 of the processor core.
  • The controller 1140 can also control the selector 118 in the current tracker 120 to select bus 117, i.e., track point contents from the track table 1110 (for example, the branch target track point BN of an unconditional branch), which are output as the updated read pointer 115.
  • BNX of the read pointer 115, selected by the selector 1108, addresses the level 1 cache 104 and the track table 110 via bus 1109; the corresponding target instruction block and target track are read out and stored, via buses 103 and 1117 respectively, into the instruction read buffer 1104 and the track table 1110.
  • BNY of the read pointer 115 then reads the instruction read buffer 1104 and the track table 1110 via bus 413.
  • As an example, assume that the track in row 1 of the track table 110 in FIG. 10 is stored in the track table 1110.
  • The instruction type in the entry read out is non-branch, so the selector 118 is controlled to select the output of the incrementer 114, stepping the read pointer 115 to the second track point 1130 in the track table 1110.
  • The BNY bus 413 controls the instruction read buffer 1104 to send the instruction corresponding to the track point 1130 (a branch instruction) via bus 1103 to the sequential front-end pipeline of the processor core 1102 for execution.
  • The controller 1140 decodes the track point and determines that the branch type is a conditional branch with address format BN1.
  • The controller 1140 controls the selector 118 to select the output of the incrementer 114, and the selector 1118 to select the branch target BN on bus 117.
  • The pointer 115 is thus updated to point to the third track point of the track table 1110, the one following the track point 1130, and the corresponding instruction, which follows the branch instruction, is sent to the sequential front-end pipeline. The pointer 1115 is updated to point to the branch target track point of the branch source 1130 in the track table 110 (here row 0, track point 1, i.e., '01'), and points to the branch target instruction in the level 1 instruction cache 104, which is sent to the target front-end pipeline.
  • In each subsequent cycle, the read pointer 115 of the current tracker 120 and the read pointer 1115 of the target tracker 1120 both step.
  • When the branch instruction reaches the first stage of the back-end pipeline, a branch decision is produced. Assuming the depth of the front-end pipeline is N, the processor core 1102 has by then already processed, in parallel, the instructions following the branch instruction in the current instruction block and the instructions starting from the branch target instruction in the branch target instruction block.
  • Assume the branch at the branch point 1130 is not taken; the back-end pipeline of the processor core 1102 therefore selects the output of the sequential front-end pipeline to continue execution (i.e., the intermediate results of the target front-end pipeline are discarded). The controller 1140 controls the selector 118 in the current tracker 120 to select the output of the incrementer 114, continuing to step under the control of the step signal 111, and the target tracker 1120 stops stepping.
  • the current tracker 120 continues to step, it is stored in the read buffer.
  • the current instruction block in 1104 continues to provide subsequent instructions to processor core 1102 for execution.
  • the current tracker 120 read pointer 115 points to the branch point 1132 (the first line in the track table 110)
  • the controller 1140 reads out that the content is '1C35', indicating that the branch point 1132 is a conditional direct branch, and the branch target track point BN is '35', that is, the third The fifth track point is taken.
  • Track table 1110 outputs the target track point BN addressed by BN1Y on bus 413. As before, controller 1140 sends BN '35' via bus 117 to register 1112 of target tracker 1120 to update read pointer 1115, while tracker 120 selects the output of incrementer 114 so that read pointer 115 keeps stepping.
  • Read pointer 115 of tracker 120 and read pointer 1115 of tracker 1120 have respectively controlled instruction read buffer 1104 and level 1 cache 104 to output N instructions to the sequential front-end pipeline and the target front-end pipeline of processor core 1102.
  • The branch at branch point 1132 is taken, so the back-end pipeline of processor core 1102 selects the output of the target front-end pipeline and continues execution (i.e., the results in the sequential front-end pipeline are discarded).
  • Selector 118 in current tracker 120 selects bus 1123. The BNX of read pointer 1115 of target tracker 1120, together with the BNY incremented by incrementer 1114, is stored as a BN in register 112 to update read pointer 115.
  • This BN points to the (N+1)th instruction starting from the branch target, which is the instruction processor core 1102 should execute next.
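  • The register update on a taken branch can be sketched as follows. This is a minimal Python model, not the patent's circuit. It makes three assumptions: a BN is a (BNX, BNY) pair, read pointer 1115 rests on the Nth instruction after sending N instructions to the target front-end pipeline, and no block boundary is crossed. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BN:
    """Hypothetical track-point address: BNX selects a block/track, BNY an entry in it."""
    bnx: int
    bny: int


def target_pointer_after_fill(branch_target: BN, n: int) -> BN:
    """Position of read pointer 1115 after N instructions went to the target front end.

    The pointer starts on the branch target (instruction 1) and steps N - 1
    times, so it rests on the Nth instruction. Block-boundary crossing is ignored.
    """
    return BN(branch_target.bnx, branch_target.bny + n - 1)


def taken_branch_update(target_ptr: BN) -> BN:
    """BN latched into register 112 when the branch is taken (selector 118 picks bus 1123).

    BNX comes from read pointer 1115 and BNY from incrementer 1114, i.e. the
    BNY of read pointer 1115 plus one: the (N+1)th instruction from the target.
    """
    return BN(target_ptr.bnx, target_ptr.bny + 1)


if __name__ == "__main__":
    ptr_1115 = target_pointer_after_fill(BN(3, 5), n=2)
    print(ptr_1115, taken_branch_update(ptr_1115))  # BN(bnx=3, bny=6) BN(bnx=3, bny=7)
```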
  • The BNX of read pointer 1115 of target tracker 1120 addresses level 1 cache 104 and track table 110 via bus 1109. The instruction block containing the (N+1)th instruction from the branch target, together with the corresponding track, is stored into instruction read buffer 1104 and track table 1110 via buses 103 and 1117 respectively.
  • At this point the read pointer of current tracker 120 has been updated to point to the (N+1)th instruction from the branch target. The instruction block containing that instruction has been filled into instruction read buffer 1104, and the corresponding track has been filled into track table 1110.
  • Starting from the (N+1)th instruction after the branch target, subsequent instructions in that block continue to be provided from instruction read buffer 1104 via bus 103 to the sequential front-end pipeline of processor core 1102 for execution.
  • The back-end pipeline of processor core 1102 selects the target front-end pipeline for N clock cycles and then selects the sequential front-end pipeline. As a result, the Nth instruction from the target front-end pipeline and the (N+1)th instruction from the sequential front-end pipeline execute back to back without interruption. This achieves a lossless branch.
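  • The ordering that makes the branch lossless can be illustrated with a small model. It assumes one instruction reaches the back end per cycle and uses hypothetical instruction labels. Read pointer 115 is redirected to the (N+1)th target instruction at decision time, so the sequential front-end pipeline needs N cycles to deliver it. In this model, that is exactly the N cycles the back end spends draining the target front-end pipeline, so the consumed stream has no gap.

```python
def backend_stream(fall_through, target_path, n, taken):
    """Return (front_end, instruction) pairs in the order the back end consumes them.

    fall_through: instructions after the branch on the not-taken path
    target_path:  instructions starting at the branch target
    n:            front-end depth N; the target front end holds target_path[:n]
    taken:        branch decision made in the first back-end stage
    """
    if not taken:
        # The back end keeps the sequential front end; the target front end is discarded.
        return [("sequential", ins) for ins in fall_through]
    # Cycles 1..N: the back end selects the target front end, which already
    # holds the first N instructions of the target path.
    stream = [("target", ins) for ins in target_path[:n]]
    # From cycle N+1: the back end selects the sequential front end again; the
    # current tracker was redirected to the (N+1)th target instruction N cycles earlier.
    stream += [("sequential", ins) for ins in target_path[n:]]
    return stream


if __name__ == "__main__":
    path = ["T1", "T2", "T3", "T4", "T5", "T6"]  # hypothetical target-path labels
    for source, ins in backend_stream(["S1", "S2"], path, n=4, taken=True):
        print(source, ins)
```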
  • When read pointer 1115 of target tracker 1120 steps to the end track point or an unconditional branch point of the target track in track table 110, it may be updated to the next track of the target track or to the unconditional branch target track, as described in the previous example.
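  • When read pointer 115 of current tracker 120 steps to the end track point or an unconditional branch point of its track in track table 1110, the corresponding next track or unconditional branch target track must be read out of track table 110. Because target tracker 1120 may be using track table 110 and level 1 cache 104 at the same time, this can be handled in one of the following ways.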
  • The first solution is to stop read pointer 115 of current tracker 120 from stepping and wait for processor core 1102 to generate the branch result. If the branch is not taken, target tracker 1120 no longer steps. Selector 1108 then selects the BNX on bus 411 (i.e., the BNX of the next track of the current track or of the unconditional branch target track), which controls track table 110 to output that track via bus 11125 into track table 1110, so that current tracker 120 can continue to step.
  • If the branch is taken, read pointer 115 of current tracker 120 is updated, as described in the previous example, to the BN corresponding to the (N+1)th instruction from the branch target. The instruction block containing that instruction and the corresponding track are filled into instruction read buffer 1104 and track table 1110 respectively, so that current tracker 120 can continue to step.
  • The second solution is to pause the stepping of read pointer 1115 of target tracker 1120. Meanwhile, selector 1108 selects the BNX on bus 411 (i.e., the BNX of the next track of the current track or of the unconditional branch target track), which controls track table 110 to output that track via bus 11125 into track table 1110, so that current tracker 120 can continue to step. Read pointer 1115 of target tracker 1120 then resumes stepping. Subsequent operations are as described in the previous example.
  • The third solution is to extend the structure of the embodiment of Figure 9 with an additional instruction read buffer and a corresponding track table, which store the target instruction block and its track. Current tracker 120 and target tracker 1120 then each address their own instruction read buffer and track without accessing level 1 cache 104 and track table 110. This leaves level 1 cache 104 and track table 110 free to output instruction blocks and tracks to fill the corresponding instruction read buffers and track tables.
  • Figure 11 applies this solution to the embodiment of Figure 8, in which the track table is split into separate tables. The same solution for a single (unsplit) track table can be derived by analogy and is not described again.
  • FIG. 11 shows another embodiment of a cache system supporting lossless branches according to the present invention.
  • Level 1 cache 104, track table 110, offset address mapping module 416, and memory 426 are the same as the corresponding components in the embodiment of FIG. 8. Tracker 120 and secondary tracker 910 have the same functions as tracker 120 and secondary tracker 420 in the earlier embodiment.
  • Here tracker 120 is the current tracker and secondary tracker 920 is the current secondary tracker. For clarity, FIG. 11 shows only some of the modules of tracker 120: register 164, selector 162, controllers 752 and 854, and logic 824 are omitted;
  • and the two-input selector 162 is replaced by a three-input selector 118, which also accepts the BN address sent from the target tracker when a branch is taken.
  • Registers 422 and 424 of secondary tracker 910 are moved out of 920 and included in controller 1260.
  • Controller 1260 receives TAKEN signal 113 from processor core 1102, read pointers 115 and 1115, and the inputs on buses 421, 921, and 423. From these it controls the operation of all four trackers (the two trackers and the two secondary trackers).
  • Processor core 1102 and instruction read buffer 1104 are identical to the corresponding components in the embodiment of Figure 9. Instruction read buffer 1104 stores the current instruction block, and memory 426 stores the corresponding instruction types.
  • FIG. 11 adds instruction read buffer 1204 to store the target instruction block. Correspondingly, it adds memory 1226 to store the instruction types of the target instruction block, together with target tracker 1120 and target secondary tracker 1220. Target tracker 1120 has the same structure as tracker 120, and target secondary tracker 1220 has the same structure as current secondary tracker 910. FIG. 11 also contains selectors 1108, 1210, and 1208. Selector 1108 selects the BNX from the read pointers of the two trackers. Its output addresses level 1 instruction cache 104 so that an instruction block is filled into read buffer 1104 or read buffer 1204. The same output also points to a row of track table 110, from which the entire corresponding row of instruction type table 410 is read into offset address mapping module 416.
  • Selector 1210 selects the BNY from the read pointers of the two trackers. Its output is sent to offset address mapping module 416, where it works together with the row of table 410 read out by the corresponding BNX to generate the next branch point address MBN1Y, signal 419.
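  • One way to read the mapping done by offset address mapping module 416 is as a prefix count. The sketch below makes two assumptions, both consistent with the worked example in this section. First, a row of instruction type table 410 holds '1' for a branch instruction and '0' otherwise. Second, the entries of a row of target address table 412 are stored in branch order, followed by the next-sequential-block entry. Under these assumptions, MBN1Y is the number of branch instructions before BNY. The function names and the example row are hypothetical.

```python
from itertools import accumulate


def mbn1y(type_row, bny):
    """Map BN1Y to MBN1Y (signal 419): the number of branch instructions before BNY.

    type_row[i] is 1 if instruction i of the block is a branch, else 0. The result
    is the column of target address table 412 holding the first branch at or after
    BNY; past the last branch it equals the branch count, i.e. the (assumed)
    column of the next-sequential-block entry.
    """
    return sum(type_row[:bny])


def build_offset_map(type_row):
    """Precompute MBN1Y for every BNY of one row with a running (prefix) sum."""
    # accumulate(..., initial=0) yields len(type_row) + 1 prefix sums; drop the total.
    return list(accumulate(type_row, initial=0))[:-1]


if __name__ == "__main__":
    row = [0, 0, 1, 0, 0, 0, 1, 0]  # hypothetical row: branches at columns 2 and 6
    print(build_offset_map(row))  # [0, 0, 0, 1, 1, 1, 1, 2]
```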
  • Selector 1208 selects between the read pointers of the two secondary trackers. Its output is used to read an entry from the row of target address table 412 pointed to by BNX. The controller uses the branch type and address format in the entry to determine the direction of the tracker and, when the address format is BN2, to fill the instruction block into the level 1 cache.
  • Selectors 1108 and 1210 are controlled by the same signal generated by controller 1260, and selector 1208 is controlled by another signal generated by controller 1260 (for clarity, these signals are not shown in the figure). Selectors 1108, 1210, and 1208 thus pair the read pointer of each tracker with that of its secondary tracker.
  • Assume the latency of the instruction read buffer is '0', i.e., the read buffer can be read within the same clock cycle.
  • Under step signal 111 sent by processor core 1102, read pointer 115 of current tracker 120 controls instruction read buffer 1104 to output the corresponding instruction via bus 1103 to the sequential front-end pipeline of processor core 1102 for execution. The BN1Y portion 413 of read pointer 115 is mapped to MBN1Y in offset address mapper 416 and stored in current secondary tracker 910.
  • Using this MBN1Y value, secondary tracker 910 reads from the row of the target address table pointed to by read pointer 115. It obtains the branch target of the first branch instruction at or after the instruction pointed to by 413, which may be the instruction pointed to by 413 itself.
  • This branch target, hereinafter branch target 1, is stored in target tracker 1120 as read pointer 1115. The BN1X 1411 of that pointer controls reading the block from level 1 instruction cache 104 into target instruction read buffer 1204. The BN1Y portion 1143 of read pointer 1115 reads the corresponding instruction onto bus 1255 for execution by the target front-end pipeline of processor core 1102. The same BN1Y portion 1143 is also mapped to MBN1Y in offset address mapper 416 and stored in target secondary tracker 1220.
  • Read pointer 115 always supplies instructions to the sequential front-end pipeline, while read pointer 1115 steps until it has filled the target front-end pipeline.
  • Controller 1260 coordinates and controls the operation of the whole system. It does so based on feedback from the processor core and on the instruction types, branch types, and address formats read out in advance from the track table.
  • Step signal 111 provided by processor core 1102 controls the stepping of current tracker 120, and the core also reports its pipeline state to controller 1260.
  • Controller 1260 monitors the instruction types on buses 421 and 921 and the branch type and address format on bus 423. From these it controls the operation of each tracker, the storing of the read buffers and their instruction type memories, and the mapping operation of the offset address mapping module.
  • Controller 1260 acts when processor core 1102 executes an unconditional branch instruction, or executes a conditional branch instruction for which the core's branch decision signal 113 indicates taken. In that case it controls selector 118 in current tracker 120 to select bus 1123, i.e., the output of incrementer 1114 in target tracker 1120, to update register 112.
  • When processor core 1102 executes the last instruction in an instruction block, the controller controls selector 118 in current tracker 120 to select bus 423 to update register 112. Bus 423 then carries the address of the next sequential instruction block, provided by target address memory 412.
  • In both cases, in the next clock cycle the controller controls selector 918 in current secondary tracker 910 to select MBN1Y on bus 419 from offset address mapping module 416.
  • Otherwise, the controller controls selector 118 to select the output of incrementer 114, so that read pointer 115 steps under control of step signal 111. It also controls selector 918 to select the output of incrementer 914. When the value on bus 421 is '1' (indicating that the instruction sent to the processor core at that time is a branch instruction), current secondary tracker read pointer 425 therefore steps to the next column of the target address memory.
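  • The selection rules for register 112 (selector 118) and register 912 (selector 918) can be collected into a next-state function. This is a hypothetical model. Its signal names are Python parameters named after the reference numerals, and pointers are (BNX, BNY) tuples. Holding the pointer when step signal 111 is inactive is an assumption. The one-cycle delay before selector 918 loads MBN1Y is modeled by a flag.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Ptr = Tuple[int, int]  # (BNX, BNY)


@dataclass
class CurrentSide:
    """Hypothetical state of current tracker 120 and current secondary tracker 910."""
    read_ptr_115: Ptr             # held in register 112
    sec_ptr_425: int              # column of target address table 412, register 912
    reload_pending: bool = False  # register 112 was just loaded from bus 1123 or 423


def next_state(s: CurrentSide, *, taken_branch: bool, last_in_block: bool,
               bus_1123: Optional[Ptr], bus_423: Optional[Ptr],
               bus_419: int, bus_421: int, step_111: bool) -> CurrentSide:
    """One clock of the selection rules for selectors 118 and 918 (hypothetical model).

    taken_branch:  an unconditional branch, or a conditional branch with TAKEN 113, executes
    last_in_block: the last instruction of the block executes
    bus_419:       MBN1Y produced by offset address mapping module 416
    bus_421:       instruction type of the instruction just sent (1 = branch)
    """
    # Selector 918: one cycle after a redirect, load MBN1Y; otherwise advance
    # one column only when a branch instruction has been sent.
    if s.reload_pending:
        sec = bus_419
    elif bus_421 == 1:
        sec = s.sec_ptr_425 + 1  # incrementer 914
    else:
        sec = s.sec_ptr_425
    # Selector 118 for register 112.
    if taken_branch:
        return CurrentSide(bus_1123, sec, True)  # BNX of 1115 with incremented BNY
    if last_in_block:
        return CurrentSide(bus_423, sec, True)   # next sequential block address
    bnx, bny = s.read_ptr_115
    ptr = (bnx, bny + 1) if step_111 else (bnx, bny)  # incrementer 114, or hold on stall
    return CurrentSide(ptr, sec, False)


if __name__ == "__main__":
    s = CurrentSide(read_ptr_115=(1, 0), sec_ptr_425=0)
    s = next_state(s, taken_branch=False, last_in_block=False, bus_1123=None,
                   bus_423=None, bus_419=0, bus_421=1, step_111=True)
    print(s)
```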
  • Controller 1260 may detect a new branch target on bus 423, for example by monitoring changes on bus 423, or the read request or read address of target address memory 412. It then controls selector 1118 in target tracker 1120 to select the branch target on bus 423 to update register 1112. One clock cycle later, it controls selector 1218 in target secondary tracker 1220 to select MBN1Y (the column address of target address memory 412) on the bus from offset address mapping module 416 to update register 1212.
  • Controller 1260 may detect on bus 1115 that target read pointer 1115 has reached the last instruction of an instruction block. The controller then controls selectors 1108 and 1208 to select the BN1X portion 1411 of target read pointer 1115 and the read pointer on bus 1215. These address target address table 412 and read the last entry of the row (the address of the next sequential instruction block) onto bus 423. The controller then controls selector 1118 in target tracker 1120 to select the branch target address on bus 423 to update register 1112. In the next clock cycle, it controls selector 1218 in target secondary tracker 1220 to select MBN1Y on bus 419.
  • Otherwise, the controller controls selector 1118 to select the output of incrementer 1114, so that read pointer 1115 steps under the target front-end step signal (not shown in FIG. 11 for clarity). It also controls selector 1218 to select the output of incrementer 1214. When the value on bus 921 is '1', target secondary tracker read pointer 1215 therefore steps to the next column of the target address memory.
  • If read pointer 115 reads a '1' from memory 426 while stepping, the corresponding instruction is a branch instruction, hereinafter called branch instruction 1.
  • The current tracker pointer keeps stepping along the current track. It controls instruction read buffer 1104 to output the corresponding instructions via bus 1103 to the sequential front-end pipeline of processor core 1102 for execution.
  • The output of memory 426 is placed on bus 421 to make current secondary tracker 910 step. Its read pointer 425 then reads from target address table 412 the branch target address of the next branch instruction, hereinafter called branch target 2.
  • If branch instruction 1 is not taken, pointer 115 continues stepping along the current track and providing instructions to the sequential front-end pipeline. The target front-end pipeline is flushed and branch target 2 is loaded into target tracker 1120. From then on, read pointer 1115, as before, provides instructions starting from branch target 2 to the target front-end pipeline, in preparation for the next branch point pointed to by pointer 115. If read pointer 1115 reads a '1' from memory 1226 while stepping, the corresponding instruction is a branch instruction. Bus 921 then makes target secondary tracker 1220 step, recording the target address table address MBN1Y of the corresponding next branch instruction for later use.
  • If branch instruction 1 is taken, the next step value of read pointer 1115 is loaded into pointer 115 of current tracker 120. The contents of target read buffer 1204 are stored into current read buffer 1104, which then continues to provide instructions to the sequential front-end pipeline.
  • Processor core 1102 correspondingly executes instructions from the target front-end pipeline for N clock cycles and then switches back to executing instructions from the sequential front-end pipeline.
  • Controller 1260 controls each tracker to operate together with its corresponding secondary tracker as described in the embodiment of FIG. 8.
  • The following uses current tracker 120 as an example to illustrate how an end track point or an unconditional branch point is handled. Target tracker 1120 operates in a similar manner, except that the instructions it provides are sent to the target front-end pipeline in processor core 1102 for execution.
  • Before read pointer 115 of current tracker 120 reaches a branch point, read pointer 425 of current secondary tracker 910 has already read the corresponding entry, as in the previous example. It reads this entry from target address table 412 in track table 110 via bus 1223 and places the entry's contents on bus 423.
  • If controller 1260 determines from the branch type in the entry that the branch point is an unconditional branch, it waits until bus 421 indicates that the instruction corresponding to this branch point has been sent to processor core 1102. Controller 1260 then controls the selector in current tracker 120 to select the branch target track point BN of the unconditional branch on bus 423 as its output, updating read pointer 115.
  • The BNX of read pointer 115 on bus 411 is selected by selector 1108 and addresses level 1 cache 104 and track table 110 via bus 1109. The corresponding target instruction block is read from level 1 cache 104 via bus 103 and selector 1250 and stored into instruction read buffer 1104. The corresponding row of instruction type table 410 of track table 110 is read onto bus 417, stored via selector 1252 into memory 426, and also sent to offset address mapping module 416.
  • The BNY of read pointer 115 addresses instruction read buffer 1104 and memory 426 via bus 413. The target instruction in the stored instruction block is read from instruction read buffer 1104 via bus 1103 to the sequential front-end pipeline of processor core 1102 for execution. The instruction type of the target track point in the target track is read from memory 426 and sent via bus 421 to current secondary tracker 910. This completes the unconditional branch.
  • Current tracker 120 continues to step, and current secondary tracker 910 updates read pointer 425 based on the signal produced when the branch occurs. Read pointer 425, selected by selector 1208, addresses target address table 412 via bus 1223 and points to the first branch point starting from the unconditional branch target. The contents of that entry (i.e., the branch target BN) are read out and sent via bus 423 to current tracker 120 and target tracker 1120, which operate together with the controller as described in the previous example. In particular, target tracker 1120 updates read pointer 1115 with this branch target BN.
  • The BNX 1251 of read pointer 1115 is then selected by selector 1108 and sent to level 1 cache 104, and the target block read out is filled into target read buffer 1204 via bus 103. The BNY 1253 of read pointer 1115 selects the target instruction from target read buffer 1204, which is sent via bus 1255 to the target front-end pipeline of processor core 1102.
  • Target tracker 1120 steps until a total of N instructions starting from the branch target have been filled into the target front-end pipeline. Processor core 1102 then provides a feedback signal telling tracker 1120 to stop stepping.
  • The stop signal can also be provided by a counter. Even so, the step signal from processor core 1102 is still needed to inform target tracker 1120 of the processor core's state, such as a pipeline stall.
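  • The counter alternative can be sketched as follows. This is a hypothetical model, not the patent's circuit. A down-counter is loaded with N when a new branch target is written into register 1112, and it stops the target tracker after N instructions. A stall flag, standing in for the processor core's step signal, holds the pointer. Block-boundary crossing is ignored, and the class name is made up for illustration.

```python
from typing import Optional


class TargetFillCounter:
    """Hypothetical counter that stops target tracker 1120 after N instructions."""

    def __init__(self, depth_n: int):
        self.depth_n = depth_n
        self.remaining = 0              # instructions still to send to the target front end
        self.bny: Optional[int] = None  # BNY part of read pointer 1115

    def load_target(self, bny: int) -> None:
        """A new branch target has been written into register 1112."""
        self.bny = bny
        self.remaining = self.depth_n

    def clock(self, core_stalled: bool) -> Optional[int]:
        """Return the BNY issued to the target front end this cycle, or None."""
        if self.remaining == 0 or core_stalled:
            return None                 # front end already full, or the core is stalled
        issued = self.bny
        self.remaining -= 1
        if self.remaining > 0:
            self.bny += 1               # keep stepping only while more are needed
        return issued


if __name__ == "__main__":
    counter = TargetFillCounter(depth_n=3)
    counter.load_target(5)
    print([counter.clock(stalled) for stalled in (False, True, False, False, False)])
    # [5, None, 6, 7, None]
```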
  • Selector 118 in current tracker 120 selects the output of incrementer 114 to update pointer 115. Pointer 115 therefore continues reading instructions from current read buffer 1104 into the sequential front-end pipeline of processor core 1102. From then on, the read pointer of current tracker 120 steps as in the previous example. If the target front-end pipeline is not yet full, target tracker 1120 steps at the same time, providing instructions to the target front-end pipeline as described above.
  • This continues until the branch instruction corresponding to the branch point has passed through the sequential front-end pipeline and reached the branch decision stage (the first stage of the back-end pipeline), where the branch decision is generated.
  • The track table shown in Fig. 5, which is substantially the same as the track table shown in Fig. 10, is used as an example below.
  • Assume that a branch in row 0 (marked '1' in the instruction type table) has branch target '10' and is taken. As in the previous example, BN1 '10' is stored in register 112 of tracker 120, so that read pointer 115 is '10'.
  • Instruction block 1 in level 1 instruction cache 104 is read out via bus 103 and selector 1250 and stored in current read buffer 1104, and BN1Y of read pointer 115 addresses that block.
  • In the next clock cycle, pointer 115 steps to '11' and controls current read buffer 1104 to read the instruction in its entry 1 for execution by the sequential front-end pipeline.
  • Entry 1 of memory 426 is read out and placed on bus 421; its value is '0'.
  • Because of the branch described above, controller 1260 latches the value '0' on signal 419 into the register of secondary tracker 910, so that branch target table read pointer 425 is '0'. Selected by selector 1208, this pointer addresses via bus 1223 entry 0 of row 1 of branch target table 412 (row 1 being indicated by BNX), and the entry's content '1C01' is read out.
  • Controller 1260 determines that the entry format is BN1, so no level 1 instruction cache fill is needed.
  • The controller also determines that the branch type is conditional. It therefore controls selector 1118 in target tracker 1120 to select the entry on bus 423, and the BN1 address '01' is latched into register 1112 of the target tracker.
  • In the next clock cycle, pointer 115 steps to '12' and controls current read buffer 1104 to read the instruction in its entry 2 for execution by the sequential front-end pipeline.
  • The value read from memory 426 is '1' and is placed on bus 421, so controller 1260 determines that this is a branch instruction. From the value '1C01' on bus 423, it determines that the corresponding instruction is a conditional branch. It therefore waits for the TAKEN branch decision 113 from processor core 1102 to determine the direction of the tracker.
  • Pointer 1115 is '01' (the branch target). Selected by selector 1108, it controls via bus 1109 the reading of line 0 of level 1 instruction cache 104, which is stored via bus 103 into target instruction buffer 1204. Bus 1109 also controls the reading of line 0 of instruction type table 410, which is stored into offset address mapping module 416 and memory 1226.
  • The BN1Y portion of pointer 1115 reads the instruction in entry 1 of target instruction buffer 1204 for execution by the target front-end pipeline.
  • Selector 1210 also selects this BN1Y, which offset address mapping module 416 maps to MBN1Y '0', the column of the first branch instruction starting from the branch target. This MBN1Y is sent via bus 419 to secondary tracker 1220, and controller 1260 makes selector 1218 select it for register 1212 to latch.
  • Secondary tracker 910 also steps under controller 1260 because bus 421 is '1'. Since no branch has been taken, controller 1260 controls selector 918 to select the output of incrementer 914. The output of selector 918 is stored in register 912, making read pointer 425 '1'. Selected by selector 1208, read pointer 425 points via bus 1223 to entry 1 of the row of branch target table 412 indicated by BNX. The entry's content '1C35' is read out and placed on bus 423.
  • Controller 1260 determines that this is a conditional branch and holds its type until the corresponding instruction (the instruction in column 6 of row 1) is executed, when the type is used to control the action of the tracker.
  • In the next clock cycle, pointer 115 steps to '13' and controls current read buffer 1104 to read the instruction in its entry 3 for execution by the sequential front-end pipeline.
  • Pointer 1115 also steps, to '02', and controls target read buffer 1204 to read the instruction in its entry 2 for execution by the target front-end pipeline.
  • Register 1212 in secondary tracker 1220 latches the output of selector 1218, an MBN1Y with value '0'. At this time, however, selector 1208 does not select bus 1215, the output of register 1212.
  • In the next clock cycle, pointer 115 steps to '14' and controls current read buffer 1104 to read the instruction in its entry 4 for execution by the sequential front-end pipeline.
  • Pointer 1115 also steps, to '03', and controls target read buffer 1204 to read the instruction in its entry 3 for execution by the target front-end pipeline.
  • In the next clock cycle, pointer 115 steps to '15' and controls current read buffer 1104 to read the instruction in its entry 5 for execution by the sequential front-end pipeline.
  • Pointer 1115 also steps, to '04', and controls target read buffer 1204 to read the instruction in its entry 4 for execution by the target front-end pipeline.
  • Processor core 1102 has now generated the branch decision, not taken, and sends it to the controller via TAKEN signal 113.
  • Controller 1260 controls selector 118 to select the output of incrementer 114, and, following the not-taken decision, the target front-end pipeline is flushed.
  • In the next clock cycle, pointer 115 steps to '16' and controls current read buffer 1104 to read the instruction in its entry 6 for execution by the sequential front-end pipeline. As in the previous example, entry 6 of instruction type memory 426 is read out and placed on bus 421; its value is '1'.
  • Controller 1260 updates the BN value of pointer 1115 to the BN value '35' on bus 423.
  • The BNX portion 411 of pointer 1115, selected by selector 1108, controls via bus 1109 the reading of instruction block 3 of level 1 instruction cache 104 into target instruction buffer 1204. The BN1Y portion 1143 of pointer 1115 controls the reading of the instruction in its entry 5 for execution by the target front-end pipeline.
  • Controller 1260 determines that the entry now on bus 423 is of the unconditional branch type. It waits for the corresponding instruction to be executed before using the entry to control the action of the tracker. This entry is actually the address of the next sequential instruction line, stored in the entry corresponding to column 8 of row 1 of instruction type table 410 in Fig. 5, and it is handled like an unconditional branch instruction.
  • In the next clock cycle, pointer 115 steps to '17' and controls current read buffer 1104 to read the instruction in its entry 7 for execution by the sequential front-end pipeline.
  • Pointer 1115 also steps, to '36', and controls target read buffer 1204 to read the instruction in its entry 6 for execution by the target front-end pipeline. Since pointer 115 has reached the last instruction in its block, controller 1260 prepares to read level 1 instruction cache 104 using BN '30' from the current value '1U30' on bus 423.
  • In the next clock cycle, pointer 115 is '30'. As in the previous example, instruction block 3 of level 1 instruction cache 104 is stored into current read buffer 1104, and the instruction in its entry 0 is read out for execution by the sequential front-end pipeline.
  • Pointer 1115 also steps, to '37', and controls target read buffer 1204 to read the instruction in its entry 7 for execution by the target front-end pipeline. Since pointer 1115 has reached the last instruction in its block, controller 1260 prepares to read level 1 instruction cache 104 using BN '00' from the current value '1U00' on bus 423.
  • Controller 1260 controls selector 1108 to select bus 1115, which points bus 1109 at row 3 of track table 110. It also controls selector 1208 to select read pointer 1215, which is sent via bus 1223 to target address table 412 to read its entry 1, '1U00', onto bus 423.
  • The instruction being handled is the last instruction in an instruction block. The instruction type read from entry 7 of memory 1226 and sent on bus 921 is a non-branch instruction. Controller 1260 therefore determines that '1U00' does not correspond to a real instruction in the program but is an end-of-block unconditional branch. It does not wait for bus 921 to deliver an instruction type of '1' corresponding to '1U00', and directly treats the branch as taken.
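  • The test the controller applies here can be written as a small predicate. It assumes entries use the four-character form seen in this example (such as '1U00'), with the second character giving the branch type. It also assumes that the instruction type read for the last instruction of the block is 1 for a branch and 0 otherwise. The function name and return values are hypothetical.

```python
def classify_block_end(entry: str, is_last_in_block: bool, last_type: int) -> str:
    """Decide whether controller 1260 must wait on a target address table entry.

    Returns "taken" for a synthetic end-of-block unconditional branch, which is
    taken at once. Returns "wait" for a real branch, whose instruction must
    first reach the core (and, if conditional, be resolved by TAKEN signal 113).
    """
    if len(entry) != 4 or entry[1] not in "CU":
        raise ValueError(f"unexpected entry {entry!r}")
    unconditional = entry[1] == "U"
    if unconditional and is_last_in_block and last_type == 0:
        # The block's last instruction is not a branch, so this 'U' entry only
        # carries the next sequential block address: there is no instruction to wait for.
        return "taken"
    return "wait"


if __name__ == "__main__":
    print(classify_block_end("1U00", is_last_in_block=True, last_type=0))  # taken
```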
  • In the next clock cycle, pointer 115 is '31' and controls current read buffer 1104 to read the instruction in its entry 1 for execution by the sequential front-end pipeline.
  • Pointer 1115, now '00', controls the storing of instruction block 0 of level 1 instruction cache 104 into target read buffer 1204. It also controls the reading of the instruction in entry 0 of that buffer for execution by the target front-end pipeline.
  • Processor core 1102 has now generated the branch decision, taken, and sends it to controller 1260 via signal 113.
  • Based on this decision, controller 1260 makes selector 118 select the value '01' on bus 1123, i.e., the value of target read pointer 1115 with its BN1Y incremented. Controller 1260 also makes selector 1252 select the output of memory 1226. Following the taken decision, the sequential front-end pipeline is flushed.
  • In the next clock cycle, based on the taken decision, controller 1260 writes the contents of target instruction buffer 1204 through selector 1250 into current instruction buffer 1104. It also writes the output of selector 1252 into memory 426.
  • Register 112 latches the output of selector 118, making read pointer 115 '01'. This pointer controls current read buffer 1104 to read the instruction in its entry 1 for execution by the sequential front-end pipeline.
  • The instructions already in the target front-end pipeline continue to execute. The controller, however, stops target tracker 1120, so target buffer 1204 provides no further instructions to the target front-end pipeline of the processor core.
  • processor core 1102 selects instructions from the target front-end pipeline for N clock cycles.
  • In the next clock cycle, pointer 115 is '02' and controls current read buffer 1104 to read the instruction in its entry 2 for execution by the sequential front-end pipeline.
  • Register 912 in secondary tracker 910 latches the value '0' on bus 419 as read pointer 425. Through selector 1208 and bus 1223, read pointer 425 reads the content '2C83' of entry 0 in row 0 of target address table 412, the row pointed to by pointer 115.
  • The controller determines that the entry is in BN2 format; its BN2 value '83' is sent via bus 423 to the active list, so that the corresponding instruction block is filled into the level 1 cache and the resulting BN1 is written into target address table 412.
  • That BN1 is then read out on bus 423 for use by the tracker. In this way a branch can be performed without penalty.
  • If the branch is not taken, the back-end pipeline of processor core 1102 selects the output of the sequential front-end pipeline and continues executing (i.e., the results of the target front-end pipeline are discarded); current tracker 120 keeps stepping and target tracker 1120 stops stepping. As current tracker 120 keeps stepping, subsequent instructions continue to be supplied to processor core 1102 from the current instruction block. If the branch is taken, the back-end pipeline of processor core 1102 selects the output of the target front-end pipeline and continues executing (i.e., the results of the sequential front-end pipeline are discarded).
  • Selector 118 in current tracker 120 selects bus 1123, so the BNX from read pointer 1115 of target tracker 1120 and the BNY incremented by incrementer 1114 are stored together as a BN in register 112, updating read pointer 115.
  • This BN points to the (N+1)th instruction counting from the branch target, which is the instruction processor core 1102 should execute next.
  • The instruction block in instruction read buffer 1204 is stored into instruction read buffer 1104, and the entire contents of memory 1226 are stored into memory 426. Thus, when the read pointer of current tracker 120 is updated to point to the (N+1)th instruction from the branch target, the instruction block containing that instruction is already in instruction read buffer 1104 and the corresponding instruction types are already in memory 426.
  • Starting from that (N+1)th instruction, the instruction block continues to supply subsequent instructions to processor core 1102 for execution.
  • Further, the capacity of the instruction read buffer and of the corresponding instruction type memory can be increased, so that a number of instruction blocks are held in the instruction read buffer and the corresponding instruction types are held in the memory.
  • When the read pointer of tracker 120 or 1120 is updated to a track other than the current track and the target track, the system can check whether that track and its instruction block are already held in the memory and the instruction read buffer.
  • One implementation is to store, alongside each instruction block in the instruction read buffer (or its instruction type memory), that block's BN1X address.
  • The BN1X values on read pointers 115 and 1115 are first sent to the read buffer and matched against each stored BN1X. If the block is already held in the memory and the instruction read buffer (i.e., a stored BN1X matches), accesses to level 1 cache 104 and track table 110 are avoided, reducing power consumption.
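The BN1X matching described above can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the patent's circuit: the class names `ReadBufferSlot` and `TaggedReadBuffer`, the dictionaries standing in for level 1 cache 104 and the instruction type rows, and the oldest-first replacement policy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReadBufferSlot:
    bn1x: int                # BN1X tag of the block held in this slot
    instructions: List[str]  # copy of the block (instruction read buffer)
    types: List[int]         # copy of its type row (instruction type memory)


@dataclass
class TaggedReadBuffer:
    capacity: int
    slots: List[ReadBufferSlot] = field(default_factory=list)
    l1_accesses: int = 0     # fills that had to read level 1 cache / track table

    def lookup(self, bn1x: int) -> Optional[ReadBufferSlot]:
        # Match the read pointer's BN1X against every stored BN1X tag.
        for slot in self.slots:
            if slot.bn1x == bn1x:
                return slot
        return None

    def fetch(self, bn1x: int, l1_cache: Dict[int, List[str]],
              type_rows: Dict[int, List[int]]) -> ReadBufferSlot:
        slot = self.lookup(bn1x)
        if slot is not None:
            return slot      # hit: level 1 cache and track table not accessed
        self.l1_accesses += 1
        slot = ReadBufferSlot(bn1x, list(l1_cache[bn1x]), list(type_rows[bn1x]))
        if len(self.slots) >= self.capacity:
            self.slots.pop(0)  # evict the oldest slot (assumed policy)
        self.slots.append(slot)
        return slot


l1 = {5: ["i0", "i1"], 9: ["j0", "j1"]}
types = {5: [0, 1], 9: [1, 0]}
rb = TaggedReadBuffer(capacity=2)
for x in (5, 9, 5):
    rb.fetch(x, l1, types)
print(rb.l1_accesses)  # 2
```

The third fetch hits on BN1X 5, so only two fills reach the level 1 cache.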
  • The first solution adds a register to current tracker 120 to hold the branch target BN of the latter conditional branch, and predicts that the latter conditional branch is not taken, so that current tracker 120 keeps stepping along the track. If the earlier conditional branch is not taken and the prediction for the latter conditional branch is correct, current tracker 120 simply continues stepping along the track. If the earlier conditional branch is not taken but the prediction for the latter conditional branch is wrong, read pointer 115 of current tracker 120 is restored to the branch target BN saved in the register, the sequential front-end pipeline is flushed, and current tracker 120 resumes stepping from the branch target track point of the latter conditional branch.
  • If the earlier conditional branch is taken, the read pointer of current tracker 120 is updated as in the previous example and steps from the (N+1)th track point counting from the branch target of the earlier conditional branch.
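The three outcomes of the first solution can be summarized as a decision function. This is a hypothetical sketch: BNs are modeled as `(BNX, BNY)` tuples, the function name is illustrative, and the case where advancing past the N target instructions crosses an instruction block boundary is ignored.

```python
from typing import Tuple

BN = Tuple[int, int]  # (BNX, BNY)


def resolve_first_solution(earlier_taken: bool, latter_taken: bool,
                           seq_bn: BN, saved_latter_target: BN,
                           earlier_target: BN,
                           n_target_insns: int) -> Tuple[BN, bool]:
    """Return (next read pointer 115, flush sequential front-end pipeline?).

    The latter conditional branch on the sequential path was predicted
    not taken, and its target BN was saved in the extra register.
    """
    if earlier_taken:
        # As in the previous example: target BNX, with BNY advanced past the
        # N instructions already run by the target front-end pipeline.
        bnx, bny = earlier_target
        return (bnx, bny + n_target_insns), True
    if not latter_taken:
        return seq_bn, False          # prediction correct: keep stepping
    return saved_latter_target, True  # misprediction: restore and flush
```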
  • The process by which instruction read buffer 1104 and the level 1 cache supply instructions to processor core 1102 under the control of current tracker 120 and target tracker 1120 is the same as in the previous example.
  • the second solution is to add more front-end pipelines to the processor core.
  • With correspondingly more read buffers and instruction type memories, the instructions on all four paths of two levels of conditional branches can be supplied simultaneously.
  • The specific method follows by analogy with the previous embodiment and is not repeated here. Other suitable modifications may be made in accordance with the technical solutions and concepts of the present invention, and all such substitutions, modifications and improvements are intended to fall within the scope of the appended claims.
  • FIG. 12 is another embodiment of the cache system of the present invention.
  • This embodiment adds a data cache to the embodiment of FIG. 1A. FIG. 12 shows only the part corresponding to the lower half of FIG. 1A; processor core 102, L1 cache 104, track table 110 and tracker 120 are the same as the identically numbered modules in the embodiment of FIG. 1A.
  • The upper part of FIG. 1A is omitted for ease of explanation, but FIG. 12 should be understood to also contain modules such as active table 130 and block address mapping module 134 of FIG. 1A, which operate in the same way.
  • a data buffer 1426, a data engine 1428 and a selector 1424 are newly added.
  • The inputs of selector 1424 are bus 123 or 125 from block address mapping module 134 of FIG. 1A, and bus 1425 from data engine 1428.
  • Track table 110 stores information not only about branch instructions but also about data access instructions. Please refer to FIG. 13, which shows the track table entry formats for the embodiment of FIG. 12: 1300 is the basic format, in which 1303 is the instruction type, 1305 is the block (X) address and 1307 is the intra-block offset address.
  • the instruction type field 1303 includes the foregoing address type (BN2 or BN1), the branch type (direct or indirect, conditional or unconditional), and the like;
  • the 1305 and 1307 fields respectively contain an instruction block address and an offset within the instruction block, and may be in the form of an instruction cache address BN2 or a BN1 format.
  • State fields 1311 and 1312 hold the state of the entry (described in more detail below); field 1313 holds the difference between the data addresses of two consecutive data accesses by the corresponding instruction, hereinafter called the stride (Stride). These fields serve data access operations such as load and store.
  • B-type identifier 1321 indicates that the entry corresponds to a branch instruction.
  • L-type identifier 1323 indicates that the entry corresponds to a data access instruction. If the same table holds entries for both branch and data access instructions, these identifiers are needed to tell them apart. Entries corresponding to neither type can be marked with an N-type identifier.
  • the read pointer 115 controls the level one instruction cache 104 to read an instruction to the processor core 102 for execution while reading the corresponding entry in the track table 110.
  • The controller (not shown) controls selector 118 to select the output of incrementer 114, so that the next sequential instruction is executed in the next cycle.
  • If the instruction type identifier of the entry read out on bus 117 is B type, the operation is as described in the embodiment of FIG. 1A and is not repeated here.
  • If it is L type, the controller accesses data buffer 1426 with the data address on bus 117, and data are exchanged between 1426 and processor core 102.
  • Data engine 1428 adds the stride to the data address on bus 117 and writes the sum back, via selector 1424, into the entry of track table 110 that was read.
  • At the same time, selector 118 is controlled to select the output of incrementer 114, so the next instruction is executed in the next clock cycle.
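The per-cycle behavior of the FIG. 12 embodiment can be sketched as follows. Assumptions not taken from the source: data addresses are flat integers rather than DBNs, a Python list stands in for data buffer 1426, `Entry` and `step` are hypothetical names, and a taken B entry simply supplies its target BN.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BN = Tuple[int, int]  # (BNX, BNY): instruction buffer address


@dataclass
class Entry:
    kind: str                     # 'B', 'L' or 'N' (identifiers 1321 / 1323 / N)
    target: Optional[BN] = None   # branch target (B type only)
    data_addr: int = 0            # data address (L type only, simplified)
    stride: int = 0               # field 1313 (L type only)


def step(pc: BN, track: Dict[BN, Entry], data: List[int],
         taken: bool, loaded: List[int]) -> BN:
    """One cycle: read the entry at pc (bus 117), return next read pointer 115."""
    e = track.get(pc, Entry('N'))
    if e.kind == 'B' and taken:
        return e.target                  # selector 118 takes the branch target
    if e.kind == 'L':
        loaded.append(data[e.data_addr]) # access data buffer, hand data to core
        e.data_addr += e.stride          # data engine writes address + stride back
    return (pc[0], pc[1] + 1)            # selector 118 takes incrementer 114
```

Run over a three-instruction loop (N, L with stride 2, taken B back to offset 0), the L entry yields successive strided elements without any address computation by the core.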
  • The data address can be in DBN (data buffer address) format, consisting of DBNX and DBNY.
  • DBNX is the row number in level 1 data buffer 1426.
  • DBNY is the offset of the data within the row indicated by DBNX.
  • BN denotes an instruction buffer address.
  • The mechanism of this embodiment is to address an L-type entry in track table 110 with instruction buffer address BN, read the data buffer address DBN held in that entry, and use the DBN as the address to access level 1 data buffer 1426 of FIG. 12. That is, track table 110 is addressed by instruction buffer address BN and its entries contain data buffer addresses DBN, so track table 110 maps instruction addresses to data addresses. This mapping is variable.
  • Data engine 1428 in FIG. 12 adds the Stride value to the data address (an address in level 1 data buffer 1426) held in an L-type entry of track table 110, and stores the sum back into that entry of 110 as the new data address.
  • FIG. 14 is another embodiment of the cache system of the present invention.
  • This embodiment adds a data cache based on the embodiment of Figures 1A and 4.
  • Scanner 106, level 2 cache 108, active list 130, block address mapping module 134 and selectors 132, 142, 148 and 150 are the same as the identically numbered modules in the embodiment of FIG. 1A; level 1 instruction cache 104, track table 110, processor core 102, tracker 120, target address table 412, offset address mapping module 416 and memory 426 are the same as the identically numbered modules in the embodiment of FIG. 4.
  • Secondary tracker 428 is a modified version of secondary tracker 420 of FIG. 4: it has no register 424. Instead, bus 421 and the output of register 422 drive OR gate 1462, which controls updating of register 432; that is, register 432 is updated whenever 421 or the output of register 422 is '1'.
  • the two-input selector 146 of FIG. 1A is replaced by a three-input selector 1420 with a new input from the data engine 1428.
  • a data lower block address memory 1422, a data level one buffer 1426, and a data engine 1428 are added.
  • Each entry of data lower block address memory 1422 stores the data cache block addresses DBNX of the previous and next data blocks of the corresponding data block.
  • a bidirectional bus 1427 is added to transfer data between the secondary buffer 108 and the primary data buffer 1426; a new bidirectional bus 1429 is added to transfer data between the primary data buffer 1426 and the processor core 102.
  • The new modules and buses support data buffer operations. In FIG. 1A, selectors 312 and 150 have an input from output 117 of track table 110; in this embodiment that input comes from output 423 of branch target table 412. Output 423 of FIG. 4 is logically equivalent to output 117 of FIG. 1A.
  • The contents of instruction type table 410 and branch target address table 412 in track table 110 are extended to hold information not only about branch instructions but also about data access instructions (such as load and store).
  • Store operations can defer their access to the data buffer through a write buffer, whereas load operations cannot be delayed.
  • Obtaining load data addresses early therefore does more to reduce processor waiting time. The following description takes as an example the case in which the instruction type rows in instruction type table 410 and branch target address table 412 store information only for branch and load instructions, although this embodiment can be applied to all load and/or store operations. Please refer to FIG.
  • 1501 is an instruction block in the first-level instruction buffer 104, and the instructions are arranged from left to right in ascending order of offset addresses within the block.
  • L represents a load instruction
  • B represents a branch instruction
  • N represents an instruction that is neither a load nor a branch.
  • 1503 is a row corresponding to the instruction block in the instruction type table 410, wherein the entry of '1' represents a B-type or L-type instruction, and the value of '0' represents an N-type instruction.
  • The rightmost entry is the end track point described above, represented by 'E'.
  • L indicates that the entry stores the information of the load instruction
  • B indicates that the entry stores the information of the branch instruction. In this example, there is no N-type entry.
  • 1503 and 1505 are an abstracted, compressed extraction of the instruction information in instruction block 1501.
  • System operation needs the information of only some of the instructions in block 1501, namely the 'operation necessary' instructions.
  • Their information is extracted and stored in branch target table 412 in format 1505, in the address order of those instructions, while the information of instructions not needed for operation is not stored; that is, only the information of 'operation necessary' instructions is kept, arranged in instruction order.
  • Format 1503, by contrast, has an entry for every intra-block offset address in the instruction block.
  • 'Operation necessary' instructions (those whose information is stored in 1505) are marked with one identifier and 'operation unnecessary' instructions with another; all identifiers are arranged in instruction order.
  • Given an instruction's intra-block offset address, suitable processing logic can use the identifiers in 1503 to locate that instruction's information in 1505.
  • In essence, this logic computes the sequence number MBNY of the 'operation necessary' identifier at that offset in 1503, i.e., the number of 'operation necessary' identifiers before that offset, because the required information is stored in 1505 in the order of the 'operation necessary' instructions.
  • the numbers above 1501, 1503, and 1505 in Fig. 15A are the intra-block offset addresses of the corresponding entries.
  • The intra-block offset addresses 1, 3, 4 and 6 above 1505 are not contiguous, because only the instructions at these offsets are 'operation necessary'.
  • The numbers below 1505 are the sequence numbers MBNY of the instructions corresponding to each entry; the end track point is stored in the entry with sequence number '4' because instruction block 1501 contains four B-type and L-type instructions.
  • the corresponding information occupies the entry of sequence number '0'-'3' in 1505.
  • In format 1503, 'operation necessary' instructions have identifier '1' and the other instructions have identifier '0'.
  • Here the entries at offset addresses 1, 3, 4 and 6 are '1' and all other entries are '0'.
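The offset-to-sequence-number mapping can be modeled as a prefix count of '1' identifiers. This is a sketch under assumptions: the 8-entry block length, the contents of offset 7 and the labels in `ROW_1505` are illustrative, and `offset_to_mbny` is a hypothetical name for the function of offset address mapping module 416.

```python
from typing import List


def offset_to_mbny(type_row: List[int], offset: int) -> int:
    """Count the '1' identifiers in row 1503 strictly before `offset`.

    For an 'operation necessary' instruction this is its own sequence number;
    for any other instruction it is the sequence number of the next
    'operation necessary' instruction.
    """
    return sum(type_row[:offset])


ROW_1503 = [0, 1, 0, 1, 1, 0, 1, 0]   # '1' at offsets 1, 3, 4, 6
ROW_1505 = ["op@1", "L@3", "B@4", "L@6", "E"]  # compressed row, 'E' = end track point

print(offset_to_mbny(ROW_1503, 2))  # 1
print(ROW_1505[offset_to_mbny(ROW_1503, 6)])  # L@6
```

Offset 2 (an N-type instruction) maps to sequence number 1, the entry of the next 'operation necessary' instruction at offset 3, which matches the walkthrough below.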
  • Branch target table 412 of FIG. 14 is a compressed table whose rows have format 1505; the entries within each row are addressed by the sequence number of the 'operation necessary' instruction. In this example, the value on branch target table read pointer 425, driven by secondary tracker 428, is that sequence number. The operation of the system is described below with reference to FIGS. 14 and 15A.
  • the processor system is executing an instruction segment whose track table information has been stored in memory 426.
  • the read pointer 115 reads a branch (type B) instruction from the level one instruction buffer 104 and pushes it to the processor core 102 via the bus 103 for execution.
  • the intra-block offset address portion 413 in 115 reads the corresponding entry '1' from the memory 426 and places it on the bus 421.
  • The branch target of the branch instruction, output by branch target table 412 (pointing to the instruction at offset address '2' in instruction block 1501 of FIG. 15A), is sent via bus 423 to an input of selector 118 in main tracker 120.
  • Based on the value '1' on 421, together with the BN address format and the conditional branch instruction type on 423, the controller (not shown) waits for the processor core to make a branch decision.
  • Processor core 102 executes the branch instruction, decides 'branch taken', and via branch decision 113 controls selector 118 to select bus 423.
  • In the second clock cycle, the value on 113 is also stored in register 422; register 112 is updated, placing the branch target address from bus 423 on read pointer 115.
  • Read pointer 115 reads the N-type instruction at offset address '2' of instruction block 1501 from level 1 instruction buffer 104 and pushes it to processor core 102 for execution. Read pointer 115 now points to row 1505 in branch target table 412 of track table 110, and to row 1503 in instruction type table 410. Because the branch was taken, the output of register 422 is '1', so row 1503 is also stored into memory 426.
  • Using the information in row 1503, offset address mapping module 416 maps the value '2' of offset address 413 to instruction sequence number '1', which is sent via bus 419 to one input of selector 418 in secondary tracker 428.
  • the value in register 422 controls selector 418 to select bus 419.
  • the offset address 413 also reads the entry '0' of the N-type instruction in the memory 426 and puts it on the bus 421.
  • the controller (not shown in Figure 15) controls the selector 118 to select the output of the incrementer 114 based on the value '0' on 421.
  • BN1X read pointer 411 output by register 112 is unchanged, while BN1Y read pointer 413 is incremented by 1 to offset '3'.
  • Pointer 413 reads the L-type instruction at offset '3' of 1501 from level 1 instruction read buffer 104 and pushes it to processor core 102 for execution.
  • Because the value of register 422 is '1', register 432 is updated through OR gate 1462, and branch target read pointer 425, now holding instruction sequence number '1', reads the entry with sequence number '1' (offset address '3') of row 1505 in branch target table 412 and sends it out on bus 423.
  • the offset address read pointer 413 also reads the entry '1' of the L-type instruction in the memory 426 and places it on the bus 421 for transmission.
  • Based on the value '1' on 421, the controller (not shown in FIG. 15) reads the identifier on bus 423. Because the identifier is L type, the controller controls selector 118 to select the output of incrementer 114, and also accesses level 1 data buffer 1426 with the DBN on 423; the data read out are pushed to processor core 102 via bus 1429. At this time, data engine 1428 adds the Stride to the DBNY stored in the previous cycle.
  • If the sum stays within the current data block, it is combined with the current DBNX from 423 to form the next DBN, which is written via bus 1425 and selector 1424 into the entry holding the current DBN in branch target table 412 (here, the second entry from the left of 1505). If the sum exceeds the size of one data block but falls within the adjacent data block, the DBNX of that adjacent block, read from lower block address memory 1422, is combined with a DBNY equal to the sum with the overflowing data block size discarded, forming the next DBN, which is likewise written via bus 1425 and selector 1424 into the entry holding the current DBN in branch target table 412 (again the second entry from the left of 1505).
  • If the sum lies beyond the adjacent data block, the system waits for processor core 102 to compute the data address of the L-type instruction, which is sent via bus 155 to active table 130 for matching; the match result is passed via bus 133 to block address mapping module 134 and mapped to a DBNX there (the process is as described for FIG. 1A and is not repeated here).
  • This DBNX, combined with a DBNY taken from the above sum in the data engine with the overflowing data block size discarded, forms the next DBN, which is stored into the entry holding the current DBN in branch target table 412 (again the second entry from the left of 1505).
  • The difference between the next DBN and the current DBN is also stored in the same entry as the Stride.
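The data engine's next-DBN computation can be sketched as below. Assumptions: DBNs are `(DBNX, DBNY)` tuples, `next_block` and `prev_block` are hypothetical dictionaries standing in for the next/previous DBNX pairs held in lower block address memory 1422, and the negative-stride (previous block) case is an extension by analogy, since the text above only describes overflow past the block end. `None` stands for "wait for processor core 102 to supply the address via bus 155".

```python
from typing import Dict, Optional, Tuple

DBN = Tuple[int, int]  # (DBNX, DBNY)


def next_dbn(cur: DBN, stride: int, block_size: int,
             next_block: Dict[int, int],
             prev_block: Dict[int, int]) -> Optional[DBN]:
    dbnx, dbny = cur
    s = dbny + stride
    if 0 <= s < block_size:
        return (dbnx, s)                      # same data block
    if block_size <= s < 2 * block_size and dbnx in next_block:
        return (next_block[dbnx], s - block_size)  # adjacent next block
    if -block_size <= s < 0 and dbnx in prev_block:
        return (prev_block[dbnx], s + block_size)  # adjacent previous block
    return None  # beyond adjacent blocks: core computes the address


nb, pb = {3: 7}, {3: 2}
print(next_dbn((3, 6), 4, 8, nb, pb))  # (7, 2)
```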
  • Level 1 data buffer 1426 is then accessed with the DBN in the corresponding entry of branch target table 412 as the address, and the data are pushed to processor core 102 in advance.
  • Read pointer 413, whose value is '3', reads the entry at offset '3' ('1') from memory 426 and places it on bus 421.
  • The value '1' on 421 causes selector 118 in tracker 120 to be controlled by the identifier, address type and instruction type on bus 423, and by branch decision 113.
  • The rules are as follows. When the identifier on 423 is L, selector 118 selects the output of incrementer 114. When the identifier on 423 is B and the address type on 423 is BN2, the BN2 is sent via buses 423 and 133 to block address mapping module 134, mapped to a BN1 address, and written back into the same entry of branch target table 412.
  • This process is as described in the example of FIG. 1A and is not repeated here.
  • When the identifier is B, the address type is BN1 and the branch is unconditional, selector 118 selects the BN1 address on 423.
  • When the identifier is B, the address type is BN1 and the branch is conditional, selector 118 is controlled by branch decision 113: if 113 is 'branch taken', the BN1 address on 423 is selected; if 113 is 'not taken', the output of incrementer 114 is selected.
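The selection rules for selector 118 can be written as a small decision function. This is a hypothetical model: the return strings merely name the source selected or the reason for stalling, and `taken=None` stands for "branch decision 113 not yet available".

```python
from typing import Optional


def select_118(bus_421: int, ident: str, addr_type: str,
               conditional: bool, taken: Optional[bool]) -> str:
    """Return which source selector 118 passes to register 112."""
    if bus_421 == 0 or ident == 'L':
        # N-type instruction, or a load: continue sequentially.
        return 'incrementer 114'
    if addr_type == 'BN2':
        # Target not yet in level 1 cache: map BN2 to BN1 first.
        return 'stall: map BN2 to BN1 via module 134'
    if not conditional:
        return 'BN1 on bus 423'
    if taken is None:
        return 'stall: wait for branch decision 113'
    return 'BN1 on bus 423' if taken else 'incrementer 114'
```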
  • In the fourth clock cycle, the value of 113 is stored in register 422.
  • The L-type instruction at offset address '3' is decoded in processor core 102, which then reads the data already pushed onto bus 1429 and loads it into the register file of 102.
  • BN1X read pointer 411 output by register 112 is unchanged, while BN1Y read pointer 413 is incremented by 1 to offset '4'.
  • 413 reads out the B-type instruction having the offset of '4' in 1501 from the first-level instruction read buffer 104, and sends it to the processor core 102.
  • The controller determines that the entry on 423 is B type, so it does not use the value on 423 to access level 1 data buffer 1426; instead, the BN value on 423 is sent to an input of selector 118, and the controller waits for the value on 421 to control selector 118 in tracker 120.
  • the BN1Y read pointer 413 has a value of '4', and the entry '1' having the offset of '4' in 1503 is read from the memory 426 and placed on the bus 421.
  • the identifier is B type
  • the address type is BN1
  • the instruction type is a conditional branch instruction
  • The controller waits for branch decision 113 to control selector 118 according to the rules above, which determines the program flow. If branch decision 113 is 'branch taken', the operation proceeds as described earlier in this embodiment and is not repeated here.
  • If it is 'not taken', selector 118 is controlled to select the output of incrementer 114.
  • The value '0' of 113 is stored in register 422; BN1X read pointer 411 output by register 112 is unchanged, while BN1Y read pointer 413 is incremented by 1 to offset '5'.
  • Pointer 413 reads the N-type instruction at offset '5' of 1501 from level 1 instruction read buffer 104 and pushes it to processor core 102 for execution.
  • Because the value of register 422 is '0', selector 418 is controlled to select the output of incrementer 414.
  • Because the value on bus 421 was '1', register 432 is updated in this cycle, so branch target read pointer 425 holds instruction sequence number '3'.
  • It reads the entry with sequence number '3' (offset address '6') of row 1505 in branch target table 412, which is sent out on bus 423.
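The sequence of values taken by read pointer 425 in this walkthrough can be reproduced with a simplified model of secondary tracker 428. Assumptions: the 8-entry row 1503 used earlier, a single taken branch landing at offset 2, and a one-register abstraction of selector 418, incrementer 414 and register 432; `trace_pointer_425` is a hypothetical name.

```python
from typing import List


def trace_pointer_425(type_row: List[int], target_offset: int,
                      cycles: int) -> List[int]:
    # Branch taken (register 422 = '1'): selector 418 takes the output of
    # offset mapping module 416, i.e. the count of '1' identifiers before
    # the branch target.
    seq = sum(type_row[:target_offset])
    trace = [seq]
    for off in range(target_offset, target_offset + cycles - 1):
        if type_row[off] == 1:
            # Bus 421 was '1': register 432 loads incrementer 414 (seq + 1).
            seq += 1
        trace.append(seq)
    return trace


ROW_1503 = [0, 1, 0, 1, 1, 0, 1, 0]  # '1' at offsets 1, 3, 4, 6
print(trace_pointer_425(ROW_1503, 2, 5))  # [1, 1, 2, 3, 3]
```

For offsets 2 to 6 the pointer holds 1, 1, 2, 3, 3, so it always addresses the entry of the next 'operation necessary' instruction, as in the steps above.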
  • the offset address read pointer 413 also reads the entry '1' of the L-type instruction in the memory 426 and puts it on the bus 421.
  • the controller reads the identifier on bus 423 based on the value '1' on 421.
  • the controller controls the selector 118 to select the output of the incrementer 114; also accesses the primary data buffer 1426 with the DBN on 423, and the read data is pushed to the processor core 102 via the bus 1429.
  • the following operations are the same as those in the foregoing embodiment, and are not described again.
  • When scanner 106 scans an instruction block being filled into level 1 instruction cache 104 and fills the corresponding track in the track table, if a scanned instruction is L type, the entry of instruction type table 410 addressed by that L instruction's BNX and BNY is written as '1', and in branch target address table 412 the entry addressed by BNX (row) and the controlling write counter (column) has its D bit and S bit set to 'invalid'.
  • On the first execution of the L instruction, based on the 'invalid' D bit on bus 423, the controller waits for processor core 102 to send the data memory address via bus 155, which is mapped to a DBN through active table 130 and block address mapping module 134 as described above.
  • The DBN addresses level 1 data buffer 1426 to read the data for processor core 102; the DBN is also stored into the entry of branch target address table 412 that was read out, and that entry's D bit is set to 'valid'.
  • On the next execution, based on the 'invalid' S bit on bus 423, the controller again waits for processor core 102 to send the data memory address via bus 155, which is mapped to a DBN as described above.
  • That DBN addresses level 1 data buffer 1426 to read the data and provide it to processor core 102.
  • Alternatively, the data may first be sent to processor core 102 using the DBN on 423, and the data memory address later produced by the core is mapped to a DBN and compared with the DBN on 423.
  • If they are the same, processor core 102 uses the data already provided; if they differ, the newly generated DBN is used to access level 1 data buffer 1426 again to re-read the data, the DBN is stored into the entry originally read, and that entry's D bit is set to 'valid' and its S bit to 'invalid'.
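The D-bit/S-bit life cycle of a load entry can be sketched as a small state machine. This is an interpretation, not the patent's exact timing: flat integer addresses stand in for DBNs, `core_addr` stands for the address computed by processor core 102 and sent on bus 155, storing the predicted next address right after the stride is learned is an assumption, and `LoadEntry`/`execute_load` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoadEntry:
    d_valid: bool = False  # D bit: address field is usable
    s_valid: bool = False  # S bit: stride field is usable
    addr: int = 0
    stride: int = 0


def execute_load(e: LoadEntry, core_addr: Callable[[], int],
                 mem: List[int], verify: bool = True) -> int:
    if not e.d_valid:                  # first execution: core supplies address
        a = core_addr()
        e.addr, e.d_valid = a, True
        return mem[a]
    if not e.s_valid:                  # second execution: learn the stride
        a = core_addr()
        e.stride = a - e.addr
        e.addr, e.s_valid = a + e.stride, True  # store predicted next address
        return mem[a]
    predicted = e.addr                 # later executions: data pushed early
    if verify:
        a = core_addr()
        if a != predicted:             # mismatch: re-read, D valid, S invalid
            e.addr, e.s_valid = a, False
            return mem[a]
    e.addr = predicted + e.stride      # data engine: next = current + stride
    return mem[predicted]


mem = list(range(100, 120))
e = LoadEntry()
addrs = iter([0, 3, 6, 9, 12])
print([execute_load(e, lambda: next(addrs), mem) for _ in range(5)])
# [100, 103, 106, 109, 112]
```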
  • Level 2 buffer 108 is a unified cache that stores both instruction blocks and data blocks; it supplies instruction blocks to level 1 instruction cache 104 and scanner 106 via bus 107, and exchanges data with level 1 data buffer 1426 via bus 1427.
  • the block address mapping module 134 also stores the block address of the instruction cache block in the form of BN, and also stores the block address of the data cache block in the form of DBN for mapping.
  • In this embodiment, a data address calculation unit (provided by processor core 102) is needed to compute the data address from register file values only for the first and second executions of a load instruction in an instruction loop.
  • the subsequent data addresses can be automatically generated by the cache system of the present invention (including track table 110, tracker 120, 420, etc.) to push data to processor core 102.
  • Store addresses can likewise be generated automatically by the cache system of the present invention, so the processor core only needs to supply the data read from its register file.
  • The cache system of the present invention can also push instructions to the processor core. Except for indirect branch instructions, which need an instruction address calculation unit to compute the branch target address from register file values, all instructions can be pushed automatically to processor core 102 by the disclosed cache system.
  • Processor core 102 thus needs to provide only two kinds of feedback to the cache system: branch decision 113, which controls the program flow, and step signal 111, which informs the cache system of the processor pipeline state (e.g., to pause pushing).
  • FIG. 15B shows another format of the instruction type table, branch target table and data address table, supporting the embodiment of FIG. 16.
  • FIG. 16 is another embodiment of the cache system of the present invention. This embodiment is based on the embodiment of Fig. 14, with the addition of a data address table to specifically store the data access address.
  • Level 1 instruction cache 104, processor core 102, tracker 120, secondary tracker 420, target address table 412, offset address mapping module 416, memory 426, block address mapping module 134, data lower block address memory 1422, level 1 data buffer 1426 and data engine 1428 are the same as in FIG.
  • Selector 146 is the same as in FIG. 1A. The difference is that in this embodiment track table 150 replaces track table 110 of FIG. 14.
  • A data address table 1412 is added to track table 150 to store load instruction information, while branch target table 412 stores only branch instruction information, as shown in FIG. Like 412, data address table 1412 is a compressed table to save storage space, so instruction type table 410 of FIG. 14 is replaced by instruction type table 1410, in which the B-type and L-type identifiers are stored separately, unlike row 1503 of FIG. 15A, where B-type and L-type identifiers are stored together.
  • Data access also requires its own data-side tracker 1428, memory 1426 and data access address mapping module 1416, so that the BNY in read pointer 115 can be correctly mapped to data access address pointer 1425, which points to the entry of data address table 1412 corresponding to that BNY.
  • Module 1416 has the same structure and function as offset address mapping module 416, which handles instruction address mapping.
  • An OR gate 1630 is added so that register 1432 in data-side tracker 1420 can be updated not only by a '1' on bus 1421 (indicating execution of a data access instruction), making it point to the next data access instruction, but also by a branch decision (e.g., 113), in which case register 1432 stores the sequence number MBNY of the first data access instruction at or after the branch target.
  • the selector 1424 selects the DBN from the block address mapping module 134 or the data engine 1428 for storage in the data address table 1412.
  • the data read from the primary data buffer 1624 is stored in the first in first out 1636 via the bus 1633 for the processor core 102 to read. Data written back from processor core 102 is buffered by write buffer 1638 and written back to primary data buffer 1624 via bus 1639.
  • data address table 1412 is organized similarly to target address table 412.
  • its number of columns is greater than or equal to the maximum number of data access instructions that a level-one instruction block may contain, and each row corresponds to the data access instructions of the corresponding level-one instruction block.
  • in order of appearance, the entries of a row store, from left to right, the S bit, D bit, stride, and DBN of each corresponding data access instruction, in the format of 1318 in FIG. 13.
  • because branch instruction information and load instruction information are now stored separately in 412 and 1412, the entry formats of the two tables no longer need the B or L identifiers 1321 and 1323. Please refer to FIG.
  • each entry of instruction type table 1410 is selected by BNX 411 in read pointer 115 and contains two rows.
  • One row, 1513 (hereinafter the instruction type row), holds instruction types dedicated to branch instructions and corresponds to row 1315 in branch target table 412. Its role is identical to that of the row in instruction type table 410 in the example of FIG. 4.
  • this row is read out and sent to offset address mapping module 416 to control tracker 120 and secondary tracker 420, as in the embodiment of FIG.
  • a new row 1523 (hereinafter the data type row) is added to each instruction type table 1410 entry for the data access instructions of the same instruction segment. In row 1513, the entry at offset address '4' is '1', corresponding to the B-type instruction at offset address '4' of instruction segment 1501.
  • the branch target address information of that instruction is stored in the entry with sequence number '0' of row 1515 in branch target table 412; the other entries in 1513 are '0'.
  • the entry with sequence number '1' in row 1515 stores the end track point, i.e., the block number of the next instruction block.
  • 1523 is the data type row of instruction type table 1410 and also corresponds to instruction segment 1501. Its entries at offset addresses '1', '3', and '6' are '1', corresponding to the L-type instructions of segment 1501. Data address table 1412 holds the corresponding data address entries in row 1525.
  • the three leftmost entries of 1525 store the data address information of the L-type instructions at in-block offset addresses '1', '3', and '6' of segment 1501, respectively; the other entries in 1523 are '0'.
  • the setting and use of the D bit and the S bit are the same as in the embodiment of Fig. 14.
  • in FIG. 15B, instruction type row 1513 of instruction type table 1410 in track table 150 corresponds to row 1515 of branch target table 412. Data type row 1523 of 1410 corresponds to row 1525 of data access address table 1412. All of these rows are addressed by the same read pointer 115.
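The mapping performed by offset address mapping modules 416 and 1416 can be illustrated with instruction segment 1501, which holds N L N L B N L N at offsets '0' to '7'. The minimal Python sketch below assumes that the mapping counts the marked entries of a type row that precede the given offset. This assumption matches the values the text gives for offset '2': sequence number '0' in row 1515 and '1' in row 1525. The function names are hypothetical.

```python
# Instruction segment 1501: in-block offsets '0'..'7' hold N L N L B N L N.
SEGMENT_1501 = "NLNLBNLN"


def type_row(segment: str, kind: str) -> list:
    """Build an instruction-type row: 1 where the instruction is of `kind`."""
    return [1 if op == kind else 0 for op in segment]


def map_offset(row: list, offset: int) -> int:
    """Hypothetical model of mapping module 416/1416.

    Returns the sequence number (MBNY) of the first marked instruction at or
    after `offset`, i.e. how many marked instructions precede `offset`.
    """
    return sum(row[:offset])


row_1513 = type_row(SEGMENT_1501, "B")  # branch instruction type row
row_1523 = type_row(SEGMENT_1501, "L")  # data type row

print(row_1513)                 # [0, 0, 0, 0, 1, 0, 0, 0]
print(row_1523)                 # [0, 1, 0, 1, 0, 0, 1, 0]
print(map_offset(row_1513, 2))  # 0: entry 0 of row 1515 (branch at offset '4')
print(map_offset(row_1523, 2))  # 1: entry 1 of row 1525 (load at offset '3')
```

Under the same assumption, `map_offset(row_1513, 5)` returns 1. That is the entry of row 1515 holding the end track point, since no branch follows offset '4' in this segment.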
  • the processor system is executing an instruction segment whose track table information has been stored in memories 426 and 1426.
  • the read pointer 115 reads a branch (type B) instruction from the level one instruction buffer 104 and pushes it to the processor core 102 via the bus 103 for execution.
  • the intra-block offset address portion 413 in 115 reads the corresponding entry '1' from the memory 426 and puts it on the bus 421, and also reads the corresponding entry '0' from the memory 1426 and puts it on the bus 1421.
  • the branch target of the branch instruction outputted by the branch target table 412 (pointing to the instruction having the offset address '2' in the 1301 instruction block in FIG. 15A) is sent via the bus 423 to an input of the selector 118 in the main tracker 120.
  • the controller waits for the processor's branch decision, as shown in FIG.
  • processor core 102 executes the branch instruction and decides to take the branch. Through branch decision 113, it controls selector 118 to select bus 423.
  • the value of branch decision 113 is stored in registers 422 and 1422. Read pointer 115 reads the N-type instruction at offset address '2' of instruction block 1501 from level-one instruction buffer 104 and pushes it to processor core 102 for execution.
  • read pointer 115 now points to row 1515 of branch target table 412 and row 1525 of data address table 1412 in track table 110. It also points to rows 1513 and 1523 of instruction type table 410. Because the value of register 422 is '1' (the branch was decided as 'execute branch'), row 1513 is stored in memory 426 and row 1523 in memory 1426.
  • using the information in row 1513, intra-block offset mapping module 416 maps the value '2' of offset address 413 to instruction sequence number '0'. This number is sent via bus 419 to one input of selector 418 in secondary tracker 420.
  • using the information in row 1523, intra-block offset mapping module 1416 maps the value '2' of offset address 413 to data sequence number '1'. This number is sent via bus 1419 to one input of selector 1418 in data secondary tracker 1420.
  • the value '1' in registers 422 and 1422 causes selector 418 to select bus 419 and selector 1418 to select bus 1419.
  • offset address 413 also reads the entry '0' of the N-type instruction from memory 426 and places it on bus 421. It reads the entry '0' of the same instruction from memory 1426 and places it on bus 1421.
  • the controller (not shown in Figure 16) controls the selector 118 to select the output of the incrementer 114 based on the value '0' on 421.
  • the BN1X read pointer 411 output by register 112 is unchanged, and BN1Y read pointer 413 is incremented by '1' to offset '3'.
  • 413 reads an L-type instruction having an offset of '3' in 1301 from the first-level instruction read buffer 104, and pushes it to the processor core 102 for execution.
  • because the value of register 422 is '1', register 432 is updated in this cycle. Branch target read pointer 425, holding instruction sequence number '0', reads the entry with sequence number '0' in row 1515 of branch target table 412 and sends it out on bus 423.
  • because the value of register 1422 is '1', register 1432 is also updated and stores the value '1' from bus 1419. Data read pointer 1425 therefore accesses data address table 1412 and reads the DBN out onto bus 1423. Offset address read pointer 413 then reads the entry '0' of the L-type instruction from memory 426 and places it on bus 421. It also reads the entry '1' of the L-type instruction from memory 1426 and places it on bus 1421.
  • based on the value '0' on 421, the controller controls selector 118 to select the output of incrementer 114. Based on the value '1' on 1421, it also accesses primary data buffer 1426 with the DBN on 1423, and the data read out is pushed to processor core 102 via bus 1633 and first-in-first-out 1636. Thereafter, data engine 1428 operates as in the embodiment of FIG. 14, which is not described again.
  • the controller controls selector 118 to select the output of incrementer 114. If the value on bus 421 is '1', the controller instead places selector 118 in tracker 120 under the control of the instruction type on bus 423 and branch decision 113.
  • the rules are as follows. When the address type on 423 is BN2, the BN2 address is sent via buses 423 and 133 to block address mapping module 134. There it is mapped to a BN1 address, which is written back into the same entry of branch target table 412. The process is as described in the example of FIG. 1A and is not repeated here.
  • the selector 118 selects the BN1 address on 423.
  • when the address type on 423 is BN1 and the instruction type on 423 is a conditional branch instruction, selector 118 is controlled by branch decision 113. If 113 is 'execute branch', the BN1 address on 423 is selected; if 113 is 'no branch', the output of incrementer 114 is selected. In the current cycle the value on 421 is '0', so selector 118 selects the output of incrementer 114.
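The rules for selector 118 can be summarized in a short Python sketch. It is a simplification under stated assumptions:

- read pointers are plain integers and the incrementer adds 1;
- a BN2 target is modeled as a stall (None) until module 134 has rewritten the entry as BN1;
- `taken` stands for branch decision 113, following the alternative described later in this section in which the core reports 'execute branch' for every taken branch, including unconditional ones.

The names `Bus423` and `select_118` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bus423:
    """Hypothetical model of the table entry driven on bus 423."""
    addr_type: str  # "BN1" (target in level-one cache) or "BN2" (not yet mapped)
    target: int     # branch target read pointer, kept abstract as an int


def select_118(read_ptr: int, bus_421: int, bus_423: Bus423,
               taken: bool) -> Optional[int]:
    """Return the next read pointer chosen by selector 118.

    `taken` stands for branch decision 113. Returns None while a BN2 target
    is being mapped to BN1 by block address mapping module 134 (a
    simplification: the sketch just waits for the entry to become BN1).
    """
    if bus_421 == 0:
        # Not a branch: take the output of incrementer 114.
        return read_ptr + 1
    if bus_423.addr_type == "BN2":
        # Target must first be mapped to BN1 and written back to table 412.
        return None
    # BN1 target: branch decision 113 chooses target or fall-through.
    return bus_423.target if taken else read_ptr + 1


entry = Bus423("BN1", target=40)
print(select_118(3, 0, entry, taken=False))  # 4: N-type instruction
print(select_118(4, 1, entry, taken=True))   # 40: taken branch
print(select_118(4, 1, entry, taken=False))  # 5: branch not taken
```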
  • in the fourth clock cycle, the L-type instruction at offset address '3' is decoded in processor core 102. This causes 102 to read the data already pushed into first-in-first-out 1436 and load it into the register file in 102.
  • the BN1X read pointer 411 output by register 112 is unchanged, and BN1Y read pointer 413 is incremented by '1' to offset '4'.
  • 413 reads out the type B instruction having the offset of '4' in 1301 from the first level instruction read buffer 104, and sends it to the processor core 102.
  • the value of register 1422 is '0', so selector 1418 is controlled to select the output of incrementer 1414.
  • because the value on bus 1421 is '1', register 1432 is updated. Read pointer 1425 is incremented by '1' to '2' and reads the entry with sequence number '2' in row 1525 of data address table 1412.
  • the controller accesses the primary data buffer 1426 with the DBN value on the 1423, and the read data is stored in the first in first out 1636 via the bus 1633.
  • BN1Y read pointer 413 has the value '4', so the entry '1' at offset '4' is read from memory 426 and placed on bus 421.
  • the address type on bus 423 is BN1 and the instruction type is a conditional branch instruction. The controller therefore waits for branch decision 113 to control selector 118 according to the rules, and so determine the program flow. If branch decision 113 is 'execute branch', the remaining operations of the fifth cycle are as described for the second cycle of this embodiment and are not repeated.
  • if branch decision 113 is 'no branch', selector 118 is controlled to select the output of incrementer 114.
  • the value '0' of 113 is stored in registers 422 and 1422. The BN1X read pointer 411 output by register 112 is unchanged, and BN1Y read pointer 413 is incremented by '1' to offset '5'.
  • 413 reads out the N-type instruction having the offset of '5' in 1301 from the first-level instruction read buffer 104, and pushes it to the processor core 102 for execution.
  • the value of register 422 is '0', so selector 418 is controlled to select the output of incrementer 414. Because the value on bus 421 in the previous cycle was '1', register 432 is updated in this cycle. Branch target read pointer 425, now holding instruction sequence number '1', reads the entry with sequence number '1' in row 1515 of branch target table 412 (the end track point) and sends it out on bus 423.
  • offset address read pointer 413 also reads, from rows 1513 and 1523 in memories 426 and 1426, the entries ('0' and '0') of the N-type instruction at offset address '5'. It places them on buses 421 and 1421, respectively.
  • the controller controls the selector 118 to select the output of the incrementer 114 based on the value '0' on 421. The following operations are the same as those in the foregoing embodiment, and are not described again.
  • in the embodiment of Figure 16, branch instructions and data access instructions each have their own resources:
    - instruction type rows 1513 and 1523;
    - compressed information tables 412 and 1412;
    - trackers 428 and 1428;
    - offset address mapping modules 416 and 1416;
    - memories 426 and 1426.
    Each side can therefore run ahead independently without interfering with the other. For example, the load instruction at offset address '6' can be processed by the cache at the same time as the branch instruction at offset address '4', which better masks the latency of the data cache. Conversely, a data access instruction and a branch instruction at different offset addresses can be processed at the same time to mask the latency of the instruction cache.
  • in this basic structure, only the information of instructions that the cache must process is extracted into compressed tables 412 and 1412. The offset addresses of those instructions are recorded in instruction type table 410.
  • the basic mode of operation is that selectors 118, 418, and 1418 in trackers 120, 428, and 1428 are controlled by the branch outcome. On a branch that is not taken (including a non-branch instruction), each tracker selects its own incrementer 114, 414, or 1414. On a taken branch, the trackers select the branch target and the instruction sequence numbers produced by address mapping modules 416 and 1416.
  • the update of the main tracker 120 read pointer 115 is determined by the processor core pipeline state feedback 111, which is updated every clock cycle while the pipeline is operating normally.
  • secondary trackers 428 and 1428 are updated according to the corresponding instruction types in instruction type table 410, read out by offset address 413 of read pointer 115.
  • the instruction type 421 of the currently executing instruction, read from memory 426 by in-block offset portion 413 of read pointer 115, controls the program flow. (Alternatively, the program flow may be controlled entirely by branch decision 113 instead of 421. In that case processor core 102 sets 113 to 'execute branch' on every taken branch, including unconditional branches.)
  • a 421 value of '1' informs secondary tracker 428 that a branch instruction is currently being executed. 428 can then be updated in the next clock cycle to provide information about the next branch instruction.
  • likewise, the instruction type 1421 of the currently executing instruction, read from memory 1426 by in-block offset portion 413 of read pointer 115, controls data access. A 1421 value of '1' informs secondary tracker 1428 that a data access instruction is currently being executed, so 1428 can be updated in the next clock cycle to provide information for the next data access instruction.
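The update rule of the secondary trackers can be sketched in Python. This is a hypothetical model (class and method names are not from the source). One method folds together selector 418/1418, incrementer 414/1414 and, for the data side, OR gate 1630. The usage lines replay the single-issue walkthrough of segment 1501:

- 425 = '0' and 1425 = '1' after the taken branch to offset '2';
- 1425 = '2' after the load at offset '3';
- 425 = '1' (the end track point) after the untaken branch at offset '4'.

```python
class SecondaryTracker:
    """Hypothetical model of secondary tracker 420 (branch side) or 1420
    (data side). `seq` plays the role of register 432 / 1432, i.e. read
    pointer 425 / 1425 into row 1515 / 1525."""

    def __init__(self, seq: int = 0) -> None:
        self.seq = seq

    def clock(self, prev_type_bit: int, prev_taken: bool, mapped_seq: int) -> int:
        """Apply one clock edge and return the new pointer value.

        prev_type_bit: bus 421 / 1421 in the previous cycle (1 means the
                       instruction just executed was a branch / data access)
        prev_taken:    branch decision 113 latched in register 422 / 1422
        mapped_seq:    MBNY from mapping module 416 / 1416 on bus 419 / 1419
        """
        if prev_taken:
            # Selector 418 / 1418 picks bus 419 / 1419: first marked
            # instruction at or after the branch target (on the data side
            # this update is enabled through OR gate 1630).
            self.seq = mapped_seq
        elif prev_type_bit:
            # Selector picks incrementer 414 / 1414: next marked instruction.
            self.seq += 1
        # Otherwise register 432 / 1432 keeps its value.
        return self.seq


br, dt = SecondaryTracker(), SecondaryTracker()
br.clock(0, True, 0)
dt.clock(0, True, 1)   # taken branch landed at offset '2'
br.clock(0, False, 0)
dt.clock(1, False, 0)  # load at offset '3' executed
br.clock(1, False, 0)
dt.clock(0, False, 0)  # branch at offset '4' not taken
print(br.seq, dt.seq)  # 1 2
```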
  • the cache system can therefore, without help from processor core 102, prepare the branch target of the next branch instruction in advance and read in advance the data required by the next data access instruction. For most instructions, it is sufficient for processor core 102 to provide pipeline state feedback 111 and branch decision 113.
  • the processor core needs to provide a branch target address or a data access address only on the first and second executions of an indirect branch instruction or a data access instruction.
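The statement that the core supplies a data address only on the first and second executions is consistent with a stride-based scheme: two observed addresses are enough to fix the stride field stored in each table entry (format 1318). The Python sketch below is a hypothetical illustration of that reasoning for data access instructions only. It does not model the S and D bits, whose use is defined in the embodiment of Fig. 14, and it does not cover indirect branches.

```python
from typing import Optional


class StrideEntry:
    """Hypothetical model of one data address table entry (address + stride)."""

    def __init__(self) -> None:
        self.last_addr: Optional[int] = None  # address of the previous execution
        self.stride: Optional[int] = None     # learned step size

    def next_address(self, core_addr: Optional[int] = None) -> int:
        """Return the address for this execution of the data access instruction.

        The core must supply `core_addr` on the first and second executions;
        afterwards the address is predicted as previous address + stride.
        """
        if self.stride is None:
            if core_addr is None:
                raise ValueError("address must come from the core until the stride is known")
            if self.last_addr is not None:
                self.stride = core_addr - self.last_addr
            self.last_addr = core_addr
            return core_addr
        self.last_addr += self.stride
        return self.last_addr


entry = StrideEntry()
entry.next_address(0x1000)        # first execution: core supplies the address
entry.next_address(0x1008)        # second execution: stride 8 is learned
print(hex(entry.next_address()))  # 0x1010, produced without the core
```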
  • the cache system can push instructions and data to the processor core based only on the pipeline state and branch decisions.
  • each entry of first-in-first-out 1636 is extended with the intra-block offset address BNY of the data access instruction whose data it holds.
  • register 1432 in secondary tracker 1420 is updated each time an access to data buffer 1426 and the corresponding operation of data engine 1428 are completed. This continues up to the last data access instruction of the instruction segment (the position of that last instruction must be marked).
  • the intra-block offset address BNY of a data access instruction is stored, together with the data read from data buffer 1426 for that instruction, in the same entry of first-in-first-out 1636 for processor core 102 to read.
  • when a branch is taken, the offset address of the branch instruction is sent to first-in-first-out 1636. Every entry whose data access instruction has an intra-block offset address greater than the branch instruction's offset address is emptied.
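The BNY tagging and selective emptying of FIFO 1636 can be sketched in Python. This is a hypothetical model: the class and method names are not from the source, and it compares offsets only within a single instruction block. In segment 1501, data prefetched for the load at offset '6' is discarded when the branch at offset '4' is taken, while data for the load at offset '3' is kept.

```python
from collections import deque


class TaggedDataFifo:
    """Hypothetical model of FIFO 1636 with a BNY tag in every entry.

    Offsets are compared only within one instruction block.
    """

    def __init__(self) -> None:
        self.entries = deque()  # (bny, data) pairs, oldest first

    def push(self, bny: int, data: int) -> None:
        """Store data read from data buffer 1426 for the instruction at BNY."""
        self.entries.append((bny, data))

    def pop(self) -> tuple:
        """Processor core 102 reads the oldest (bny, data) pair."""
        return self.entries.popleft()

    def branch_taken(self, branch_bny: int) -> None:
        """Empty entries whose data access instruction lies after the branch."""
        self.entries = deque(e for e in self.entries if e[0] <= branch_bny)


fifo = TaggedDataFifo()
fifo.push(3, 0xAA)         # load at offset '3'
fifo.push(6, 0xBB)         # load at offset '6', prefetched past the branch
fifo.branch_taken(4)       # branch at offset '4' is taken
print(list(fifo.entries))  # [(3, 170)]
```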
  • FIG. 17 shows a push cache system that supports multi-issue of instructions, using 4-issue as an example.
  • level-one buffer 1426, data engine 1428, selector 146, and selector 1424 are the same as in FIG. The difference is that processor core 1702 contains four sets of instruction decode/execution units connected to one register file.
  • the register file has enough read ports and write ports to support all four sets of execution units to execute one instruction in the same clock cycle.
  • the four sets of execution units are named A, B, C, and D, wherein the A unit executes the first instruction, the B unit executes the second, and the others are analogous.
  • the level 1 cache 1704 can provide 4 consecutive instructions starting from the input address, also named A, B, C, D.
  • Memory 1726 replaces 426 of Figure 16, and memory 1736 replaces 1426 of Figure 16. Each provides four consecutive entries A, B, C, D starting from the input address.
  • main tracker 1720 replaces the original tracker 120, with adder 1714 in place of incrementer 114.
  • a correlation detector 1702 and maskers 1706 and 1708 have been added.
  • correlation detector 1702 detects dependencies among the four consecutive instructions sent from level-one instruction cache 1704 to processor core 1702 via bus 1701, and generates two control signals from the result:
    - 1707 provides tracker 1720 with an address increment. Adder 1714 adds it to read pointer 115 to obtain the read pointer for the next cycle (the start address of the next four consecutive instructions).
    - 3-bit control line 1703 controls whether the B, C, and D execution units in the processor core execute. It also drives maskers 1706 and 1708 to mask the B, C, and D entries read from memories 1726 and 1736. As a result, only the instruction type information of instructions that can execute is used to generate control buses 421 and 1421. The types of instructions that cannot execute in this cycle because of dependencies are masked out.
  • maskers 1706 and 1708 AND the B, C, and D entries with control line 1703, and the results are ORed with the A entries from memories 1726 and 1736. The outcomes on buses 421 and 1421 control program flow and data access, respectively, and also control the stepping of secondary trackers 420 and 1420.
  • dependencies between instructions include read-after-write (RAW) and write-after-write (WAW) dependencies.
    - RAW: the destination register of an instruction is the same as a source register of a subsequent instruction. The two instructions cannot be executed in parallel.
    - WAW: both instructions write the same destination register. The value produced by the earlier instruction should be discarded, and the value produced by the later instruction is written to the register file.
  • there is only one module for handling data access and one module for handling branch instructions. At most one branch instruction and one data access instruction can therefore be executed per cycle. A second branch instruction or a second data access instruction, together with the instructions after it, cannot be executed in the current cycle and is deferred to the next clock cycle.
  • correlation detector 1702 therefore also treats multiple branch instructions, or multiple data access instructions, within the same group as a dependency.
  • correlation detector 1702 performs a simple decode of the four instructions issued together:
    - it decodes the opcodes and determines the format of each instruction;
    - it extracts, according to that format, the fields relevant to dependency checking, such as register numbers;
    - it performs dependency detection on the four instructions and generates a correlation bit for each of the last three instructions B, C, and D.
  • the detection of the plurality of branch instructions may be provided by the above simple decoding, or by reading the types of instructions in the memories 1726 and 1736.
  • a correlation bit of '1' means the instruction has no dependency on any earlier instruction. A bit of '0' means the instruction depends on at least one earlier instruction and cannot be executed in the same clock cycle as that instruction.
  • the B, C, and D correlation bits pass through a priority encoder in 1702, so that the bit of instruction B can affect the C and D bits, and the bit of instruction C can affect the D bit. When the B bit is '0', the C and D bits must be '0'; when the C bit is '0', the D bit must be '0'.
  • the B, C, and D correlation bits, as corrected by the priority encoder, form control line 1703. This line controls whether the corresponding B, C, and D execution units in the processor core execute their instructions.
  • execution unit A always executes instruction A, because instruction A can never be the later instruction of a dependent pair.
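The dependency check and priority chain in correlation detector 1702 can be sketched in Python. The sketch assumes a hypothetical decoded form (`Insn`) that carries an instruction class and register numbers. A RAW dependency, or a second branch or data access instruction, clears a bit. A WAW conflict alone does not, since the later write simply wins as described for WAW dependencies. This is an illustration, not the circuit of the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Insn:
    """Fields a simple decode might extract (hypothetical)."""
    kind: str                   # 'B' branch, 'L' data access, 'N' other
    dest: Optional[int] = None  # destination register number
    srcs: tuple = ()            # source register numbers


def depends(earlier: list, later: Insn) -> bool:
    """True if `later` cannot issue in the same cycle as `earlier`."""
    # RAW: `later` reads a register written by an earlier instruction.
    if any(e.dest is not None and e.dest in later.srcs for e in earlier):
        return True
    # One branch module and one data access module: a second B or L waits.
    if later.kind in ("B", "L") and any(e.kind == later.kind for e in earlier):
        return True
    # WAW alone does not block: the later write wins at the register file.
    return False


def correlation_bits(group: list) -> list:
    """B, C, D bits of control line 1703 for a group of four (A, B, C, D)."""
    raw = [0 if depends(group[:i], group[i]) else 1 for i in range(1, 4)]
    bits, ok = [], 1
    for b in raw:  # priority chain: a 0 forces every later bit to 0
        ok &= b
        bits.append(ok)
    return bits


nlnl = [Insn("N"), Insn("L"), Insn("N"), Insn("L")]
print(correlation_bits(nlnl))  # [1, 1, 0]
```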
  • 1703 is also sent to maskers 1706 and 1708, where it is ANDed with the B, C, and D entries read from memories 1726 and 1736. Each 3-bit result is then ORed with the corresponding A entry, and the two results are placed on bus 421 and bus 1421, respectively.
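Maskers 1706 and 1708, together with the OR that follows them, can be written as one small function. This is a minimal Python sketch with hypothetical names. It reproduces the 'NLNL' example of segment 1501, where entries '0000' and '0101' combined with correlation bits '110' give '0' on 421 and '1' on 1421.

```python
def type_bus(entries: list, bits: list) -> int:
    """Masker 1706/1708 followed by the OR that drives bus 421 or 1421.

    entries: A, B, C, D type bits read from memory 1726 or 1736
    bits:    B, C, D correlation bits on control line 1703
    """
    a, *bcd = entries
    result = a  # instruction A always executes, so its type is never masked
    for entry, enable in zip(bcd, bits):
        result |= entry & enable  # AND with 1703, then OR into the bus value
    return result


bits_1703 = [1, 1, 0]                     # 'NLNL': instruction D is held back
print(type_bus([0, 0, 0, 0], bits_1703))  # 0 on bus 421
print(type_bus([0, 1, 0, 1], bits_1703))  # 1 on bus 1421
```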
  • the correlation bits on 1703 are also encoded by an encoder in 1702. The result is placed on control line 1705 as the instruction address increment and sent to adder 1714 in main tracker 1720.
  • the encoding is as follows:
    - 'BCD' bits '111': there is no dependency among the four instructions, and the result is decimal '4';
    - '110': instruction D has a dependency, and the result is '3';
    - '100': at least instruction C has a dependency, and the result is '2';
    - '000': at least instruction B has a dependency, and the result is '1'.
  • the encoded result sent by 1705 is the instruction address increment.
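The encoding of line 1703 into the increment on 1705, and its use by adder 1714, can be sketched as a lookup table. Because the priority encoder only lets the four listed patterns occur, the increment always equals one plus the number of '1' bits. The names are hypothetical. The usage lines reproduce the offsets '0', '3', '6' of the segment 1501 example in this embodiment.

```python
# Encoder in correlation detector 1702: 'BCD' bits on line 1703 -> increment on 1705.
# Only these four patterns can occur after the priority encoder.
INCREMENT = {
    (1, 1, 1): 4,  # no dependency: issue A, B, C, D
    (1, 1, 0): 3,  # D depends on an earlier instruction
    (1, 0, 0): 2,  # at least C has a dependency
    (0, 0, 0): 1,  # at least B has a dependency: issue A only
}


def next_offset(offset_413: int, bcd: tuple) -> int:
    """Adder 1714: next in-block offset = current offset + encoded increment."""
    return offset_413 + INCREMENT[bcd]


# Walk through segment 1501 when no branch is taken.
first = next_offset(0, (1, 1, 0))       # 'NLNL' -> 3
second = next_offset(first, (1, 1, 0))  # 'LBNL' -> 6
print(first, second)                    # 3 6
```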
  • instruction block 1501 of Fig. 15B is again used as an example. Assume that block address portion 411 of read pointer 115 points to instruction block 1501 in level-one instruction cache 1704, and that in-block offset portion 413 is '0'. Cache 1704 then pushes the four instructions 'NLNL' at offset addresses '0' to '3' of instruction segment 1501 to processor core 1702 via bus 1701 for execution. At this time, read pointers 425 and 1425 are also '0'.
  • 413 also reads entries '0'-'3' of branch instruction row 1513 from memory 1726, giving '0000', and entries '0'-'3' of data type row 1523 from memory 1736, giving '0101'.
  • These entries are sent to correlation detector 1702 (not shown in FIG. 17). Detector 1702 finds the two L-type instructions at offset addresses '1' and '3' and generates B, C, D correlation bits '110' and instruction address increment '3'. Correlation bits '110' are sent via bus 1703 to processor core 1702, so that the D execution unit does not execute the D instruction (the L instruction at offset address '3').
  • correlation bits '110' are also sent via bus 1703 to masker 1706, where they are ANDed with the last three bits of entry '0000' from memory 1726. The result is ORed with the first bit, giving a 421 value of '0'. Entry '0101' from memory 1736 is processed in the same way, giving a 1421 value of '1'.
  • adder 1714 adds the value '0' on 413 to the address increment '3' on bus 1707, giving '3'. The 421 value is '0' (no branch instruction among the instructions executable in this clock cycle), so the controller controls selector 118 to select the output '3' of adder 1714.
  • the 1421 value is '1' (there is a data access instruction among the instructions executable in this clock cycle). The controller therefore accesses data buffer 1426 with the DBN on output 1423 of data address table 1412 and pushes the data into first-in-first-out 1633. The data is read by the execution unit in processor core 1702 that executes the L instruction (in this case, unit B).
  • next, instruction cache 1704 pushes the four instructions 'LBNL' at offset addresses '3' to '6' of instruction segment 1501 to processor core 1702 via bus 1701 for execution.
  • register 1432 is updated in this cycle, and the entry with sequence number '1' in row 1525 of data address table 1412 is read.
  • the 'LBNL' four instructions are detected by the correlation detector 1702, resulting in a value of '110' on the bus 1703 and a value of '3' on the bus 1705.
  • the value '110' on bus 1703 prevents the D execution unit of processor core 1702 from executing the D instruction (the L-type instruction at offset address '6'). The 1421 value in this cycle is '1', so the controller accesses data buffer 1426 with the DBN on output 1423 of data address table 1412, and the data is pushed into first-in-first-out 1633.
  • the controller places selector 118 under the control of branch decision 113.
  • adder 1714 adds the value '3' on 413 to the address increment '3' on bus 1707, giving '6'. If the branch decision is 'no branch':
    - selector 118 selects the output of adder 1714;
    - read pointer 413 is '6' in the next clock cycle;
    - instruction cache 1704 pushes the two instructions 'LN' at offset addresses '6'-'7' of instruction segment 1501 to processor core 1702 via bus 1701 for execution.
  • register 432 in secondary tracker 420 is incremented by '1'. The entry with sequence number '1' (the end track point in 1515) is read from branch target table 412 and sent to an input of the selector. Because 1421 was '1' in the previous cycle and the branch decision was 'no branch', register 1432 in secondary tracker 1420 is also incremented by '1'.
  • Correlation detector 1702 performs dependency detection on the instructions on bus 1701 as before.
  • if the branch decision is 'execute branch':
    - selector 118 selects bus 423;
    - read pointer 115 holds the branch target address in the next clock cycle;
    - instruction cache 1704 pushes the four instructions starting from the branch target instruction to processor core 1702 via bus 1701 for execution.
  • the instruction type of the branch target instruction segment is stored in memory 1726 and 1736 via bus 417.
  • at this time, in-block offset portion 413 of read pointer 115 holds the in-block offset address of the branch target. Offset address mappers 416 and 1416 map it to the MBNY sequence numbers on 419 and 1419.
  • selector 418 selects 419. Because 421 was '1' in the previous cycle, register 432 of secondary tracker 420 stores the value on bus 419. This value is the instruction sequence number MBNY, in branch target table 412, of the first branch instruction among the branch target instruction and its subsequent instructions.
  • selector 1418 selects 1419. Because 1421 was '1' in the previous cycle, register 1432 of secondary tracker 1420 stores the value on bus 1419. This value is the instruction sequence number MBNY, in data address table 1412, of the first data access instruction among the branch target instruction and its subsequent instructions. Subsequent operations proceed in the same way.
  • Fig. 17 uses the dependency between multiple data access instructions as an example; other types of dependency between instructions are handled in the same way.
  • correlation detector 1702 detects dependencies among multiple instructions, determines which instructions can be executed simultaneously, and controls the corresponding execution units in the processor core through bus 1703. Through bus 1703 it also controls maskers 1706 and 1708, so that only the instruction types of the instructions executed simultaneously are used to generate control buses 421 and 1421.
  • Control bus 421 is used to control the direction of the program, and 1421 is used to control data access.
  • the correlation detector 1702 also provides an address increment 1705 corresponding to the number of executable instructions for the primary tracker read pointer 115 to point to the start address of the next clock cycle.
  • the apparatus and method proposed by the present invention can be used in various cache related applications, and the efficiency of the cache can be improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

The present invention relates to a caching system and method. When applied to processors, the invention allows an instruction to be loaded, before the processor core executes it, into a high-speed memory that the processor core can access directly. Without the processor core supplying the instruction's address, the high-speed memory can be controlled to provide the instruction directly to the processor core, based on feedback information produced when the processor core executes instructions. The processor core can therefore obtain the required instruction from the high-speed memory almost every time, achieving an extremely high cache hit rate.
PCT/CN2014/094603 2013-12-24 2014-12-23 Système et procédé de mise en mémoire cache WO2015096688A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201310737813.4 2013-12-24
CN201310737813 2013-12-24
CN201410048036.7 2014-01-29
CN201410048036.7A CN104731718A (zh) 2013-12-24 2014-01-29 A caching system and method

Publications (1)

Publication Number Publication Date
WO2015096688A1 true WO2015096688A1 (fr) 2015-07-02

Family

ID=53455626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094603 WO2015096688A1 (fr) 2013-12-24 2014-12-23 Système et procédé de mise en mémoire cache

Country Status (2)

Country Link
CN (2) CN104731718A (fr)
WO (1) WO2015096688A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9495149B1 (en) 2015-12-18 2016-11-15 International Business Machines Corporation Identifying user managed software modules

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10346168B2 (en) * 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US20170046159A1 (en) * 2015-08-14 2017-02-16 Qualcomm Incorporated Power efficient fetch adaptation
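WO2017061894A1 (fr) * 2015-10-09 2017-04-13 Huawei Technologies Co., Ltd. Conversion of data streams into matrices
JP6400770B1 (ja) * 2017-04-03 2018-10-03 株式会社東芝 Transmission station
CN107291920B (zh) * 2017-06-28 2021-02-02 南京途牛科技有限公司 Air ticket query caching method
CN107729053B (zh) * 2017-10-17 2020-11-27 安徽皖通邮电股份有限公司 Method for implementing a cache table
CN111290698B (zh) * 2018-12-07 2022-05-03 上海寒武纪信息科技有限公司 Data access method, data processing method, data access circuit, and computing device
CN109684236A (zh) * 2018-12-25 2019-04-26 广东浪潮大数据研究有限公司 Data write cache control method and apparatus, electronic device, and storage medium
CN111625280B (zh) * 2019-02-27 2023-08-04 上海复旦微电子集团股份有限公司 Instruction control method and apparatus, and readable storage medium
CN110187663B (zh) * 2019-06-19 2020-11-03 浙江中控技术股份有限公司 Monitoring method and apparatus
CN110851182B (zh) * 2019-10-24 2021-12-03 珠海市杰理科技股份有限公司 Instruction fetch method and apparatus, computer device, and storage medium
CN114780031B (zh) * 2022-04-15 2022-11-11 北京志凌海纳科技有限公司 Data processing method and apparatus based on a standalone storage engine
CN117193861B (zh) * 2023-11-07 2024-03-15 芯来智融半导体科技(上海)有限公司 Instruction processing method and apparatus, computer device, and storage medium
CN117971719B (zh) * 2024-03-28 2024-06-28 北京微核芯科技有限公司 Method and apparatus for forwarding data in advance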

Citations (3)

Publication number Priority date Publication date Assignee Title
US20020108029A1 (en) * 2001-02-02 2002-08-08 Yuki Kondoh Program counter (PC) relative addressing mode with fast displacement
CN102110058A (zh) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Caching method and apparatus with low miss rate and low miss penalty
CN102841865A (zh) * 2011-06-24 2012-12-26 上海芯豪微电子有限公司 High-performance caching system and method

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US5913047A (en) * 1997-10-29 1999-06-15 Advanced Micro Devices, Inc. Pairing floating point exchange instruction with another floating point instruction to reduce dispatch latency
JP2008299795A (ja) * 2007-06-04 2008-12-11 Nec Electronics Corp Branch prediction control device and method thereof
US20090204791A1 (en) * 2008-02-12 2009-08-13 Luick David A Compound Instruction Group Formation and Execution
US8516230B2 (en) * 2009-12-29 2013-08-20 International Business Machines Corporation SPE software instruction cache
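CN102163143B (zh) * 2011-04-28 2013-05-01 北京北大众志微系统科技有限责任公司 Method for implementing value-correlated indirect jump prediction
CN102968293B (zh) * 2012-11-28 2014-12-10 中国人民解放军国防科学技术大学 Method for dynamic detection and execution of program loop code based on an instruction queue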

Cited By (5)

Publication number Priority date Publication date Assignee Title
US9495149B1 (en) 2015-12-18 2016-11-15 International Business Machines Corporation Identifying user managed software modules
US9588758B1 (en) 2015-12-18 2017-03-07 International Business Machines Corporation Identifying user managed software modules
US9996340B2 (en) 2015-12-18 2018-06-12 International Business Machines Corporation Identifying user managed software modules
US10013249B2 (en) 2015-12-18 2018-07-03 International Business Machines Corporation Identifying user managed software modules
US10102244B2 (en) 2015-12-18 2018-10-16 International Business Machines Corporation Identifying user managed software modules

Also Published As

Publication number Publication date
CN104731719A (zh) 2015-06-24
CN104731718A (zh) 2015-06-24
CN104731719B (zh) 2020-04-28

Similar Documents

Publication Publication Date Title
WO2015096688A1 (fr) Caching system and method
WO2014000641A1 (fr) High-performance cache system and method
WO2015078380A1 (fr) Method and system for converting an instruction set
WO2014139466A2 (fr) Data caching system and method
WO2015024492A1 (fr) High-performance processor system and method based on a common unit
WO2015024493A1 (fr) Buffering system and method based on an instruction cache
WO2012175058A1 (fr) High-performance cache memory system and method
JP3565504B2 (ja) Branch prediction method in a processor, and processor
KR100513358B1 (ko) RISC-type instruction set and superscalar microprocessor
US7159103B2 (en) Zero-overhead loop operation in microprocessor having instruction buffer
US5394530A (en) Arrangement for predicting a branch target address in the second iteration of a short loop
US7162619B2 (en) Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer
US5606682A (en) Data processor with branch target address cache and subroutine return address cache and method of operation
US20020087849A1 (en) Full multiprocessor speculation mechanism in a symmetric multiprocessor (smp) System
CA2297402A1 (fr) Method and device for reducing the latency of set-associative caches by means of set prediction
WO2015024482A1 (fr) Processor system and method using variable-length instruction words
JP2000029701A (ja) Method and system for fetching non-sequential instructions in a single clock cycle
US20070094478A1 (en) Pointer computation method and system for a scalable, programmable circular buffer
US20060149951A1 (en) Method and apparatus for updating global branch history information
Jourdan et al. The effects of mispredicted-path execution on branch prediction structures
Jourdan et al. Recovery requirements of branch prediction storage structures in the presence of mispredicted-path execution
WO2015070771A1 (fr) Data caching system and method
US6587941B1 (en) Processor with improved history file mechanism for restoring processor state after an exception
JP3802038B2 (ja) Information processing apparatus
JP4362096B2 (ja) Information processing apparatus, replacement method, replacement program, and computer-readable recording medium storing the replacement program

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14874754

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 14874754

Country of ref document: EP

Kind code of ref document: A1