WO2015070771A1 - Data caching system and method - Google Patents

Data caching system and method

Info

Publication number
WO2015070771A1
WO2015070771A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
group
address
block
memory
Prior art date
Application number
PCT/CN2014/090972
Other languages
French (fr)
Chinese (zh)
Inventor
林正浩
Original Assignee
上海芯豪微电子有限公司
Priority date
Filing date
Publication date
Application filed by 上海芯豪微电子有限公司
Publication of WO2015070771A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 — Addressing or allocation; Relocation
    • G06F 12/08 — Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 — Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/60 — Details of cache memory
    • G06F 2212/6026 — Prefetching based on access pattern detection, e.g. stride based prefetch

Definitions

  • the invention relates to the field of computers, communications and integrated circuits.
  • The role of the cache is to hold a copy of a portion of the memory contents, so that this content can be accessed quickly by the processor core, keeping the pipeline running continuously.
  • Current caches are addressed as follows. First, the tag is read from the tag memory using the index segment of the address; at the same time, the cache contents are read using the index segment and the intra-block offset segment of the address. The tag read from the tag memory is then compared with the tag segment of the address. If they are the same, the content read from the cache is valid: a cache hit. Otherwise, if the tag read from the tag memory differs from the tag segment of the address, the access is a cache miss and the content read from the cache is invalid. For a way-group (set-associative) cache, the above operations are performed in parallel for every way group to detect which way group hits; the content read from the hitting way group is valid. If every way group misses, all of the read content is invalid. After a miss, the cache control logic fills the contents from a lower-level storage medium into the cache.
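The conventional lookup described above can be sketched as follows. This is an illustrative software model, not the patent's circuit: the 2-way organization, 16-byte blocks, and 64 sets are assumptions chosen for the example.

```python
# Conventional set-associative cache lookup: split the address into
# (tag, index, offset), read every way's tag at that index in parallel,
# and declare a hit when a stored tag equals the address tag.

BLOCK_BITS = 4   # intra-block offset width (16-byte blocks, assumed)
INDEX_BITS = 6   # index width (64 sets, assumed)

def split_address(addr):
    """Split a data address into (tag, index, intra-block offset)."""
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(tag_ways, data_ways, addr):
    """Compare the stored tag of every way group; a match is a cache hit."""
    tag, index, offset = split_address(addr)
    for way in range(len(tag_ways)):
        if tag_ways[way][index] == tag:           # tag match -> hit
            return data_ways[way][index][offset]  # read content is valid
    return None  # all way groups miss: fill from lower-level storage
```

The hardware performs all way comparisons simultaneously; the loop here is only a sequential stand-in for that parallel match.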
  • The method and system proposed by the present invention can directly address one or more of the above or other difficulties.
  • The invention provides a data caching method, characterized in that the data memory in a cache is configurable: one part of its storage blocks implements a traditional group-associative structure, while another part implements a structure allocated by group.
  • In the structure allocated by group, the cache is composed of a plurality of groups; each group stores a plurality of data blocks that share the same starting data block address, and the difference between the data addresses of adjacent storage blocks within a group is a constant value.
  • Further, the data addresses of the data blocks in each group share a common part; this common part is formed by the tag of the data address, or by the tag together with part of the index number. In this way, data blocks with adjacent or similar addresses are stored in the same group.
  • Further, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in the group are consecutive; when the difference equals an integer multiple of the data block length, the data block addresses of all storage blocks in the group are evenly spaced. From the position of the current data in the group and the data step size, it can be determined directly whether the next data is also in the group and, if so, where in the group it is located.
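The in-group position rule above can be sketched as follows. Describing a group by a flat (start address, interval, block count) triple, and the 16-byte block length, are illustrative assumptions, not the patent's exact encoding.

```python
# Given a group's starting block address, the (constant) interval between
# adjacent blocks, and the block count, decide whether the datum at
# current address + stride still falls inside the same group, and where.

BLOCK_LEN = 16  # bytes per data block (assumed)

def locate_in_group(start_block_addr, interval, n_blocks, data_addr):
    """Return (block number in group, intra-block offset), or None if the
    datum at data_addr is not held by this group."""
    block_addr = data_addr - (data_addr % BLOCK_LEN)
    delta = block_addr - start_block_addr
    if delta < 0 or delta % interval != 0:
        return None                       # between or before stored blocks
    k = delta // interval                 # which stored block
    if k >= n_blocks:
        return None                       # past the end of the group
    return k, data_addr % BLOCK_LEN

def next_location(start, interval, n_blocks, cur_addr, stride):
    """Position of the next datum, determined directly from the current
    address and the data step size."""
    return locate_in_group(start, interval, n_blocks, cur_addr + stride)
```

With interval equal to BLOCK_LEN the group holds consecutive blocks; with an integer multiple it holds evenly spaced (compressed) blocks, matching the two cases in the text.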
  • Further, a sequence table is provided. The rows of the sequence table correspond one-to-one to the groups in the data memory, and each row includes a compression ratio, which indicates the interval between the data block addresses of adjacent storage blocks in the corresponding group.
  • Further, each row of the sequence table includes the location of the group holding the data blocks adjacent to those in the corresponding group, so that the group holding the next data, and its position within that group, can be determined directly from the current data's position and the data step size.
  • each row of the sequence table includes a location of a group of consecutive data blocks adjacent to the first data block in the corresponding group.
  • each row of the sequence table includes a location of a group of consecutive data blocks adjacent to a last data block in the corresponding group.
  • the data address is converted into a cache address;
  • Further, the cache address is composed of a group number, a block number within the group, and an intra-block offset, where the intra-block offset is identical to the intra-block offset of the data address; the cache address can be used directly to address the data memory in the data cache.
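The cache address layout just described can be modeled as simple bit fields. The field widths below (8 blocks per group, 16-byte blocks) are assumptions for illustration; the patent does not fix them.

```python
# Cache address (DBN) = group number | block number in group | offset.
# The offset field is identical to the intra-block offset of the data
# address, so no translation is needed for that part.

OFFSET_BITS = 4    # intra-block offset width (assumed)
BLOCK_NO_BITS = 3  # 8 storage blocks per group (assumed)

def pack_dbn(group_no, block_no, offset):
    """Assemble a cache address from its three fields."""
    return ((group_no << (BLOCK_NO_BITS + OFFSET_BITS))
            | (block_no << OFFSET_BITS)
            | offset)

def unpack_dbn(dbn):
    """Split a cache address back into (group number, block number, offset)."""
    offset = dbn & ((1 << OFFSET_BITS) - 1)
    block_no = (dbn >> OFFSET_BITS) & ((1 << BLOCK_NO_BITS) - 1)
    group_no = dbn >> (BLOCK_NO_BITS + OFFSET_BITS)
    return group_no, block_no, offset
```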
  • Further, the data corresponding to data access instructions in loop code is stored in the structure allocated by group, and the data corresponding to other data access instructions is stored in the group-associative structure.
  • Further, when a data access instruction is executed for the first time, its data address is converted into a cache address as soon as the data address is generated.
  • Further, when the data access instruction is executed for the second time, the data address is again converted into a cache address when it is generated, and the data step size is calculated as the difference between the two data addresses. From the current cache address and the data step size, the cache address likely to be used the next time the data access instruction is executed is calculated in advance and used to address the data memory at that next execution; when the data at that next cache address in the data memory is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.
  • Further, on each subsequent execution, the next cache address is calculated from the current cache address and the data step size and used to address the data memory the next time the data access instruction is executed; if the corresponding data is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.
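The per-instruction stride mechanism above can be sketched as a small state machine. The class and field names are illustrative; the patent stores this state in a track-table data point rather than a software object.

```python
# Per data access instruction: the first execution only records its address;
# the second execution computes the data step size as the difference of the
# two addresses; every execution thereafter predicts the next address as
# current + stride, which can be used to pre-address (and pre-fill) the
# data memory before the instruction runs again.

class DataPoint:
    def __init__(self):
        self.last_pos = None  # address of the previous access
        self.stride = None    # data step size, known from the 2nd access

    def access(self, pos):
        """Record this access; return the predicted NEXT address, or None
        if no stride has been established yet."""
        if self.last_pos is not None:
            self.stride = pos - self.last_pos  # difference of two addresses
        self.last_pos = pos
        if self.stride is None:
            return None
        return pos + self.stride  # candidate next cache address
```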
  • The present invention also provides a data caching system in which, according to configuration, one part of the storage blocks of the data memory operates as a traditional group-associative structure and another part operates as a structure allocated by group. The structure allocated by group comprises a plurality of groups; each group comprises a plurality of storage blocks and a data block address storage unit, all storage blocks in a group correspond to the data block address in that unit, and the difference between the data addresses of adjacent storage blocks in each group is a constant value.
  • Further, the data cache system includes a maskable comparator, which matches part of the block address in a data address against the corresponding bits of the data block address in the data block address storage unit, to determine whether the data corresponding to that data address is stored in the group.
  • Further, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in the group are consecutive; when the data corresponding to a data address is stored in the group, the masked-out bits address the storage blocks in the group to find that data.
  • Further, the data cache system includes a shifter. When the difference between the data addresses of adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses of all storage blocks in the group are evenly spaced; when the data corresponding to a data address is stored in the group, the value obtained by shifting the masked-out bits with the shifter addresses the storage blocks in the group to find that data.
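The masked compare-and-shift addressing described above can be modeled as follows. Assumptions: addresses are in units of whole blocks, each group holds 2**BLOCK_NO_BITS blocks spaced 2**c blocks apart (c being the compression ratio), and the group's first block address is aligned so the indexed bit field is zero; the unaligned case (see Figure 7D) is not modeled here.

```python
# Maskable comparator: compare block addresses while ignoring the bits
# that select a block within the group. Shifter: those masked-out bits,
# shifted right by the compression ratio c, index the block in the group.

BLOCK_NO_BITS = 3  # 8 storage blocks per group (assumption)

def masked_match(group_block_addr, block_addr, c):
    """True if block_addr falls on this group's address pattern, comparing
    all bits except the in-group index field, bits [c, c + BLOCK_NO_BITS)."""
    mask = ~(((1 << BLOCK_NO_BITS) - 1) << c)
    return (block_addr & mask) == (group_block_addr & mask)

def block_no_in_group(block_addr, c):
    """The masked-out bits, shifted by the compression ratio, address the
    storage block within the group."""
    return (block_addr >> c) & ((1 << BLOCK_NO_BITS) - 1)
```

With c = 0 the group holds consecutive blocks and the masked bits index directly; with c = 1 the group holds every second block, and a block address with its low bit set correctly fails to match.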
  • Further, the data caching system includes a sequence table memory whose rows correspond one-to-one to the groups in the data memory. Each row includes a storage unit holding a compression ratio; the stored value represents the interval between the data block addresses of adjacent storage blocks in the corresponding group.
  • Further, each row of the sequence table memory includes a pointer to the location of the group holding the data blocks adjacent to those in the corresponding group, so that the group holding the next data, and its position within that group, can be determined directly from the current data's position and the data step size.
  • the pointer points to a location of a group of consecutive data blocks adjacent to the first data block in the corresponding group.
  • the pointer points to a location of a group of consecutive data blocks adjacent to a last data block in the corresponding group.
  • Further, by matching the data address against the data block address in the data block address storage unit with the comparator, and shifting the index with the shifter according to the value in the compression ratio storage unit, the data address can be converted into a cache address. The cache address is composed of a group number, a block number within the group, and an intra-block offset, where the intra-block offset is identical to the intra-block offset of the data address.
  • The cache address can be used directly to address the data memory in the data cache.
  • Further, the shifter shifts the block number in the cache address according to the value in the compression ratio storage unit, so that the cache address is converted back into a data address.
  • The system and method of the present invention can provide a basic solution for the data cache structure used by digital systems. Unlike a traditional data caching system, which fills only after a cache miss, the system and method of the present invention fill the data cache before the processor accesses a datum, and can thereby avoid or largely hide compulsory misses. That is, the cache system of the present invention integrates a prefetch process.
  • the system and method of the present invention also divides the data store in the data cache into a group association portion and a group assignment portion.
  • Each group in the group-allocated portion contains data blocks whose data addresses are adjacent or close to one another, so the data corresponding to data access instructions with adjacent or close data addresses (e.g., the data access instructions in loop code) is stored there.
  • The technical solution of the present invention converts the data address, consisting of tag, index number, and intra-block offset, into a group number, in-group block number, and intra-block offset at the time the data is filled into the data cache.
  • This conversion of the address space allows the data caching system to address the data memory directly in the new address mode, without performing tag matching, and to find the corresponding data directly, especially when accessing data whose addresses are adjacent or close.
  • In that case the cache address of the next datum can be obtained by a simple calculation from the current cache address and the data step size, without tag matching or address conversion, which greatly reduces power consumption.
  • The system and method of the present invention can read data out of the data memory and send it to the processor core before the processor core executes the data read instruction, so that when the processor core needs the data it can take it directly, masking the time of accessing the data memory.
  • Figure 1 is an embodiment of the cache system of the present invention.
  • Figure 2 is a schematic diagram of the track point format of the present invention.
  • Figure 3A is another embodiment of the cache system of the present invention.
  • Figure 3B is another schematic diagram of the track point format of the present invention.
  • Figure 3C is another embodiment of the cache system of the present invention.
  • Figure 4A is an embodiment of the improved group-associative cache of the present invention.
  • Figure 4B is another embodiment of the improved group-associative cache of the present invention.
  • Figure 5 is an embodiment of the grouped data cache of the present invention.
  • Figure 6 is an embodiment of the data access engine of the present invention.
  • Figure 7A is an embodiment of the sequence table and data cache of the present invention.
  • Figure 7B is another embodiment of the sequence table and data cache of the present invention.
  • Figure 7C is another embodiment of the sequence table and data cache of the present invention.
  • Figure 7D is an embodiment of a data storage arrangement with unaligned group boundaries according to the present invention.
  • Figure 8A is an embodiment of the data access engine of the present invention.
  • Figure 8B is a schematic diagram of the various address forms of the present invention.
  • Figure 8C is an embodiment of the sequence table operation of the present invention.
  • Figure 8D is an embodiment of the controller of the present invention.
  • Figure 6 shows a preferred embodiment of the invention.
  • FIG. 1 is an embodiment of a cache system according to the present invention.
  • The data cache system includes a processor core 101, an active table 109, a tag memory 127, a scanner 111, a track table 107, a tracker 119, an instruction memory 103, and a data memory 113.
  • The components listed here are for ease of description; the system may include other components, and some components may be omitted.
  • the various components herein may be distributed across multiple systems, either physically or virtually, and may be hardware implemented (eg, integrated circuits), implemented in software, or implemented in a combination of hardware and software.
  • The processor may be any processing unit that includes an instruction cache and a data cache and is capable of executing instructions and processing data, including but not limited to: a general processor, a central processing unit (CPU), a microcontroller (MCU), a digital signal processor (DSP), a graphics processor (GPU), a system on chip (SoC), an application-specific integrated circuit (ASIC), etc.
  • The level of a memory refers to its degree of proximity to the processor core 101: the closer to the processor core 101, the higher the level. A higher-level memory (such as the instruction memory 103 and the data memory 113) is usually faster but smaller than a lower-level memory. 'The memory closest to the processor' refers to the memory highest in the storage hierarchy, and usually the fastest, such as the instruction memory 103 and the data memory 113 in this embodiment. Furthermore, the memories at the various levels in the present invention have an inclusion relationship, that is, a lower-level memory contains all of the contents stored in a higher-level memory.
  • In the present invention, a branch instruction refers to any suitable instruction form that can cause the processor core 101 to change its execution flow (e.g., execute an instruction out of order). The branch source refers to the instruction that performs the branch operation (i.e., the branch instruction), and the branch source address can be the instruction address of the branch instruction itself; the branch target refers to the target instruction to which the branch instruction transfers, and the branch target address is the address transferred to when the branch is taken, that is, the instruction address of the branch target instruction.
  • A data read instruction refers to any suitable instruction form that can cause the processor core 101 to read data from memory; its instruction format generally includes a base address register number and an address offset, and the data required by a data read instruction refers to the data that the processor core 101 reads when executing it. The current instruction may refer to the instruction currently being executed or fetched by the processor core; the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
  • In the present invention, the term 'fill' refers to prefetching instructions or required data from an external memory in advance and storing them in the instruction cache or data cache before the processor executes the corresponding instruction.
  • The track table 107 contains a plurality of track points. A track point is an entry in the track table 107 and may contain information about at least one instruction, such as the instruction's type; if the track point is a branch point, the information may include the branch target address, and so on. The tracking address of a track point is the track table address of the track point itself, composed of a row address and a column address, and corresponds to the instruction address of the instruction the track point represents. For a branch point, its content includes the tracking address, in the track table 107, of the branch target instruction of the branch instruction it represents, and that tracking address corresponds to the instruction address of the branch target instruction.
  • In addition to the instructions executed by the processor 101, the instruction memory 103 stores the instruction type information corresponding to each instruction, such as whether the instruction is a data read instruction. The instruction type information may further indicate which type of data read instruction the corresponding instruction is, including information on how to calculate the data address, such as the base address register number and the location of the address offset in the instruction code.
  • BNX can be used to represent the row address in a branch point's tracking address; that is, BNX corresponds to the location of the memory block in which the instruction is located (the row number of the memory block), while the column address in the tracking address corresponds to the position (offset) of the branch instruction within its memory block. Each pair of BNX and column address corresponds to one branch point in the track table 107, and the corresponding branch point can be found in the track table 107 from a given BNX and column address. A branch point in the track table 107 also stores, expressed in the form of a tracking address, the location in the instruction memory 103 of the branch target instruction of its branch instruction; based on this tracking address, the position of the track point corresponding to the branch target instruction can be found in the track table 107. That is, for a branch point in the track table 107, the track table address is the tracking address corresponding to its branch source address, and the track table content contains the tracking address corresponding to its branch target address.
  • The entries in the active table 109 correspond one-to-one to the storage blocks in the instruction memory 103, and thus to the rows in the track table 107. Each entry in the active table 109 stores the block address of an instruction block and indicates where the corresponding instruction cache storage block is located in the instruction memory 103, forming the correspondence between BNX and the instruction cache storage block.
  • Each memory block in data memory 113 is represented by a memory block number DBNX.
  • The entries in the tag memory 127 correspond one-to-one to the storage blocks in the data memory 113; each entry stores the data block address corresponding to a storage block of the data memory 113, forming the correspondence between data block addresses and data cache storage block numbers. Thus, when a data address is matched in the tag memory 127, either the storage block number stored in the matching entry is obtained, or the match fails.
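The tag memory 127 association just described can be sketched as a direct mapping from data block address to storage block number. A real tag memory matches all entries in parallel; the loop below is only a software stand-in for that, and the example values are arbitrary.

```python
# Entry i of the tag memory holds the data block address stored in
# data-memory storage block i, so a successful match directly yields the
# storage block number DBNX; an unsuccessful match yields nothing.

def match_tag_memory(tag_entries, block_addr):
    """Return the DBNX of the matching entry, or None if the match fails."""
    for dbnx, stored in enumerate(tag_entries):
        if stored == block_addr:
            return dbnx
    return None
```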
  • The scanner 111 examines the instructions sent from the external memory to the instruction memory 103; once an instruction is found to be a branch instruction, the branch target address of that branch instruction is calculated. For a direct branch instruction, the branch target address can be obtained by adding three terms: the block address of the instruction block in which the instruction is located, the offset of the instruction within the instruction block, and the branch increment (branch offset). Alternatively, the branch target address can be obtained by adding the corresponding base address register value and the branch increment. The instruction block address may be read from the active table 109 and sent directly to the adder in the scanner 111; it is also possible to add a register storing the current instruction block address to the scanner 111, so that the active table 109 need not send the instruction block address in real time.
  • When the scanner 111 finds that an instruction is a data read instruction, it can also calculate the data address corresponding to that instruction, for example by adding the base address register value used by the instruction to the data address offset. In the present invention, data read instructions are divided into two categories: data read instructions whose data address is determined, and data read instructions whose data address is undetermined. For example, a data read instruction that obtains its data address by adding its own instruction address to a data address offset (an immediate) can always be classified as a data read instruction whose data address is determined, since the calculated data address is always correct. A data read instruction that obtains its data address by adding a base address register value to a data address offset (an immediate) can be classified as a data read instruction whose data address is determined only if the base address register value has already been updated when the data address is calculated; otherwise it is classified as a data read instruction whose data address is undetermined. The two kinds of data read instruction can be given different data types, stored in the corresponding track points of the track table 107.
  • The branch target instruction address calculated by the scanner 111 can be matched against the storage block addresses stored in the active table 109. If the match succeeds, indicating that the branch target instruction is already stored in the instruction memory 103, the active table 109 outputs the corresponding BNX to be filled into the track table 107 entry of that branch instruction. If the match fails, the branch target instruction has not yet been stored in the instruction memory 103; in that case the branch target instruction address is sent to the external memory, and an available entry is allocated in the active table 109, so that matching the branch target instruction address in the active table 109 outputs a BNX. Combined with the position of the branch target instruction within its instruction block (i.e., the intra-block offset portion of the branch target instruction address), this forms a tracking address, which is stored as branch point content in the branch track point corresponding to the branch instruction.
  • Similarly, a data read instruction can be identified and its instruction type information stored in the corresponding track point (i.e., data point) of the track table; the data address of the data read instruction is calculated and sent to the external memory to obtain the data block containing the corresponding data. An available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX together with the offset of the data within the data block (i.e., DBNY) is stored as track point content in the data point.
  • In this way, an instruction block can be filled into the instruction memory 103 and a track corresponding to the entire instruction block is established. In the present invention, the address that can directly address the data memory is called the cache address; that is, the cache address (DBN) is composed of DBNX and DBNY.
  • The read pointer 121 of the tracker 119 can move from the track point corresponding to the current instruction in the track table 107 until it points to the first branch point. The value of the read pointer 121 is then the tracking address of the branch source instruction, containing the BNX and the column number of the corresponding branch point, and the tracking address of the branch target instruction of that branch source instruction can be read from the track table 107.
  • That is, the read pointer 121 of the tracker 119 advances from the track point corresponding to the instruction currently being executed by the processor 101 to the first branch point after that track point, and the target instruction can be found in the instruction memory 103 according to the target instruction's tracking address. When the read pointer 121 passes a data point, the cache address stored in it is read out and sent to the data memory 113; the corresponding data is read and pushed to the processor core 101. In this way, the data corresponding to all data read instructions between the current instruction and the first branch point after it is pushed in sequence to the processor core for reading.
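The read-pointer behavior above can be sketched as a scan along one track-table row. The tuple encoding of track points is an assumption for illustration; the patent's track points are hardware entries, not Python tuples.

```python
# Advance the read pointer from the current instruction's track point:
# push the stored cache address (DBN) of every data point passed, and
# stop at the first branch point encountered.

def advance(track_row, start_col):
    """Return (list of DBNs pushed to the core, column of the first
    branch point, or None if the row ends without one)."""
    pushed = []
    for col in range(start_col, len(track_row)):
        kind, payload = track_row[col]
        if kind == 'data':        # data point: push its DBN to the core
            pushed.append(payload)
        elif kind == 'branch':    # first branch point: pointer stops here
            return pushed, col
    return pushed, None
```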
  • FIG. 2 is a schematic diagram of a track point format according to the present invention.
  • For a branch point, the format contains the instruction type 151, and the BNX 153 and BNY 155 corresponding to the branch target instruction. For a data point, the format contains the instruction type 161, and the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113.
  • The read pointer 121 of the tracker 119 moves according to the positions of the branch points stored in the track table 107 and points to the first branch point after the instruction being executed by the processor core 101, reading the track point content from that branch point, namely the position information BNX and BNY of the branch target track point. If the branch point corresponds to an indirect branch instruction, the corresponding branch target instruction block address also needs to be read from the active table 109.
  • The processor core 101 outputs an instruction offset address (i.e., the offset portion of the instruction address) to select the required instruction from the memory block in the instruction memory 103 pointed to by the read pointer 121 of the tracker 119.
  • When the processor core executes the branch instruction, if the branch is not taken (TAKEN signal 123 is '0'), it continues to output new instruction offset addresses, reading and executing the instructions after the branch instruction, while the read pointer 121 of the tracker 119 continues moving to the next branch point and the above process repeats. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is a direct branch instruction, the processor core 101 can directly execute the branch target instruction that has already been prepared; the value of the read pointer 121 of the tracker 119 is updated to that BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, starts moving from that track point, and points to the next branch point. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is an indirect branch instruction, the processor core 101 outputs the block address portion of the actual target instruction address, which is matched against the instruction block address previously read from the active table 109. If the match succeeds, the prepared target instruction is correct and the processor core can use it; if the match fails, the actual target instruction address is sent to the external memory to acquire the instruction block containing the corresponding target instruction, and the target instruction is sent to the processor core 101 for execution.
  • At the same time, the active table 109 allocates an available entry, the instruction block is filled into the corresponding storage block of the instruction memory 103, and the BNX together with the offset of the target instruction within the instruction block (i.e., BNY) is stored as track point content in the branch point. The value of the read pointer 121 of the tracker 119 is updated to that BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, starts moving from that track point, points to the next branch point, and the above operations repeat. In this way, both the next instruction and the branch target instruction can be prepared before the processor core executes the branch instruction, for the processor core 101 to choose between, avoiding the performance loss caused by cache misses.
  • When the read pointer 121 of the tracker 119 passes a data point, the corresponding data is read from the data memory 113 according to the DBN stored in that data point. If the data read instruction is one whose data address is undetermined, the corresponding data block address also needs to be read from the tag memory 127.
  • When the processor core 101 executes the data read instruction, if the instruction is one whose data address is already determined, the processor core 101 can use the data directly. Otherwise, the block address in the actual data address output by the processor core 101 is matched against the data block address previously read from the tag memory 127. If the match is successful, the data is correct and can be used by the processor core.
  • Otherwise, the pipeline in the processor core 101 is suspended, the actual data address is sent to the external memory to obtain the data block containing the corresponding data, the data is sent to the processor core, and the pipeline is then resumed. At the same time, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the resulting DBNX together with the offset address of the data within the data block (i.e., DBNY) is stored as track point content in the data point.
  • In this way, the data possibly needed by the instruction is ready before the instruction executes. If the data is correct, the performance loss due to a data memory 113 miss is completely avoided, and the time required to read the data memory 113 can be partially or completely masked. Even if the data is wrong, the processor core 101 can reacquire the correct data without additional waiting time.
  • FIG. 3A is another embodiment of the cache system according to the present invention.
  • This embodiment is similar to the embodiment of FIG. 1, the difference being that a data address prediction module 301 is added and a data step size field is added to the data point format in the track table.
  • FIG. 3B is another schematic diagram of the track point format according to the present invention.
  • In this embodiment, the format of a branch point still contains the instruction type 151 and the BNX 153 and BNY 155 corresponding to the branch target instruction.
  • The format of a data point contains the instruction type 161, the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113, and the data step size 331.
  • The data step size 331 refers to the difference between the data addresses of two successive data operations of the data read instruction corresponding to the data point, that is, the value obtained by subtracting the previous data address from the current data address. Based on the data step size, the likely value of the next data address can be guessed: the current data address plus the data step size gives the likely value of the next data address.
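The step-size arithmetic described above can be sketched in a few lines of Python (an illustration only; the function names are chosen for this sketch, not taken from the patent):

```python
# Minimal sketch of data step size (stride) prediction: after two
# executions of the same data read instruction the step is known,
# and the next data address can be guessed.

def data_step(current_addr, previous_addr):
    """Step size 331: current data address minus the previous one."""
    return current_addr - previous_addr

def predict_next(current_addr, step):
    """Predicted next data address: current address plus the step size."""
    return current_addr + step

# Example: a load walking an array of 8-byte elements.
step = data_step(0x1010, 0x1008)    # step of 8 bytes
guess = predict_next(0x1010, step)  # predicted address 0x1018
```

The same two operations are what the prediction module 301 performs in hardware: one subtraction to learn the step, one addition to guess the next address.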
  • In this embodiment, the process of establishing a track and prefetching instructions and data is the same as in the embodiment of FIG. 1.
  • In addition, the track table in this embodiment is a compressed track table. Since only some of the instructions in an instruction block are branch instructions or data read instructions, the track table 107 can be compressed to reduce its storage space requirement.
  • The compressed track table may have the same number of rows as the original track table, but fewer columns, and a mapping table stores the correspondence between positions in the compressed track table and positions in the original track table.
  • Each entry in the compressed track table is a branch point or a data point, arranged in the order of the corresponding branch instructions and data read instructions within the instruction block.
  • The entries in the mapping table correspond one-to-one to the branch points and data points in the compressed track table, and store the offsets of the corresponding branch instructions and data read instructions within the instruction block. In this way, after the intra-block offset of a branch instruction or data read instruction is converted into a column address through the mapping table of the instruction block in which it is located, the corresponding branch point or data point can be found according to that column address in the row of the compressed track table pointed to by the BNX.
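As a minimal sketch of this offset-to-column conversion (the list-based encoding of a mapping table row is an assumption made for illustration):

```python
# Sketch of one compressed track table row plus its mapping table row.
# Only branch/data read instructions get a column; the mapping table row
# stores each one's intra-block offset, in instruction order.

mapping_row = [2, 5, 9]              # offsets of the branch/data instructions
compressed_row = ['BP', 'DP', 'BP']  # the corresponding track points

def column_for_offset(offset):
    """Convert an intra-block instruction offset into a column address."""
    return mapping_row.index(offset)

# The data read instruction at offset 5 is found in column 1 of this row.
col = column_for_offset(5)
point = compressed_row[col]
```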
  • In this embodiment, each entry in the compressed track table is a branch point or a data point. Therefore, when the branch transfer of the branch point pointed to by the tracker 119 read pointer 121 does not occur, the read pointer 121 is incremented by one by the incrementer 134 and then points to the next track point. If that track point is a branch point, the branch target instruction is read out as described above and the TAKEN signal sent by the processor core 101 is awaited. If that track point is a data point, the corresponding data is read out as described above and made ready for use by the processor core 101. Specifically, the data can be stored into a first-in-first-out buffer (FIFO) so that the processor core 101 acquires the data corresponding to each data read instruction in the correct order. The read pointer 121 then continues to move and the above operations are repeated until it points to a branch point, whereupon the branch target instruction is read out as described above and the TAKEN signal sent by the processor core 101 is awaited.
  • In addition, when the read pointer 121 points to a data point and the readout is performed, the DBNX therein is sent to the tag memory 127 to read the corresponding data block address.
  • That data block address, together with the DBNY read out by the read pointer 121, constitutes the data address used when the data point was last executed, and is sent to the prediction module 301 for temporary storage.
  • When the processor core 101 executes the data point, the current data address is sent to the prediction module 301, and the last data address is subtracted from it to obtain the data step size.
  • The prediction module 301 stores the data step size back into the corresponding data point, and adds the data step size to the current data address to obtain the predicted next data address.
  • The prediction module 301 sends the next data address to the tag memory 127 for matching. If the match is successful, it means that the data likely needed the next time the data point is executed is already stored in the data memory 113, and the obtained DBNX together with the offset address portion of the next data address (i.e., DBNY) is stored back into the corresponding data point to complete the update of the data point. If the match is unsuccessful, it means that the data likely needed the next time the data point is executed is not yet stored in the data memory 113.
  • At this time, the next data address is sent to the external memory to obtain the data block containing the corresponding data. At the same time, an available entry is allocated in the tag memory 127, and the data block is filled into the data memory 113.
  • In this way, starting from the third execution of the data read instruction by the processor core 101, the data is likely to be ready in advance. If the data is correct, the performance loss caused by a data cache miss is completely avoided, and the time required to read the data cache can be partially or completely masked. Even if the data is wrong, the processor core 101 can reacquire the correct data without additional waiting time.
  • In addition, while the tracker 119 read pointer 121 moves to the first branch point after the instruction currently being executed by the processor core 101, it may pass multiple data points, and data is read out in advance from the data memory 113 based on the DBNs in these data points. Therefore, a FIFO can be used to temporarily store, in order, the data corresponding to each data read instruction for the processor core 101 to use sequentially; that is, the FIFO is used to store the data to be used by the processor core 101.
  • Alternatively, a FIFO can be used to store the DBNs read from these data points, and the corresponding data is read from the data memory 113 based only on the earliest-stored DBN; after the processor core 101 has acquired that data, the corresponding data of the then-earliest DBN in the FIFO is read from the data memory 113 for use by the processor core 101; that is, the FIFO is used to store the DBNs to be used by the processor core.
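The second variant, in which the FIFO holds DBNs rather than data, can be sketched as follows (a toy model; the dictionary standing in for the data memory 113 and all values are illustrative):

```python
from collections import deque

# Sketch of the DBN FIFO: the tracker queues DBNs as the read pointer
# passes data points; data memory is read only for the oldest DBN,
# keeping the prefetch exactly one load ahead of the processor core.

data_memory = {('11', '10'): 'A', ('11', '11'): 'B', ('00', '00'): 'C'}

fifo = deque()                          # DBNs queued by the tracker
for dbn in [('11', '10'), ('11', '11'), ('00', '00')]:
    fifo.append(dbn)

served = []
while fifo:
    oldest = fifo.popleft()             # only the earliest-stored DBN is used
    served.append(data_memory[oldest])  # read data memory for that DBN
```

Because the deque preserves insertion order, the core receives 'A', 'B', 'C' in exactly the order the data read instructions will execute.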
  • The other operations of the cache system of the present invention are the same as those described in the previous embodiments, and are not described again here.
  • FIG. 3C is another embodiment of the cache system according to the present invention.
  • This embodiment is similar to the embodiment of FIG. 3A, the difference being that a sequence table 361 is added.
  • The entries of the sequence table 361 correspond one-to-one to the entries of the tag memory 127, and each entry stores the position information PREV of the previous data block and the position information NEXT of the next data block relative to the data block whose address is stored in the corresponding tag memory 127 entry. For example, when two data blocks with consecutive addresses are filled into the data memory 113, the DBNX of the next data block is stored in the NEXT field of the sequence table 361 entry corresponding to the previous data block.
  • Here, assuming the data block length is N, the block address of the next data block is the block address of the current data block plus N, and the block address of the previous data block is the block address of the current data block minus N. Since the next data address is equal to the sum of the current data address and the data step size, dividing the absolute value of the sum of the data step size and the intra-block offset of the current data address by N gives the number of data blocks between the next data address and the current data address.
  • Further, according to the sign of the sum of the data step size and the intra-block offset of the current data address, it can be determined whether the next data address is located in a data block before or after the current data address.
  • When the sum is non-negative and less than N, the next data address is in the same data block as the current data address, that is, the DBNX of the next data address is the same as the DBNX of the current data address; when the sum is negative, the next data address is located in a data block before the current data address; and when the sum is greater than or equal to N, the next data address is located in a data block after the current data address.
  • In the latter two cases, the number of blocks between the next data address and the current data address is equal to the quotient obtained by dividing the absolute value of the sum of the data step size and the intra-block offset of the current data address by N.
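The direction and distance rules above can be captured in a short sketch (illustrative only; `N` is the data block length):

```python
# Given the data step size, the intra-block offset of the current address,
# and the block length N, decide where the next data address falls and how
# many blocks away it is.

def next_block_relation(step, offset, N):
    s = step + offset
    if 0 <= s < N:
        return ('same', 0)          # same data block: DBNX unchanged
    blocks = abs(s) // N            # quotient of |step + offset| / N
    return ('after', blocks) if s >= N else ('before', blocks)

# With N = 64: stepping +8 from offset 60 lands one block after the
# current block; stepping -68 from offset 4 lands one block before it.
```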
  • Usually, the absolute value of the data step size is small, and the next data address tends to point to the previous (or next) data block of the current data address.
  • In this case, the DBNX stored in the PREV (or NEXT) field of the sequence table 361 entry pointed to by the DBNX read out from the data point by the tracker 119 read pointer 121 is exactly the DBNX corresponding to the next data address.
  • Therefore, that DBNX can be read directly from the sequence table 361 and stored back into the track table 107, thereby avoiding the matching of the next data address in the tag memory 127.
  • Further, an improved data cache structure can be used to obtain a better performance gain.
  • The improvement is illustrated below taking a way-set associative cache as an example.
  • A direct-mapped cache can be regarded as a way-set associative cache with only one way group and is implemented in the same manner, so it is not described separately here.
  • In a fully associative cache, the addresses of the individual memory blocks can be completely unrelated, so the sequence table of the FIG. 3C embodiment can be used directly to establish the linkage between memory blocks, and the memory block position (i.e., DBN) corresponding to the next data address can be found directly from the current data address and the data step size.
  • In a way-set associative cache, the data address is divided into three parts: the tag (TAG), the index number (index), and the intra-block offset (offset), and the index numbers of the memory blocks in each way group are continuous, that is, each index number exists in every way group and exists only once.
  • In this case, the method of the present invention can be used to give the same tag to all memory blocks in a way group.
  • Since the index numbers of all the memory blocks in the way group are continuous, data blocks with consecutive addresses are stored in them, and the positional relationship between the memory blocks corresponding to consecutive addresses is formed naturally; that is, within the range of one way group, the physical positions (or index numbers) corresponding to data blocks with consecutive data addresses are also continuous. Therefore, the DBNX corresponding to the predicted next data address can be found directly, reducing the number of matches in the tag memory 127 or the latency of looking up the sequence table entry by entry.
  • However, in an actual program, the data addresses used are often not contiguous but form an arithmetic progression, so the data corresponding to many index numbers in each way group may never be accessed. Once the frequently accessed data is concentrated in a few index numbers, replacements occur due to an insufficient number of way groups, which reduces the performance of the cache system.
  • To this end, a compression ratio can be set for each way group, so that the index numbers in the way group are no longer incremented by one but by a constant; in this way, the vast majority of the data in the entire way group is data that will be accessed, improving the utilization of the way group as much as possible while still retaining data continuity.
  • Specifically, each way group of the cache in this embodiment corresponds to a feature entry in which a compression ratio and a number of pointers are stored.
  • The value of the compression ratio is defined as the difference between the data block addresses corresponding to two adjacent memory blocks in the way group divided by the data block length.
  • The pointers point to the way groups in which the several data blocks adjacent in address to the first data block in the way group (i.e., the data block with the smallest data address) are respectively located.
  • In the embodiment of FIG. 4A, the compression ratio is '1'.
  • In this case, the pointers all point to the way group itself, that is, the data blocks whose addresses are consecutive with the first data block in the way group are located in the way group itself.
  • In this embodiment, the DBNX corresponding to a data address consists of the way group number and the intra-group block number. For example, if a way group contains 4 memory blocks and its way group number is '3', the block numbers of the 4 memory blocks are '0' to '3' respectively, and their corresponding DBNX are '30' to '33'.
  • As shown for way group 401 in FIG. 4A, all the memory blocks correspond to the tag '2001', that is, the data block addresses corresponding to the 4 memory blocks are '20010', '20011', '20012', and '20013'.
  • At this time, the index number portion of each data address is equal to the intra-group block number of the corresponding memory block in the way group.
  • For example, the index number of data block address '20010' is '0', and the intra-group block number of the corresponding memory block is also '0'; the index number of data block address '20011' is '1', and the intra-group block number of the corresponding memory block is also '1'; and so on.
  • Therefore, from the memory block position corresponding to the current data address (i.e., its DBNX) and the data step size, it can be calculated directly whether the memory block corresponding to the next data address is this memory block or a nearby memory block.
  • Specifically, the DBNX corresponding to the next data address is equal to the DBNX of the current data address plus a DBNX increment, where the DBNX increment is the quotient of the data step size divided by the data block length.
  • For example, if the DBNX corresponding to the current data address is '32' (the corresponding data block address is '20012') and the data step size is equal to the length of one data block, then the DBNX increment is equal to '1', and the DBNX of the next data address is equal to '32' plus '1', i.e., '33' (the corresponding data block address is '20013'), thus pointing to the correct memory block. In this way, the DBNX corresponding to the next data address can be obtained without calculating the next data address and performing address matching.
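A minimal sketch of this update, treating DBNX as a small integer and ignoring overflow past the end of the way group:

```python
# DBNX update for a way group with compression ratio '1':
# the increment is simply the data step size divided by the block length.

def next_dbnx(dbnx, step, block_len):
    """Next DBNX = current DBNX + (step / block length)."""
    return dbnx + step // block_len

# The example above: DBNX '32' (way group 3, block 2), a step of one
# 64-byte block length, increment '1' -> DBNX '33'.
nxt = next_dbnx(32, 64, 64)
```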
  • FIG. 4B is another embodiment of the improved way-set associative cache of the present invention. As shown for way group 403 in FIG. 4B, all memory blocks correspond to the tag '2001', but the corresponding data block addresses are '20010', '20012', '20014', and '20016'.
  • At this time, the index number portion of each data address is equal to the intra-group block number of the corresponding memory block in the way group multiplied by the compression ratio.
  • For example, the index number of data block address '20010' is '0', and the intra-group block number of the corresponding memory block is '0'; the index number of data block address '20012' is '2', and the intra-group block number of the corresponding memory block is '1'; and so on. In this way, the index numbers are compressed according to the compression ratio.
  • Accordingly, the DBNX increment is equal to the quotient of the data step size divided by the data block length and then divided by the compression ratio.
  • For example, suppose the DBNX corresponding to the current data address is '31' (the corresponding data block address is '20012') and the data step size is equal to twice the data block length; then the DBNX increment is equal to '2' divided by '1' and then divided by the compression ratio '2' (i.e., equal to '1'), and the DBNX of the next data address is equal to '31' plus '1', which gives '32' (the corresponding data block address is '20014'). This points to the correct memory block while avoiding the calculation and matching of data addresses.
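The compressed-group increment can be sketched the same way (illustrative; integer division stands in for the exact-quotient arithmetic of the embodiment):

```python
# DBNX increment for a compressed way group:
# increment = (step / block_length) / compression_ratio.

def dbnx_increment(step, block_len, ratio):
    return (step // block_len) // ratio

# FIG. 4B numbers: a step of two 64-byte block lengths with compression
# ratio '2' yields an increment of '1', so DBNX '31' advances to '32'
# (data block address '20014').
inc = dbnx_increment(2 * 64, 64, 2)
```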
  • In this embodiment, the corresponding feature entry stores, in addition to the compression ratio '2', four pointers.
  • Three of the pointers point to the way groups in which the three data blocks adjacent in address to the first data block (data block address '20010') of way group 403 are located (i.e., the way groups containing the data blocks with addresses '2000E', '2000F', and '20011'), and the other pointer points to the next way group whose addresses follow those of way group 403 (its starting data block address being '20018').
  • In this way, as long as the data step size is small, the memory block corresponding to the next data address can be found in the current way group or in a way group pointed to by one of the pointers, using only the DBN corresponding to the current data address.
  • In the above embodiments, the data step size is an integer multiple of the data block length.
  • If the data step size is not an integer multiple of the data block length, the extra part needs to be added to the DBNY; the resulting sum becomes the new DBNY, and the carry part is added to the DBNX.
  • Returning to the embodiment of FIG. 4B, suppose the data step size is 3 data block lengths (that is, the DBNX increment is '3') and the next data address is '20015'.
  • At this time, the index number of the data address is first restored from the compression ratio and the intra-group block number of the memory block.
  • In this example, the intra-group block number is '1', which multiplied by the compression ratio gives '2' (i.e., the index number of the data block). Adding this '2' to the DBNX increment '3' gives the next data address index number '5'. After that, the next data address index number '5' is compressed by the compression ratio, i.e., '5' divided by '2' gives the quotient '2' and the remainder '1'.
  • This means that the data corresponding to the next data address is located, in the way group pointed to by the pointer 417 corresponding to the remainder, in the memory block whose intra-group block number is the quotient; that is, the data corresponding to the next data address '20015' is in the memory block 421 whose intra-group block number is '2' in way group 405.
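The restore-add-recompress procedure of this example can be sketched as follows (group names and the pointer encoding are illustrative; the sketch assumes the new index does not run past the pointed-to group):

```python
# Given the current intra-group block number, the DBNX increment, the
# compression ratio, and the way group's pointers, locate the memory
# block holding the next data address.

def locate_next(block_no, dbnx_inc, ratio, pointers, this_group):
    index = block_no * ratio              # restore the index number
    next_index = index + dbnx_inc         # index of the next data address
    quotient, remainder = divmod(next_index, ratio)
    # remainder 0: the block is in this way group; otherwise the pointer
    # selected by the remainder names the way group that holds it.
    group = this_group if remainder == 0 else pointers[remainder]
    return group, quotient                # (way group, intra-group block no.)

# FIG. 4B walk-through: block no. '1', increment '3', ratio '2';
# remainder '1' selects the pointed-to group, quotient '2' is the block.
group, block = locate_next(1, 3, 2, {1: 'group405'}, 'group403')
```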
  • In addition, each way group of the cache can be configured as a plurality of groups, each of which can provide the same function as a way group, thereby conveniently increasing the number of way groups and storing multiple sets of consecutive data blocks corresponding to different tags.
  • Specifically, the data memory in each way group can be divided into corresponding groups, each group corresponding to the same number of rows with consecutive index numbers and corresponding to the same tag; that is, each group stores several data blocks with consecutive addresses corresponding to the same tag.
  • FIG. 5 is an embodiment of the grouped data cache according to the present invention.
  • In this embodiment, the tag memory 501 is divided into two groups, each of which contains one row of content addressable memory (CAM) storing a tag (such as tag 503 and tag 505).
  • The data memory 511 is also divided into two groups, each group containing four memory blocks; the data block addresses in the four memory blocks are consecutive and correspond to the same tag.
  • For example, group 513 includes memory blocks 521, 523, 525 and 527, whose data block addresses are consecutive and all correspond to tag 503; group 515 includes memory blocks 531, 533, 535 and 537, whose data block addresses are consecutive and all correspond to tag 505.
  • In addition, each tag and its corresponding group of memory blocks also correspond to a register comparator and a decoder.
  • For example, tag 505 corresponds to register comparator 519 and decoder 539.
  • The register comparator includes a register and a comparator. The register stores the upper part of the index number in the start address of the data blocks stored in the group.
  • When addressing, the upper part of the index number in the data address is sent via bus 543 to all register comparators to be compared with the stored upper-part values; according to the comparison results, only the compare line of the CAM row corresponding to the matching entry is charged. The tag sent via bus 541 is then matched, and the successfully matched CAM row outputs an enable signal to the decoder.
  • The decoder, under the control of the enable signal output by the register comparator, decodes the lower part of the index number in the data address on bus 545, and according to the decoding result selects one memory block output from the corresponding group.
  • In this way, each group can provide a function equivalent to one way group.
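One group's lookup path can be modeled as a toy function (the dictionary encoding of a group and all values are illustrative, not the hardware of FIG. 5):

```python
# Toy model of one group's lookup in FIG. 5: the register comparator
# screens on the upper index bits, the CAM row then matches the tag,
# and the decoder selects a memory block with the lower index bits.

def group_lookup(tag, index_hi, index_lo, group):
    if group['index_hi'] != index_hi:   # register comparator: no charge
        return None
    if group['tag'] != tag:             # CAM row: tag mismatch
        return None
    return group['blocks'][index_lo]    # decoder selects the block

group515 = {'tag': 0x2001, 'index_hi': 0,
            'blocks': ['b531', 'b533', 'b535', 'b537']}
hit = group_lookup(0x2001, 0, 2, group515)   # selects block 535
```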
  • In addition, consecutive upper-part index values can be stored in two adjacent register comparators, so that the index numbers corresponding to the two register comparators are also continuous.
  • This is equivalent to merging the two adjacent groups into one larger group to accommodate more data blocks of consecutive addresses.
  • The groups can also be configured with different sizes to form a cache of hybrid structure.
  • For example, one way group in the cache may be configured as four groups and another way group as one group, these two way groups constituting the cache portion for continuous-location storage, while the remaining way groups are organized in the traditional set-associative structure and constitute the cache portion for random-location storage.
  • In this case, the first way group contains at most four sets of consecutive data blocks, and the second way group contains only one set of consecutive data blocks.
  • The remaining way groups, like an existing set-associative cache, may each contain a maximum number of tags equal to the number of corresponding memory blocks (i.e., the number of rows of the way group itself), and adjacent memory blocks may correspond to different tags.
  • In this way, data with consecutive data addresses (i.e., the same tag) can be stored in the cache portion for continuous-location storage according to the characteristics of the program, while data with discontinuous data addresses is stored in the cache portion for random-location storage.
  • The cache of hybrid structure can be configured according to the characteristics of the program, which gives both flexibility of data storage in the cache and convenience of replacement, and can save a large number of tag comparison operations when performing data accesses to consecutive addresses.
  • It is possible that the data currently accessed or about to be accessed should belong to the cache portion for continuous-location storage, but the data block in which it resides has already been stored in the cache portion for random-location storage.
  • At this time, the data block in which the data resides should be filled into the cache portion for continuous-location storage, and the corresponding memory block in the cache portion for random-location storage is invalidated.
  • It is also possible that the data to be accessed should belong to the cache portion for random-location storage, but the data block in which it resides is already stored in the cache portion for continuous-location storage. At this time, the data is accessed directly from the cache portion for continuous-location storage without changing the position where the data is stored in the cache.
  • In the cache system of the present invention, a data access engine is used to implement the following function: before the processor core calculates the data address, the data access engine fills the corresponding data into the data cache and prepares the data for use by the processor core.
  • Here, data reading is taken as an example for description; data storing can be implemented by a similar method, and the description is not repeated here.
  • The data access engine is described in detail below through several specific embodiments. Please refer to FIG. 6, which is an embodiment of the data access engine of the present invention. For ease of description, only some of the modules or components are shown in FIG. 6.
  • The data memory 113 and the processor core 101 are the same as described in the previous embodiments.
  • In this embodiment, the data point format in the track table 107 contains the instruction type 621, DBNX, DBNY 627, and the data step size 629.
  • The DBNX consists of a group number (GN) 623 and an intra-group block number 625, and DBNY 627 is the intra-block offset (offset) in the data address.
  • The data engine 601 contains a sequence table 603, shifters 605, 607 and 609, an adder 611, a subtractor 613, and selectors 615, 616 and 617.
  • When in use, the intra-group block number 625 in the data point content read from the track table is sent to the shifter 605, shifted left according to the compression ratio, and then sent to the adder 611. Since shifting the intra-group block number 625 left by n bits is equivalent to multiplying it by 2^n, the intra-group block number 625, after being shifted by the shifter 605, is restored to the value of the index number in the corresponding data address.
  • The DBNY 627 in the data point content is sent directly to the adder 611 and, together with the index number output by the shifter 605, constitutes one input of the adder 611; the data step size 629 in the data point content is the other input of the adder 611. The sum of the two is the index number and intra-block offset of the next data address.
  • Among them, the intra-block offset is used directly as the DBNY corresponding to the next data address, and the index number is shifted right by the shifter 607 according to the compression ratio to obtain the intra-group block number corresponding to the next data address.
  • The number of bits shifted right by the shifter 607 is the same as the number of bits shifted left by the shifter 605. Since shifting the index number right by n bits is equivalent to dividing it by 2^n, the index number, after being shifted by the shifter 607, is compressed again into the corresponding intra-group block number and sent back for storage in the track table, while its lowest n bits 631, shifted out to the right, are not part of the intra-group block number.
  • The portion 631 of the index number shifted out by the shifter 607 is sent to the selector 616 as a control signal, and the overflow signal (carry or borrow) of the adder 611 is sent to the selector 615 as a control signal.
  • The inputs of the selectors are derived from the group numbers (GN) stored in the row of the sequence table 603 pointed to by the group number 623 in the current data address.
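The shift-add-shift arithmetic of shifters 605/607 and adder 611 can be sketched as follows (bit widths are illustrative; the selector logic is omitted here):

```python
# Arithmetic sketch of the FIG. 6 datapath: the intra-group block number
# is expanded to an index number (shift left by n), concatenated with
# DBNY, the step is added, and the result is split back apart.

DBNY_BITS = 2   # 4 data items per block, as in the FIG. 7A examples

def engine_add(block_no, dbny, step, n):
    index = block_no << n                         # shifter 605: * 2**n
    total = ((index << DBNY_BITS) | dbny) + step  # adder 611
    new_dbny = total & 0b11                       # next DBNY
    new_index = total >> DBNY_BITS                # next index number
    new_block = new_index >> n                    # shifter 607: / 2**n
    low_out = new_index & ((1 << n) - 1)          # bits 631, to selector 616
    return new_block, new_dbny, low_out

# Ratio '0' (n = 0): block '11', DBNY '10', step '1' -> block '11', DBNY '11'.
```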
  • FIG. 7A is an embodiment of the sequence table and data cache of the present invention.
  • The number of rows of the sequence table 603 is the same as the number of groups in the data memory 701, and the two correspond one-to-one.
  • In this embodiment, the data memory 701 is divided into two way groups (i.e., way group 0 and way group 1), and each way group is further divided into two groups, so there are four groups in the data memory 701.
  • The group numbers are marked on the corresponding groups as shown in FIG. 7A, that is, way group 0 contains groups '00' and '01', and way group 1 contains groups '10' and '11'. Further, for convenience of explanation, it is assumed that each group contains four memory blocks, each of which contains four data items (or data words).
  • Each row of the sequence table 603 contains a feature entry, a tag entry 715, and an index entry 717.
  • The feature entry further includes a compression ratio 703 and five pointers (i.e., pointers 705, 707, 709, 711 and 713). As in the embodiment of FIG. 4B, each of the five pointers points to the group in which a data block adjacent in address to the first data block of the group is located.
  • In this embodiment, the index numbers of the data blocks in each group are not compressed; therefore, apart from one pointer pointing to the group preceding this group in consecutive addresses and another pointer pointing to the group following this group in consecutive addresses, the other three pointers point to this group itself.
  • For example, in the first row of the sequence table 603, the pointers 705, 707 and 709 point to the group itself (i.e., group '00'), the pointer 711 points to the group following this group in consecutive addresses (i.e., group '10'), and the pointer 713 points to the group preceding this group in consecutive addresses (i.e., group '11').
  • The pointers in the other rows are likewise as shown, wherein a pointer whose content is empty indicates that the group to which it points is not displayed in the figure, or has not yet been determined, which does not affect the case described in this embodiment.
  • In this embodiment, the compression ratios of the four groups are all '0', that is, the index number of the data address corresponds directly to the intra-group block number, and each group corresponds to a complete tag.
  • In this way, with the lowest two bits of the block address masked, the data address can be matched to find the group corresponding to it, and the two masked bits are the intra-group block number within the corresponding group.
  • Take the sequential access of four consecutive data items A, B, C and D as shown in FIG. 7A as an example.
  • Data A and B are the last two data items of the last memory block of group '11', and data C and D are the first two data items of the first memory block of group '00'; that is, the difference between the data addresses of adjacent ones of the four data items is the data step size '1'.
  • When data A is read from the data memory 701 according to the data point content in the track table 619, the DBNX, DBNY and data step size in the data point are read out: the value of DBNX is '1111' (i.e., the 4th memory block in group '11'), in which the group number is '11' and the intra-group block number is '11'; the value of DBNY is '10' (i.e., the third data item in the memory block); and the value of the data step size is '1' (i.e., the next accessed data B is the data immediately after data A).
  • At this time, the intra-group block number ('11') in the DBNX is sent to the shifter 605, and the group number ('11') in the DBNX is sent to the sequence table 603 to read the contents of the corresponding row (i.e., the fourth row in the sequence table 603).
  • Among them, the compression ratio ('0') is sent to the shifters 605 and 607 as the number of shift bits (i.e., no shift).
  • the output '11' of shifter 605 is combined with DBNY (' 10 ') to form '1110' and data step '1' to get '1111 ', where the block number '11' in the group is still '11' after the output of the shifter 607, that is, the block number ('11') and DBNY ('11) corresponding to the next data address are obtained. ').
  • Meanwhile, the pointer values of the fourth row in sequence table 603, corresponding to ports '1', '2', '3', '4' and '-1', are output to selectors 616 and 615, and port '0' outputs the group number of the row itself, '11' (this group number equals the row number, so it need not occupy writable memory in the row and can be hard-wired read-only to save storage space), to selector 615.
  • Since no carry or borrow occurred in the addition, the group number '11' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '11' and intra-group block number '11') and DBNY ('11') are generated, pointing to data B in data memory 701. The DBN is written back via bus 649 to this data point in track table 619 for use the next time data B is read.
  • When data B is accessed, the group number '11', intra-group block number '11', DBNY '11' and data step '1' in the data point are read out again. As before, the intra-group block number '11' output by shifter 605 is concatenated with DBNY to form '1111'; adding the data step '1' yields '0000' (i.e., the intra-group block number '00' and DBNY '00' corresponding to the next data address) and overflows, producing the carry '1'.
  • The pointer values of the fourth row of sequence table 603 and the group number of the row itself are again sent to selectors 616 and 615. Because of the carry, the group number '00' output by port '4' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '00') and DBNY ('00') are generated, pointing to data C in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data C is read.
  • According to the above method, when the compression ratio is '0', the DBN corresponding to the next data address can be computed directly from the data step size.
  • When the data step size is large, the index number in the data address can be compressed.
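The ratio-'0' case above reduces to fixed-point arithmetic: concatenate the intra-group block number with DBNY, add the data step, and let a carry or borrow select an adjacent-group pointer instead of the group's own number. The following is a minimal software sketch of that arithmetic, not the patent's hardware; the field widths and parameter names are illustrative assumptions (4 blocks per group, 4 data items per block):

```python
BLOCK_BITS = 2   # intra-group block number width (assumed)
DBNY_BITS = 2    # intra-block data item index width (assumed)

def next_dbn_ratio0(group, block, dbny, step, next_group, prev_group):
    """Return (group, block, dbny) of the next data address.

    next_group/prev_group stand for the adjacent-group pointers held in
    the sequence table row; a carry or borrow out of the block+DBNY field
    selects one of them instead of the group's own number (port '0').
    """
    width = BLOCK_BITS + DBNY_BITS
    value = (block << DBNY_BITS) | dbny      # e.g. '11' ++ '10' -> '1110'
    total = value + step
    if total >= (1 << width):                # carry: cross into next group
        group = next_group
    elif total < 0:                          # borrow: cross into previous group
        group = prev_group
    total &= (1 << width) - 1
    return group, total >> DBNY_BITS, total & ((1 << DBNY_BITS) - 1)

# Data A of FIG. 7A: group '11', block '11', DBNY '10', step '1' -> data B.
assert next_dbn_ratio0(0b11, 0b11, 0b10, 1, 0b00, 0b10) == (0b11, 0b11, 0b11)
# Data B: adding '1' overflows, so the next-group pointer ('00') is taken.
assert next_dbn_ratio0(0b11, 0b11, 0b11, 1, 0b00, 0b10) == (0b00, 0b00, 0b00)
```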
  • Table 1 shows some commonly used compression ratios and the corresponding numbers of shift bits and masks. The first column gives the range of the data step size; the second column shows how the tag and index number stored in the sequence table are masked for matching, where T denotes a tag bit, I denotes an intra-group block number (index) bit, and the underlined part marks the masked bits; the third column gives the corresponding number of shift bits; the fourth column gives the corresponding compression ratio.

| Data step size range | Masked bits | Shift bits | Compression ratio |
|---|---|---|---|
| step < 2 × block length | the two index bits | 0 | 1 (not compressed) |
| 2 × block length ≤ step < 4 × block length | lowest tag bit and high index bit | 1 | 2 |
| 4 × block length ≤ step < 8 × block length | lowest two tag bits | 2 | 4 |
| 8 × block length ≤ step < 16 × block length | second and third lowest tag bits | 3 | 8 |

  • Other cases can be handled in the same manner.
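As a sanity check on Table 1, the step-size ranges map to shift counts by successive doublings of the data block length. A small illustrative sketch; the function name, the `max_shift` limit and the block length are assumptions, not from the patent:

```python
def compression_shift(step, block_len, max_shift=3):
    """Shift bits per Table 1: 0 while |step| < 2x block length, then one
    more for each further doubling of the step size."""
    s = abs(step)
    shift = 0
    while shift < max_shift and s >= (block_len << (shift + 1)):
        shift += 1
    return shift

block_len = 4  # data items per block, assumed for illustration
assert compression_shift(3, block_len) == 0     # < 2x block: ratio 1
assert compression_shift(-8, block_len) == 1    # 2x..4x: ratio 2 (FIG. 7B)
assert compression_shift(9, block_len) == 1     # 2x..4x: ratio 2 (FIG. 7C)
assert compression_shift(17, block_len) == 2    # 4x..8x: ratio 4 (FIG. 7D)
assert compression_shift(40, block_len) == 3    # 8x..16x: ratio 8
```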
  • FIG. 7B shows another embodiment of the sequence table and data cache of the present invention. The structure of each group and of the sequence table in the FIG. 7B cache is the same as in FIG. 7A. In this embodiment the compression ratio is '01' and the data step size is an integer multiple of the data block length (the data step is '11000' in two's-complement form, i.e., decimal '-8').
  • The lowest bit of the data address index number corresponding to each memory block of group '00' and group '01' is '0', and the lowest bit of the data address index number corresponding to each memory block of group '10' and group '11' is '1'. According to the compression ratio ('1'), the mask is shifted one bit to the left, masking the high bit of the index number in the data address and the lowest bit of the tag (shown underlined in tag 715 and index number 717 in FIG. 7B). That is, the portion of the data address consisting of the tag except its lowest bit, together with the lowest bit of the index number, is matched to find the group corresponding to the data address, and the two masked bits give the intra-group block number of the corresponding data block.
  • In sequence table 603 the tag value in the row corresponding to group '00' is '1000' and the tag value in the row corresponding to group '01' is '1010'; the masked bit of each is '0', indicating that the group boundaries of the data blocks stored in the two groups are aligned. The tags of group '00' and group '01' are consecutive except for the lowest bit, and the lowest bits of their index numbers are the same. That is, the tags and index numbers of the data blocks stored in group '00' are '100000', '100010', '100100' and '100110', and the tags and index numbers of the data blocks stored in group '01' are '101000', '101010', '101100' and '101110', respectively.
  • When the data is accessed, the intra-group block number ('01') in DBNX is sent to shifter 605, and the group number '01' in DBNX is sent to sequence table 603 to read out the contents of the corresponding row (i.e., the second row in sequence table 603). The unmasked bit of index number 717 is used as the fill bit appended at the right when shifter 605 shifts left; the compression ratio ('01') is sent to shifters 605 and 607 as the number of shift bits (i.e., shift by one).
  • Shifter 605 shifts its input '01' one bit to the left and fills '0', obtaining '010'; concatenated with DBNY ('01') this forms '01001', and adding the data step '11000' yields '00001', where the intra-group block number portion '000' is shifted one bit to the right by shifter 607 and output as '00'; that is, the intra-group block number ('00') and DBNY ('01') corresponding to the next data address are obtained.
  • Since no borrow occurred, the group number '01' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '01' and intra-group block number '00') and DBNY ('01') are generated, pointing to data F in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data F is read.
  • When data F is accessed, the group number '01', intra-group block number '00', DBNY '01' and data step '11000' in the data point are read out again. The intra-group block number '00' is shifted one bit to the left by shifter 605 and filled with '0'; concatenated with DBNY this forms '00001', and adding the data step '11000' yields '11001' (i.e., the intra-group block number portion '110', which shifter 607 shifts one bit to the right to give '11', and DBNY '01'), and a borrow overflow occurs.
  • Because of the borrow, the group number '00' output by port '-1' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '11') and DBNY ('01') are generated, pointing to data G in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data G is read.
  • According to the above method, when the compression ratio is not '0' but the data step size is an integer multiple of the data block length, the DBN corresponding to the next data address can likewise be computed from the data step size.
  • FIG. 7C shows another embodiment of the sequence table and data cache of the present invention. The structure of each group and of the sequence table in the FIG. 7C cache is the same as in FIG. 7B, except that here the data step size is not an integer multiple of the data block length (the data step is '1001', i.e., decimal '9').
  • The tags of group '00' and group '01' are consecutive except for the lowest bit and the lowest bits of their index numbers are the same; the tags of group '01' and group '11' are identical except for the masked lowest bit, and the lowest bits of their index numbers are consecutive. The tags and index numbers of the data blocks stored in group '11' are '101001', '101011', '101101' and '101111', respectively.
  • When the data is accessed, the DBNX read from the data point is '0010', where the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; the value of the data step is '1001'.
  • The intra-group block number ('10') in DBNX is sent to shifter 605, and the group number '00' in DBNX is sent to sequence table 603 to read out the contents of the corresponding row (i.e., the first row in sequence table 603). The unmasked bit of index number 717 is used as the fill bit appended at the right when shifter 605 shifts left; the compression ratio ('01') is sent to shifters 605 and 607 as the number of shift bits (i.e., shift by one).
  • Shifter 605 shifts its input '10' one bit to the left and fills '0', obtaining '100'; concatenated with DBNY ('01') this forms '10001', and adding the data step '1001' yields '11010', where the intra-group block number portion '110' is shifted one bit to the right by shifter 607 and output as '11'; that is, the intra-group block number ('11') and DBNY ('10') corresponding to the next data address are obtained.
  • Since the bit shifted out coincides with the fill bit and no carry occurred, the group number '00' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '11') and DBNY ('10') are generated, pointing to data K in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data K is read.
  • When data K is accessed, the group number '00', intra-group block number '11', DBNY '10' and data step '1001' in the data point are read out again. The intra-group block number '11' is shifted one bit to the left by shifter 605 and filled with '0'; concatenated with DBNY this forms '11010', and adding the data step '1001' yields '00011' (i.e., the intra-group block number portion '000', which shifter 607 shifts one bit to the right to give '00', and DBNY '11'), with a carry overflow. Because of the carry, group '01' is selected as the group corresponding to the next data address, and the resulting DBN (group number '01', intra-group block number '00', DBNY '11') is written back to the track table.
  • When that data item is accessed, the group number '01', intra-group block number '00', DBNY '11' and data step '1001' in the data point are read out again. The intra-group block number '00' is shifted one bit to the left by shifter 605 and filled with '0'; concatenated with DBNY this forms '00011', and adding the data step '1001' yields '01100' (i.e., the intra-group block number portion '011', which shifter 607 shifts one bit to the right to give '01', and DBNY '00'). Here the bit shifted out (removed portion 631) controls selector 616: the group number '11' output by port '1' is selected and, after passing selector 615, becomes the group number corresponding to the next data address. At this point the DBNX corresponding to the next data address (i.e., group number '11' and intra-group block number '01') and DBNY ('00') are generated, pointing to data M in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data M is read.
  • According to the above method, when the compression ratio is not '0' and the data step is not an integer multiple of the data block length, the DBN corresponding to the next data address can still be computed from the data step.
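The shifted computations of FIGS. 7B and 7C can be modeled in software as arithmetic on a linear value built from the block number, the unmasked low index bits and DBNY. This is an illustration of the idea only, not the hardware datapath; the widths are assumptions, and the returned displacement pair stands for which adjacent-group pointer port would be selected:

```python
DBNY_BITS = 2    # data items per block (assumed)
BLOCK_BITS = 2   # blocks per group (assumed)

def next_dbn_shifted(block, dbny, fill, step, k):
    """Return ((d_high, d_low), new_block, new_dbny, new_fill).

    fill is the group's k unmasked low index bits.  (d_high, d_low) says
    how far the matched address bits moved: d_high is the carry/borrow out
    of the block field, d_low the change of the unmasked low bits.  (0, 0)
    keeps the same group (port '0'); anything else is resolved through the
    sequence table's adjacent-group pointers.
    """
    width = BLOCK_BITS + k + DBNY_BITS
    linear = (block << (k + DBNY_BITS)) | (fill << DBNY_BITS) | dbny
    total = linear + step
    d_high = total >> width                  # +1 carry, -1 borrow, 0 neither
    total &= (1 << width) - 1
    new_fill = (total >> DBNY_BITS) & ((1 << k) - 1)
    new_block = total >> (k + DBNY_BITS)
    new_dbny = total & ((1 << DBNY_BITS) - 1)
    return (d_high, new_fill - fill), new_block, new_dbny, new_fill

# FIG. 7C (k=1, step '1001' = 9): from block '11', DBNY '10' of group '00'
# (fill '0'), a carry steps into the next higher group:
assert next_dbn_shifted(0b11, 0b10, 0, 9, 1) == ((1, 0), 0b00, 0b11, 0)
# From block '00', DBNY '11' (fill '0'), the shifted-out bit selects the
# group whose unmasked low bit is '1' (port '1'):
assert next_dbn_shifted(0b00, 0b11, 0, 9, 1) == ((0, 1), 0b01, 0b00, 1)
```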
  • In addition, the index number corresponding to the first data block of each group need not be '0'; a storage arrangement in which group boundaries are not aligned can thus be realized, storing data more flexibly and saving storage space.
  • FIG. 7D shows an embodiment of such a group-boundary-unaligned data storage arrangement according to the present invention. The structure of each group and of the sequence table in the FIG. 7D cache is the same as in FIG. 7A. In this embodiment the compression ratio is '10' and the data step size is not an integer multiple of the data block length (the data step is '10001', i.e., decimal '17').
  • The index numbers of the data addresses corresponding to the memory blocks of group '00' and group '01' are '00', and the index numbers of the data addresses corresponding to the memory blocks of group '10' and group '11' are '01'. According to the compression ratio ('10'), the mask is shifted two bits to the left, masking the lowest two bits of the tag in the data address and leaving the index number unmasked (shown underlined in tag 715 in FIG. 7D). That is, the portion of the data address consisting of the tag except its lowest two bits, together with the index number, is matched to find the group corresponding to the data address, and the two masked bits give the intra-group block number of the corresponding data block.
  • In sequence table 603 the tag value in the row corresponding to group '00' is '01000' and the tag value in the row corresponding to group '01' is '01100'; the two masked bits of each are '00'. The tags of group '00' and group '01' are consecutive except for the lowest two bits and their index numbers are the same, while the tags of group '01' and group '11' are identical except for the lowest two bits and their index numbers are consecutive. The tags and index numbers of the data blocks stored in group '00' are '0100000', '0100100', '0101000' and '0101100'; those stored in group '01' are '0110000', '0110100', '0111000' and '0111100'; for group '11', because its group boundary is not aligned and its offset is '01', the tags and index numbers of the data blocks stored in it are '0110101', '0111001', '0111101' and '1000001', respectively.
  • Since the group boundary of group '11' is not aligned, in addition to the matching performed via bus 641, the lowest two bits of the tag in the data address also need to be sent via bus 643 and have the lowest two bits of the tag stored in the corresponding row subtracted from them by subtractor 613, to determine the intra-group block number corresponding to the data address.
  • For example, for the data address '011100111' (i.e., tag '01110', index number '01', intra-block offset '11'), matching the tag except its lowest two bits ('011') together with the index number '01' finds group '11'. The lowest two bits of the tag ('10') minus the lowest two bits of the tag stored in the row corresponding to group '11' in sequence table 603 ('01'), computed by subtractor 613, give '01' (the second data block); that is, this data address corresponds to the last data item of the second data block of group '11'.
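Subtractor 613's role for unaligned boundaries reduces to a small modular subtraction; a hedged sketch (the function name and widths are illustrative):

```python
def intra_group_block(tag_low_bits, boundary_offset, block_bits=2):
    """Masked tag bits of the address minus the stored group-boundary
    offset, modulo the number of blocks per group."""
    return (tag_low_bits - boundary_offset) & ((1 << block_bits) - 1)

# FIG. 7D example: tag low bits '10' minus stored offset '01' -> block '01'.
assert intra_group_block(0b10, 0b01) == 0b01
# A wrapped block: masked bits '00' minus offset '01' -> block '11'.
assert intra_group_block(0b00, 0b01) == 0b11
```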
  • When the data is accessed, the DBNX read from the data point is '0010', where the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; the value of the data step is '10001'. The intra-group block number ('10') in DBNX is sent to shifter 605, and the group number '00' in DBNX is sent to sequence table 603 to read out the contents of the corresponding row (i.e., the first row in sequence table 603). The two unmasked bits of index number 717 are used as the fill bits appended at the right when shifter 605 shifts left; the compression ratio ('10') is sent to shifters 605 and 607 as the number of shift bits (i.e., shift by two).
  • Shifter 605 shifts its input '10' two bits to the left and fills '00', obtaining '1000'; concatenated with DBNY ('01') this forms '100001', and adding the data step '10001' yields '110010', where the intra-group block number portion '1100' is shifted two bits to the right by shifter 607 and output as '11'; that is, the intra-group block number ('11') and DBNY ('10') corresponding to the next data address are obtained.
  • Since the bits shifted out coincide with the fill bits and no carry occurred, the group number '00' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '11') and DBNY ('10') are generated, pointing to data Q in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data Q is read.
  • When data Q is accessed, the group number '00', intra-group block number '11', DBNY '10' and data step '10001' in the data point are read out again. The intra-group block number '11' is shifted two bits to the left by shifter 605 and filled with '00'; concatenated with DBNY this forms '110010', and adding the data step '10001' yields '000011' (i.e., the intra-group block number portion '0000', which shifter 607 shifts two bits to the right to give '00', and DBNY '11'), with a carry overflow. Because of the carry, group '01' is selected as the group corresponding to the next data address, and the resulting DBN (group number '01', intra-group block number '00', DBNY '11') is written back to the track table.
  • When that data item is accessed, the group number '01', intra-group block number '00', DBNY '11' and data step '10001' in the data point are read out again. The intra-group block number '00' is shifted two bits to the left by shifter 605 and filled with '00'; concatenated with DBNY this forms '000011', and adding the data step '10001' yields '010100'. Shifter 607 shifts the index-number portion '0101' output by adder 611 two bits to the right; the removed portion 631 is '01', which does not coincide with the '00' fill bits of the index number, so removed portion 631 controls selector 616 to select the group number '11' output by port '1', which after selector 615 becomes the group number corresponding to the next data address. The group boundary offset '01' is then read from the row corresponding to group '11' in sequence table 603 and subtracted from shifter 607's result '01', giving the true intra-group block number '00'.
  • At this point the DBNX corresponding to the next data address (i.e., group number '11' and intra-group block number '00') and DBNY ('00') are generated, pointing to data S in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data S is read.
  • According to the above method, even when the compression ratio is not '0', the data step is not an integer multiple of the data block length, and the group boundaries are not aligned, the DBN corresponding to the next data address can still be computed from the data step size.
  • In the data caching method of the present invention, data that the processor core may load is pre-filled into the cache in advance in the following manner, so that it is ready for the processor core ahead of time.
  • In this embodiment, the data read or data store instructions that the processor core is executing or is about to execute are processed in advance (below, a data read instruction is taken as the example). When the processor core executes the data read instruction for the first time, the start data address of the instruction is determined and recorded according to the data address generated by the processor core. When the processor core executes the same data read instruction for the second time, the recorded start data address of that instruction is subtracted from the second data address generated by the processor core, giving the difference between the data addresses of two adjacent executions of the instruction, which is recorded as the data step size. The data step is then added to the second data address to obtain the next data address, which is recorded, and the next data address is used to query whether the data is already in the higher-level memory; if not, the data is fetched from the lower-level memory with the next data address and filled into the higher-level memory.
  • When the processor core executes the data read instruction for the third time, the next data address corresponding to the instruction is taken from the record and the corresponding data is provided to the processor core. As needed, the next data address can be compared with the exact data address generated by the processor core. If there is no error, the next data address is added to the data step to obtain a new next data address, which is recorded, and the new next data address is used to query whether the data is in the higher-level memory; if not, the corresponding data is fetched from the lower-level memory with the new next data address and filled into the higher-level memory. If the comparison finds an error, the correct address at the time of the error is used as the start data address and the above process is executed again.
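The three executions described above amount to a per-instruction stride prefetcher. A minimal software sketch under assumed names; the patent's hardware keeps this state in the track table rather than in an object:

```python
class StridePrefetchEntry:
    """Per-data-read-instruction state: start address, learned step,
    predicted next address (names are illustrative)."""

    def __init__(self):
        self.last_addr = None   # most recently seen data address
        self.stride = None      # learned data step
        self.next_addr = None   # predicted next data address

    def access(self, addr, prefill):
        """Process one execution of the data read instruction.

        prefill(addr) stands for 'query the higher-level memory and, on a
        miss, fetch from the lower-level memory and fill it in'.
        """
        if self.last_addr is None:          # first execution: record start
            self.last_addr = addr
            return
        if self.stride is None:             # second execution: learn the step
            self.stride = addr - self.last_addr
        elif addr != self.next_addr:        # misprediction: restart learning
            self.stride = None
            self.next_addr = None
            self.last_addr = addr
            return
        self.next_addr = addr + self.stride
        prefill(self.next_addr)             # pre-fill the predicted address
        self.last_addr = addr

prefetched = []
entry = StridePrefetchEntry()
for a in (100, 108, 116, 124):
    entry.access(a, prefetched.append)
assert prefetched == [116, 124, 132]
```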
  • FIG. 8A shows an embodiment of the data access engine of the present invention; it is a more complete embodiment based on the FIG. 6 embodiment. The processor core 101 and the data memory (i.e., primary data memory) 113 are the same as described in the previous embodiments, and the data in data memory 113 is a subset of the data in lower-level memory 115. A first-in-first-out buffer (FIFO) 849 serves as a data buffer between data memory 113 and processor core 101, and tag unit 841 together with data memory 113 forms a traditional set-associative cache.
  • In data access engine 801, the sequence table 603, shifters 605, 607 and 609, adder 611, subtractor 613 and selector 617 are the same as the corresponding functional blocks of data access engine 601 in FIG. 6. The selector 618 in this embodiment combines the selectors 615 and 616 of the FIG. 6 embodiment. In addition, sequence table 603 and selector 618 add storage and selection of the group numbers of more adjacent groups, and group valid bits and index-bit valid bits are also added to sequence table 603.
  • Controller 803 controls the operation of the data access engine. Under the control of controller 803, selectors 811, 813 and 815 select, from track table 619 or from subtractors 613 and 805, the intra-group index number, intra-block offset and step size with which adder 611 and shifters 605, 607 compute the next DBN.
  • Subtractor 613 computes the index number and intra-block offset from data address 641 and the result 643 of matching in sequence table 603. Subtractor 805 computes the difference between the memory addresses of two accesses by the same memory access instruction, i.e., the data step (stride). Converter 807 converts the step size into the compression shift signal stored in sequence table 603, which serves as the shift bit count controlling the shifters.
  • The current cache address bus 821 sends the contents of the entry from track table 619 to each functional block. The intermediate result bus 823 sends the intra-group block number and intra-block offset from subtractor 613 to adder 611 and shifters 605, 607 for computing the next DBN. Bus 825 sends the data step computed by subtractor 805 to selector 815. The control signals 827 generated by controller 803 control selectors 811, 813, 815, 817, 617 and 819. The shift signal 829 output by sequence table 603 controls shifters 605, 607 and 609. The next data address bus 881 sends the next data address to sequence table 603; the corresponding data address is used to pre-fill data from lower-level memory 115 into data memory 113, and the DBN is sent to track table 619 for storage.
  • A traditional cache uses a kind of match-based indirect addressing: the index bits in the middle of the data address select tags read out of the cache, which are then matched against the high bits of the data address; if the tag of some way matches, it is called a hit, and the content of that way at that index is the data pointed to by the data address.
  • Primary data memory 113 is composed of a plurality of identical memories, each of which constitutes a way; the ways all have the same number of rows, forming a multi-way set-associative organization. Each storage row of each memory is called a primary data block, and each primary data block has an index number (INDEX) 802 determined by the row number at which the primary data block is located in primary data memory 113. The intra-block offset 627 points to a data item within the block. Accordingly, the data address 804 can be divided, based on the number of primary data blocks per way and the number of data items per block in primary data memory 113, into a high-order tag 801, middle index bits 802 and a low-order intra-block offset 627.
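The three-field split of data address 804 can be sketched as plain bit slicing; the field widths below follow the small examples of the earlier figures and are assumptions, not fixed by the embodiment:

```python
INDEX_BITS = 2    # primary data blocks per way (assumed: 4)
OFFSET_BITS = 2   # data items per block (assumed: 4)

def split_data_address(addr):
    """Split a data address into (tag, index, intra-block offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# An address in the style of FIG. 7D: tag '01110', index '01', offset '11'.
assert split_data_address(0b011100111) == (0b01110, 0b01, 0b11)
```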
  • The cache of the present invention likewise begins with match-based indirect addressing, but once the relationship between a data address and a cache address has been established, the cache address is used for direct addressing. Direct addressing with cache addresses avoids tag-matching operations, saves power and increases memory access speed.
  • The group storage address 808 is divided into a high-order memory group number (GN) 623, a middle intra-group block number (index) 625 and a low-order intra-block offset 627. For the set-associative cache, the cache address 806 is divided into a high-order way number 814, a middle intra-way index number 802 and a low-order intra-block offset 627. Selector 843 selects either the set-associative cache address generated by tag unit 841 or the group storage address generated by the data access engine for storage in track table 619. The two addresses have the same form; in essence both are addresses of data memory 113.
  • The contents of sequence table 603 allow the data address and the cache address to be converted into each other. For conversion from data address to cache address, the data address is sent from bus 641 to sequence table 603 to be matched against the tags and index numbers; the group number of the matching entry is read out on bus 835, and its tag and index number are read out on bus 643. The tag and index number on bus 643 are subtracted from the data address on bus 641 by subtractor 613, yielding the tag low bits, index number and intra-block offset; the tag low bits and index number, after being shifted by shifter 609, give the intra-group block number of the corresponding cache address. Combining the above group number, intra-group block number and intra-block offset on bus 837 yields the cache address corresponding to the data address.
  • For conversion from cache address to data address, sequence table 603 is addressed by the group number 623 in the cache address, and the tag and index number read out are sent via bus 643; the tag and index number are added to the intra-group block number 625 and intra-block offset 627 in the cache address, and the sum is the data address.
  • Therefore the data access engine can provide data addresses to access the lower-level memory, can provide cache addresses to access data memory 113, and, because it stores the correspondence between data addresses and cache addresses, can convert either address into the other.
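Both conversion directions can be modeled with a few lines of arithmetic. The following sketch treats each sequence-table row as just the tag+index of its group's first block; the widths, the compression shift `k` and the linear search are simplifying assumptions, not the CAM-based hardware:

```python
OFFSET_BITS = 2   # data items per block = 4 (assumed)
BLOCK_BITS = 2    # blocks per group = 4 (assumed)

def data_to_cache(addr, rows, k):
    """Data address -> (group number, intra-group block number, offset)."""
    block_addr = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    for gn, base in enumerate(rows):
        diff = block_addr - base              # role of subtractor 613
        blk = diff >> k                       # role of shifter 609
        if 0 <= blk < (1 << BLOCK_BITS) and diff == (blk << k):
            return gn, blk, offset            # combined cache address
    return None                               # no group holds this address

def cache_to_data(gn, blk, offset, rows, k):
    """Cache address -> data address (the adder path)."""
    return ((rows[gn] + (blk << k)) << OFFSET_BITS) | offset

# Groups like FIG. 7B's '00' and '01': first blocks '100000', '101000', k=1.
rows = [0b100000, 0b101000]
assert data_to_cache(0b10110011, rows, 1) == (1, 2, 3)   # block '101100'
assert cache_to_data(1, 2, 3, rows, 1) == 0b10110011
```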
  • The tracker 845 determines the next track table read address from the contents of the data point output 851 of track table 619. The track table read address 851, after passing through delay unit 847, serves as the track table write address 853.
  • FIG. 8C shows an embodiment of the operation of the sequence table of the present invention. The sequence table 603 consists of registers, comparators and mask registers; each register can also be implemented by a memory. The tag and index-number fields together with the comparators can be implemented by content-addressable memory (CAM); the tags, index numbers, comparators and masks can be implemented by tri-state CAM.
  • The mask acts on the low bits of the tag and on the index number, and can selectively exclude the low tag bits or certain index bits from the comparison (i.e., those tag or index bits do not affect the comparison result), implementing compressed data storage. The mask is controlled by the shift field: when the shift is '0', the mask masks the lowest bits (i.e., the index number); when the shift is '1', the mask moves one bit to the left, masking the lowest tag bit and the high index bit and leaving the lowest index bit to participate in the comparison.
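The shift-controlled mask can be sketched as sliding a two-bit "don't care" window across the tag low bits and index number; a software illustration (widths are assumptions):

```python
BLOCK_BITS = 2   # width of the masked window (assumed)

def masked_match(stored, incoming, shift, width):
    """Compare two tag+index values of 'width' bits, ignoring BLOCK_BITS
    bits starting at bit position 'shift' (the mask window)."""
    mask = ((1 << width) - 1) & ~(((1 << BLOCK_BITS) - 1) << shift)
    return (stored & mask) == (incoming & mask)

# Shift '0': the index number (lowest two bits) is masked.
assert masked_match(0b101100, 0b101110, 0, 6)
# Shift '1': lowest tag bit and high index bit masked; lowest index bit compared.
assert masked_match(0b101000, 0b101110, 1, 6)
assert not masked_match(0b101000, 0b101111, 1, 6)
```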
  • Comparator 897 compares the data address on bus 641 with the tag and index number 895 as masked by mask 896, and the comparison result is sent via bus 888 to controller 803 as a basis for its decisions. The adjacent-group field 892 stores the group numbers and valid bits of the groups adjacent to this group, for tracking when the data step crosses the boundary of the group. The group valid signal 893 is set when data is written into the group for the first time; it indicates that at least one data block in the group is valid, and also that the group corresponds to the data pointed to by the addresses covered by its tag and index-number fields.
  • Each bit of the block valid signal 894 represents the validity of one data block in the group. The tag low bits and index number input on bus 641, after being shifted under the control of shift field 891, can be decoded (for example, a 2-bit binary address decoded into 4 bits of which only one is valid, i.e., one-hot, each representing one data block) to select the corresponding bit of block valid signal 894. If that block valid bit is valid, the corresponding data is already in the corresponding data block of the group in data memory 113; if it is invalid, the corresponding data must first be filled into the data block.
  • The sequence table 603 can be accessed in two ways: one is matching the data address entered via bus 641 in FIG. 8A against the tags in sequence table 603; the other is direct addressing via group number 831 or group number 833 in FIG. 8A. Each data field in the sequence table 603 entry selected by data-address matching or group-number addressing can be read or written. For example, the corresponding group number 835 can be read out via data-address matching, or the corresponding tag 643 can be read out via group-number addressing 829. Other fields of the matched or addressed entry, such as the adjacent group numbers, the block valid signals and the group valid signal, can likewise be read or written. All fields of an entry are reset to all '0' before it is written.
  • The operation of this embodiment can be divided into three stages. The first stage is processing a data read instruction for the first time. This stage determines whether the data read instruction is in a loop. If it is not in a loop, a way of the set-associative buffer area is allocated according to the index number in the data address: the data is written into the corresponding data block of that way in primary data memory 113, and the tag portion of the data address is written into the entry in tag unit 841 corresponding to that index number in that way. If it is in a loop, a group is allocated in the group-organized buffer area for the data the data read instruction may read. In both cases, the data address is mapped to a cache address and stored in the information area associated with the data read instruction, and the memory is accessed with the data address to provide the corresponding data to processor core 101.
  • Whether the data read instruction is located in a loop may be determined by whether it lies between a backward branch instruction and that branch's target instruction. Specifically, the tracker can provide a pointer to the first backward branch instruction after the current instruction being executed by the processor core, i.e., a branch instruction whose branch target address is smaller than the address of the branch instruction itself. The tracker pointer can also point to further backward branch instructions after the current instruction and determine, from the branch target address of each backward branch instruction, which data read instructions are contained in each loop.
  • The second stage is the second processing of the same data read instruction. The corresponding data is provided to the processor core.
  • The data step size (stride) is calculated as the difference between the second data address and the first data address (stored when the instruction was processed the first time). Adding the data step size to the second data address yields the probable data address of the memory access when the instruction is processed the third time, and that address is used to read data from lower-layer memory 115 in advance.
  • The probable data address is further converted into the corresponding buffer address, and the data from lower-layer memory 115 is filled into data memory 113 accordingly. The buffer address is stored in the relevant information area of the data read instruction.
  • The third stage covers the third and all subsequent processings of the same data read instruction. Data is supplied to the processor core directly from data memory 113 using the cache address stored the previous time.
  • The data access engine also has a mechanism that compares the data address generated by processor core 101 with the stored cache address; if they do not match, the data is re-fetched using the data address generated by the processor core and the cache address is corrected.
  • The cache address is then added to the data step size to obtain the probable cache address of the next access, and the memory is filled through this address. The new cache address is stored back into the relevant information area of the data read instruction for future use. Thereafter, the instruction is processed in the same manner as the third time.
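The three stages above can be sketched as a small stride-prefetch model. This is an illustrative software analogy, not the patented hardware: `DataAccessEngine`, `lower_mem`, and the per-instruction `info` table are hypothetical names, and the cache is modeled as a simple address-to-data map rather than the group-allocated data memory 113.

```python
class DataAccessEngine:
    """Toy model of the three-stage handling of a data read instruction."""

    def __init__(self, lower_mem):
        self.lower_mem = lower_mem   # models lower-layer memory 115
        self.cache = {}              # models data memory 113: addr -> data
        self.info = {}               # per-instruction info area: pc -> (stage, last_addr, stride)

    def access(self, pc, addr):
        stage, last_addr, stride = self.info.get(pc, (0, None, 0))
        data = self.cache.get(addr)
        if data is None:                       # miss: fill from lower memory
            data = self.cache[addr] = self.lower_mem[addr]
        if stage == 0:                         # first pass: just record the address
            self.info[pc] = (1, addr, 0)
        elif stage == 1:                       # second pass: learn the stride, prefetch
            stride = addr - last_addr
            self.prefetch(addr + stride)
            self.info[pc] = (2, addr, stride)
        else:                                  # third pass onward: verify and prefetch
            if addr != last_addr + stride:     # prediction missed: re-learn the stride
                stride = addr - last_addr
            self.prefetch(addr + stride)
            self.info[pc] = (2, addr, stride)
        return data

    def prefetch(self, addr):
        if addr in self.lower_mem and addr not in self.cache:
            self.cache[addr] = self.lower_mem[addr]
```

On the second access the stride is learned and the third access's data is already resident, mirroring how the engine fills data memory 113 ahead of the processor core.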
  • The different processing stages of data read instructions are controlled by controller 803. In track table 619, the cache address and the data step size of a data read instruction are both initialized to 0 when the track is established. The controller 803 reads the cache address and data step size of the data read instruction together with its track table address. Please refer to Figure 8D, which shows an embodiment of the controller of the present invention.
  • The controller 803 contains an array of matching counters. Each unit groups a memory 861, a comparator 862 and a counter 863; the bit width of memory 861 and comparator 862 equals that of the track table address, and counter 863 is two bits wide.
  • The initial value detector 865 detects the instruction type and an all-'0' cache address and data step size. Bus 821 carries the instruction type, cache address and data step size into initial value detector 865; bus 851 carries the track table address to the memory input and one comparator port of every matching counter group. Bus 866 transfers the count value of the group whose stored value (e.g., in 861) matches the track table address (the address of the current data read instruction) on bus 851 to control logic 867, which controls the operation of the data access engine in the stage the instruction is in.
  • When initial value detector 865 detects a non-data-read instruction, controller 803 operates in the mode 0 state and does not respond to the instruction. When a data read instruction with an all-'0' cache address and data step size is detected, the instruction is judged not yet processed, and the first-stage mode is entered. First, initial value detector 865 generates a write enable signal 868 that stores the track table address of the data read instruction on bus 851 into memory 861 of the matching counter unit pointed to by allocator 864. At this point the value in memory 861 equals the value on bus 851, the output of comparator 862 is '1', and the group becomes the current instruction group.
  • The counter 863 of the current instruction group is incremented by '1' to reach '1', and the count value is placed on bus 866 and transferred to control logic 867, which sets each selector and function block in the data access engine to first-stage mode.
  • The next time the instruction arrives, initial value detector 865 detects a data read instruction and controls the comparators in each group to compare against the track table address on bus 851. In the group whose memory 861 matches, comparator 862 causes counter 863 to increment by '1', making its count value '2'.
  • The matching group is the current instruction group, and the value of its counter is placed on bus 866 and transferred to the control logic, which sets each selector and function block in the data access engine to second-stage mode.
  • On the third arrival, initial value detector 865 again detects a data read instruction and controls the comparators in each group to compare against the track table address on bus 851. The group whose memory 861 matches is the current instruction group, and comparator 862 causes counter 863 to increment to '3'. This value is placed on bus 866 and transferred to control logic 867, which sets each selector and function block in the data access engine to third-stage mode.
  • On the fourth arrival the same match occurs, and comparator 862 causes the two-bit counter 863 to wrap from '3' to '0'. This value is placed on bus 866 and transferred to control logic 867, which still sets each selector in the data engine to third-stage mode.
  • Control logic 867 treats count values '0' and '3' as the default third-stage state.
  • Once the counter reaches '0', its count no longer increases, and comparator 862 no longer participates in comparison. A count value of '0' also allows the unit to be selected by allocator 864 for use by other data read instructions.
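A minimal software sketch of the matching-counter array follows. It is a simplification, not the circuit: the counter saturates at 3 instead of wrapping back to '0' to free the unit, and the allocator simply picks the first unit whose count is 0. The class names (`MatchCounterGroup`, `Controller`) are illustrative.

```python
class MatchCounterGroup:
    """One unit: a stored track-table address plus a 2-bit counter."""
    def __init__(self):
        self.addr = None   # models memory 861
        self.count = 0     # models 2-bit counter 863; 0 means free

class Controller:
    """Toy model of the matching-counter array in controller 803."""
    def __init__(self, n_groups=4):
        self.groups = [MatchCounterGroup() for _ in range(n_groups)]

    def stage_for(self, track_addr):
        """Return the stage mode (1, 2, or 3) for a data read instruction."""
        for g in self.groups:
            if g.addr == track_addr:        # comparator 862 matched
                if g.count < 3:
                    g.count += 1            # advance 1 -> 2 -> 3, then saturate
                return g.count
        # no match: allocate a free unit (allocator 864) and enter stage 1
        free = next(g for g in self.groups if g.count == 0)
        free.addr, free.count = track_addr, 1
        return 1
```

Each time the tracker revisits the same track-table address, the count advances the instruction through the first, second, and third stage modes, as described above.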
  • Meanwhile, if initial value detector 865 detects that the cache address and data step size arriving from bus 821 are not all '0', while no comparator in any group matches the track table address on bus 851, it judges that this is a data read instruction that has already entered the third stage, and control logic 867 operates the data engine in the default mode, i.e., third-stage mode.
  • The feedback signals 888, 889 returned from tag 841 and sequence table 603, and the difference 825 from subtractor 805, are sent back to control logic 867 in controller 803; control logic 867 uses these feedback signals together with the stage information of the current instruction from bus 866 to control the operation of the data access engine.
  • Control logic 867 can also feed information back to the matching counter groups to change the stage of the current instruction in order to handle anomalies. For example, if a data read instruction has entered the third stage but the predicted data address does not match the data address sent from the processor core via bus 641, control logic 867 sends a feedback signal to the matching counter group corresponding to the current instruction, setting its count value to '1'. Thereafter the instruction restarts in the first-stage state, passes through the second and third stages to re-establish the step size, and the next cache address is stored into track table 619.
  • Tracker 845 moves track table read address 851 to the next data read instruction, whose type 621, DBN (623, 625, 627) and data step size 629 are placed on the current data address bus 821.
  • Controller 803 recognizes type 621 as a data type; since the DBN and data step size are all '0', it judges the instruction unprocessed, but still controls selector 617 to send the all-'0' DBN on bus 821 to data memory 113 so that data is fetched into buffer 849 as a spare for the processor core (alternatively, fetching with an all-'0' DBN may be skipped to save power). At the same time, controller 803 enters first-stage mode and controls selector 817 to send the group number 623 on bus 821 to sequence table 603, reading out the tag stored in the corresponding entry.
  • Shift adder 812 adds that tag to the shifted intra-group block number on bus 821.
  • The data address 641 generated by processor core 101 and the output of shift adder 812 are subtracted in subtractor 613, and the difference is placed on bus 825.
  • Controller 803 obtains the difference from bus 825 for analysis; finding the difference is not '0', the controller judges that the data pointed to by the DBN on bus 821 is not the data required by processor core 101, notifies the processor core to ignore the corresponding data in buffer 849, and waits for the correct data (alternatively, this judgment may be omitted to save power).
  • Controller 803 controls the data address on bus 641 to be matched against the tags in sequence table 603 and tag 841. If a match is found in tag 841, operation proceeds as in a traditional cache. If no tag matches, the data address on bus 641 is sent to lower-layer memory 115 via selector 819, and the corresponding data block is read from lower-layer memory 115. At this time tracker 845 has already looked ahead to the next branch point and judged the branch to be a backward branch (that is, the program is in a loop here) whose range contains the data read instruction being processed, so a replaceable data group is allocated.
  • The tag and index number portion on bus 641 is stored into the tag and index number fields of the corresponding entry in sequence table 603.
  • The group valid bit of this group and the valid bit of the corresponding data block (data block 0) are asserted. The shift field in this entry is all '0' at this point, and the adjacent group number fields do not yet hold values.
  • The address of the entry (i.e., the group number GN) is output from sequence table 603 via bus 835 onto bus 837.
  • The data address on bus 641 and the just-written tag sent from sequence table 603 via bus 643 are subtracted in subtractor 613, and the difference is placed on intermediate result bus 823. Because the address high bits on bus 641 and bus 643 are the same, the difference consists of the tag low bits, the index and the intra-block offset. These low bits are also placed on the bus by shifter 609 (the shift amount is '0' at this time).
  • Together with the group number, they form a complete and correct cache address; controller 803 then controls selector 617 to place the cache address on bus 837 onto bus 855 and send it to data memory 113, pointing out the correct data block to be filled with the corresponding data block read from lower-layer memory 115.
  • Controller 803 also controls this data to be read out of data memory 113, or controls the output of lower-layer memory 115 to be bypassed directly to data buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
  • Controller 803 also controls selectors 811, 813 and 815 to select the intra-group block number and intra-block offset on bus 823 together with the all-'0' step size from the track table; these are added in adder 611 as in the earlier embodiment, and the result is placed on bus 881.
  • The control line 631 generated from the addition result controls selector 618 to select the current group number output by sequence table 603, which is also placed on bus 881. The group number, intra-group block number and intra-block offset are spliced together on bus 881 into a cache address DBN.
  • Controller 803 controls selector 843 to select bus 881, and delay 847 delays track table read address 851 onto track table write address 853, so that the DBN is written into the same entry read earlier. Controller 803 does not update the step size (or forces a write of '0'), leaving it '0'.
  • The track table entry now holds the cache address of the access just completed by the data read instruction (hereinafter DBN1 for explanation), with a step size of '0'. The data access engine has completed the first-stage processing of the data read instruction.
  • The program in this example is executing a loop, so the same data read instruction is reached again. Its type 621, DBN1 and data step size '0' are read out onto bus 821, and its track table address is on bus 851.
  • Controller 803 matches track table read address 851 against the addresses stored in its matching counter groups and obtains the indication to perform second-stage operation on the instruction; control logic 867 directs the data access engine via control bus 827 to operate accordingly.
  • Controller 803 uses the group number (GN) 623 of DBN1 on bus 821 to select the tag stored in the corresponding entry of sequence table 603; the tag is sent via bus 643 to shift adder 812, together with the intra-group block number and intra-block offset of DBN1 from bus 821 via selector 810.
  • In shift adder 812, the intra-group block number and intra-block offset (low bits) on bus 821 are shifted (the shift amount is controlled by bus 829, output from sequence table 603) and then added to the tag and index number (high bits) on bus 643.
  • The result, the data address corresponding to DBN1, is sent to one input of subtractor 805. The new data address on bus 641 is sent to the other input of subtractor 805 and the data address of DBN1 is subtracted from it; the resulting difference is placed on bus 825 as the data step size (stride).
  • Converter 807 converts the step size into the corresponding shift signal (shift) and writes it into the shift signal field of the DBN1 entry in sequence table 603. The shift amount 829 is sent from sequence table 603 to shifters 605, 607, 609 and 812 to control their shift operations.
  • Controller 803 controls selector 819 to select the data address on bus 641 so that the corresponding data is read from lower-layer memory 115. At the same time, the low bits of the data address on bus 641 and the tag of the DBN1 entry on bus 643 are subtracted in subtractor 613 and the result is placed on bus 823.
  • Controller 803 controls selectors 811, 813 to send the low bits on bus 823 (i.e., the tag low bits, index and intra-block offset of DBN2) to adder 611, where '0' is added, shifted by the shift amount in the shift field of the DBN1 entry in sequence table 603. The resulting intra-group block number and intra-block offset of DBN2 are placed on bus 881, and the shifted-out result 631 is produced by shifter 607.
  • The shifted-out result 631 controls selector 618 to select the adjacent group number from the DBN1 entry in sequence table 603. If that group number is invalid, a new group is allocated as in the example of Figure 7 and the first stage to hold the DBN2 data block, and its valid bit and its tag and index number corresponding to DBN2 are set, with its shift field copied from the DBN1 shift field. In the process, the invalid adjacent group number field in the DBN1 entry is filled with the group number of the newly assigned group and set valid, then read again; the group number of DBN1 is likewise filled into the corresponding adjacent group number field of DBN2. If the adjacent group number is already valid, it is read out directly.
  • This group number is placed on bus 881; together with the intra-group block number and intra-block offset already on bus 881, it is sent via bus 816 to selector 617, selected, and placed on bus 855 as the cache address of DBN2 for data memory 113.
  • The data from lower-layer memory 115 is filled at that address, and the correct data is read from the address and sent to buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
  • Next, controller 803 controls selectors 811, 813 to send the low bits on bus 823 (i.e., the tag low bits, index and intra-block offset of DBN2) to adder 611, where the data step size on bus 825 is added, shifted by the shift amount in the shift field of the DBN2 entry in sequence table 603.
  • The new intra-group block number and intra-block offset from the sum are placed on bus 881, and the shifted-out result 631 controls selector 618 to select an adjacent group number in sequence table 603. If the adjacent group number is invalid, the data block is not yet in primary data memory 113, and a new data group is allocated as in the above example.
  • The group number, together with the summed intra-group block number and intra-block offset, forms the cache address for the next execution of the data read instruction, hereinafter DBN3. Controller 803 sends DBN3 and the data step size on bus 825 via bus 881 and selector 843 to be written back into the corresponding entry of the same data read instruction in track table 619 (where DBN1 was previously stored).
  • Controller 803 also fills the data block pointed to by DBN3 in data memory 113 with the corresponding data from lower-layer memory 115, in preparation for the next execution of the same instruction. The group number, intra-group block number and intra-block offset in DBN3 are sent via bus 816 to selector 617, selected, and used to point into primary data memory 113.
  • The intra-group block number and intra-block offset are shifted in shift adder 812 and added to the tag and index number on bus 643 to obtain the correct data address. This data address is selected by selector 819 and sent to lower-layer memory 115 to fetch the data block pointed to by DBN3 into primary data memory 113.
  • When the same instruction is reached again, controller 803 determines from the track table address match that the data read instruction has entered the third stage.
  • Controller 803 controls selector 617 to select DBN3 on bus 821; the corresponding data is read from primary data memory 113 via bus 855 into buffer 849 and used by processor core 101.
  • Controller 803 also compares the data address corresponding to DBN3 with the data address sent by processor core 101 via 641, adds the data step size to DBN3 to obtain DBN4, and queries sequence table 603 according to DBN4; if necessary, the corresponding data is fetched from lower-layer memory 115 and stored into primary data memory 113, as in the previous example, ready for the next cycle. Subsequent loop iterations are executed in the same way.
  • A data read instruction may have a negative data step size, that is, starting from a certain data address, each subsequent access reads a data address smaller than the previous one.
  • Since the controller cannot determine in the first stage that the step size is negative, the data corresponding to DBN1 is placed in data block No. 0 of some group. In the second stage, subtracting the address of DBN1 from that of DBN2 yields the data step size, which is found to be negative.
  • One approach is to place DBN2 in the highest-numbered data block of another data group and set the data address corresponding to DBN2 accordingly.
  • Another approach is not to allocate a new group, but to store DBN2 directly in the group where DBN1 resides, saving cache space. The method is to invert the intra-group block number. Taking four data blocks per group as an example, block No. 0 where DBN1 was originally stored is mapped to block No. 3, original block No. 3 is mapped to block No. 0, original block No. 1 to block No. 2, and original block No. 2 to block No. 1.
  • The implementation is to add an inverter in the path of the intra-group block number, so that the block number output by the inverter is the bitwise inversion of the block number input to it.
  • To this end, an inversion (R) bit is added to each entry of sequence table 603. When the R bit is not set, the inverter does not act and its output equals its input; when the R bit is set, the inverter acts and its output is the bitwise negation of its input. In this way, data that would originally be stored in the group in descending order is stored in the group in ascending order.
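The inverter described above can be sketched in a few lines. This is an illustrative model under the stated assumption of four blocks per group (a 2-bit block number); `mapped_block` is a hypothetical helper name, not from the patent.

```python
def mapped_block(block, r_bit, blocks_per_group=4):
    """Map a stored intra-group block number to the physical block.

    When r_bit is set, the block number is bitwise-inverted within
    the width of the group (0<->3, 1<->2 for 4 blocks per group).
    """
    mask = blocks_per_group - 1          # e.g. 0b11 for 4 blocks
    return (~block & mask) if r_bit else block
```

With the R bit set, descending-address data lands in ascending physical blocks, matching the 0-to-3, 3-to-0, 1-to-2, 2-to-1 mapping given above.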
  • DBN1 (whose index should be 0) is now actually stored in block 0, but the cache address stored in the track table is marked as block 3;
  • DBN2 (whose index should be -1) is now actually stored in block 1, but the cache address stored in the track table is marked as block 2;
  • DBN3 (whose index should be -2) is now actually stored in block 2, but the cache address stored in the track table is marked as block 1;
  • DBN4 (whose index should be -3) is now actually stored in block 3, but the cache address stored in the track table is marked as block 0.
  • The tag and index bits of the group were set when DBN1 was placed in block 0. Therefore, when the step size is determined to be negative in the second stage, the R bit of the group is set to '1', and the tag and index number fields written for the group in the first stage are read out via bus 643, reduced by a constant, and written back.
  • This constant can be obtained by table lookup or calculation. Let a data group have n data blocks and let the shift field to be adjusted in sequence table 603 be s (read out together with the tag and index number and sent onto bus 829); then the constant equals (n-1)*(s+1). For example, in the above case of 4 data blocks with a shift value of '0', the constant equals '3'.
  • Subtracting 3 from the tag and index value of DBN1 (which in this case corresponds to the address mapped to block 3) yields exactly the tag and index value of DBN4 (which corresponds to the address mapped to block 0).
  • With a shift value of '1', the constant is '6'. Other cases follow by analogy and are not repeated.
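The correction constant above reduces to one expression. This is a direct transcription of the formula (n-1)*(s+1) stated in the text; the function name is a hypothetical label.

```python
def correction_constant(n_blocks, shift):
    """Constant subtracted from the group's tag/index field when the
    R bit is set: (n - 1) * (s + 1), for n data blocks per group and
    shift field value s."""
    return (n_blocks - 1) * (shift + 1)
```

For 4 blocks with shift '0' this gives 3, and with shift '1' it gives 6, matching the two worked cases in the text.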
  • Both the data address and the DBN stored in the track table use the correct pre-mapping address; only the cache address sent to the primary data memory needs the mapped form. Therefore, the above inverter can be placed after selector 617 in Figure 8A, inverting only the intra-group block number.
  • When a DBN sent from track table 619 is selected by selector 617 to fetch data from primary data memory 113, the R bit is also read out of sequence table 603, addressed by group number 623, to control the inverter. Adding an R bit to the data entries in track table 619 would eliminate this query of sequence table 603; however, to compare with the data address sent on bus 641, sequence table 603 must still be queried by group number 623 to obtain from bus 643 the tag and index number fields corresponding to the DBN.
  • The apparatus and method proposed by the present invention can be used in various data-cache-related applications to improve the efficiency of the processor system.


Abstract

A data caching system and method. When the system and method are applied in the field of processors, the data required by a data read instruction can be filled into a data memory before the processor executes the instruction; the probable data address of the next execution of the instruction is predicted and prefetched, the corresponding data is stored according to a rule, and the number of tag comparisons is reduced as much as possible.

Description

Data cache system and method

Technical field
The invention relates to the fields of computers, communications and integrated circuits.
Background technique
Generally speaking, the role of a cache is to copy a portion of the contents of memory into it, so that those contents can be quickly accessed by the processor core in a short time, ensuring continuous operation of the pipeline.
Current cache addressing is based on the following procedure. First, the index segment of the address is used to read a tag from the tag memory. At the same time, the index segment and the intra-block offset segment of the address jointly address the cache to read its content. The tag read from the tag memory is then matched against the tag segment of the address. If they are the same, the content read from the cache is valid; this is called a cache hit. Otherwise, it is a cache miss, and the content read from the cache is invalid. For a multi-way set-associative cache, the above operations are performed on all way groups in parallel to detect which way group hits; the content read from the hitting way group is the valid content. If all way groups miss, all read contents are invalid, and after the miss the cache control logic fills the content from the lower-level storage medium into the cache.
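The conventional lookup just described can be sketched as follows. This is a generic illustration of set-associative addressing, not the invention's mechanism; the parameter names and the list-of-lists cache layout are assumptions.

```python
def lookup(cache, addr, ways, sets, block_size):
    """Conventional set-associative lookup: the index selects a set,
    every way's tag is compared with the address tag in parallel,
    and a match is a hit.

    cache[way][index] is either None or a (tag, block_data) pair.
    """
    offset = addr % block_size
    index = (addr // block_size) % sets
    tag = addr // (block_size * sets)
    for way in range(ways):                 # all way tags read and compared
        entry = cache[way][index]
        if entry is not None and entry[0] == tag:
            return entry[1][offset]         # hit: data from the matching way
    return None                             # miss: fill from lower-level storage
```

Note that every way's tag must be read and compared on every access; this per-access comparison cost is exactly what the background identifies as a power problem.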
Technical problem
In existing cache structures, a variety of cache prefetch techniques are used to reduce the occurrence of cache misses. For instruction caches, prefetching brings a certain performance improvement. For data caches, however, the uncertainty of data addresses makes it difficult to predict them effectively. Therefore, with the ever-widening processor/memory speed gap, data cache misses remain the most serious bottleneck limiting the performance improvement of modern processors.
In addition, in the set-associative cache structure most commonly used in modern processors, a larger number of way groups usually yields better cache performance, but also requires more tags to be read and compared simultaneously, leading to higher power consumption. How to reduce the number of tag comparisons while increasing the number of way groups is one of the difficulties in data cache improvement.
Technical solution
The method and system apparatus proposed by the present invention can directly address one or more of the above or other difficulties.
The invention provides a data caching method, characterized by configuring the data memory in the cache so that one part of the storage blocks implements a traditional set-associative structure and another part implements a group-allocated structure. The group-allocated cache consists of multiple groups; each group stores several data blocks corresponding to the same starting data block address, and the difference between the data addresses of adjacent storage blocks within a group is the same value.
Optionally, the data addresses corresponding to the data blocks in each group share a common part; this common part consists of the tag in the data address, or of part of the tag and part of the index number. Data blocks with adjacent or nearby addresses are stored in the same group.
Optionally, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in the group are consecutive; when the difference equals an integer multiple of the data block length, the data block addresses of all storage blocks in the group are evenly spaced. Based on the current data's position in the group and the data step size, it can be determined directly whether the next data also lies in the group, and if so, at which position.
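The position arithmetic in the claim above can be illustrated directly. This is a hypothetical sketch: it assumes a group of `n_blocks` blocks whose adjacent block addresses differ by `interval * block_len` bytes, and checks whether an access `stride` bytes ahead still lands inside a block stored by the same group.

```python
def next_in_group(block, offset, stride, block_len, interval, n_blocks):
    """Given the current (block, offset) position inside a group and a
    byte stride, return the next (block, offset) if the next access
    stays inside the same group, else None."""
    span = block_len * interval              # address distance between adjacent blocks
    pos = block * span + offset + stride     # address offset relative to the group base
    blk, off = divmod(pos, span)
    if 0 <= blk < n_blocks and off < block_len:   # must land inside a stored block
        return blk, off
    return None
```

Because only this arithmetic is needed, no tag comparison is required once the data is known to reside in the group, which is the point of the group-allocated structure.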
Optionally, a sequence table is provided; its rows correspond one-to-one with the groups in the data memory, and each row contains a compression ratio indicating the interval between the data block addresses of adjacent storage blocks in the corresponding group.
Optionally, each row of the sequence table contains the location of the group holding the data blocks adjacent to those of the corresponding group; based on the current data's position in its group and the data step size, the group and position of the next data can be determined directly.
Optionally, each row of the sequence table contains the locations of the groups holding the several consecutive data blocks adjacent to the first data block of the corresponding group.
Optionally, each row of the sequence table contains the locations of the groups holding the several consecutive data blocks adjacent to the last data block of the corresponding group.
Optionally, the data address is converted into a cache address consisting of a group number, an intra-group block number and an intra-block offset, where the intra-block offset is identical to the intra-block offset of the data address; the cache address can be used directly to address the data memory in the data cache.
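A cache address (DBN) of this form is just three concatenated fields. The sketch below assumes illustrative field widths (64-byte blocks, 4 blocks per group); the patent does not fix these numbers.

```python
BLOCK_BITS, GROUP_BLK_BITS = 6, 2      # assumed: 64-byte blocks, 4 blocks per group

def pack_dbn(group, block, offset):
    """Concatenate group number | intra-group block number | intra-block offset."""
    return (group << (GROUP_BLK_BITS + BLOCK_BITS)) | (block << BLOCK_BITS) | offset

def unpack_dbn(dbn):
    offset = dbn & ((1 << BLOCK_BITS) - 1)
    block = (dbn >> BLOCK_BITS) & ((1 << GROUP_BLK_BITS) - 1)
    group = dbn >> (GROUP_BLK_BITS + BLOCK_BITS)
    return group, block, offset
```

The offset field is carried over unchanged from the data address, so only the group number and intra-group block number need to be produced by the address mapping.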
Optionally, the data accessed by data access instructions in loop code is stored in the group-allocated structure, while the data accessed by other data access instructions is stored in the set-associative structure.

Optionally, for a data access instruction being executed for the first time, the data address is converted into a cache address once it has been generated.

Optionally, for a data access instruction being executed for the second time, the data address is converted into a cache address once it has been generated, and the data stride is computed, the data stride being the difference between the two data addresses. From the current cache address and the data stride, a possible next cache address is computed for addressing the data memory the next time this data access instruction is executed. If the data in the data memory at that next cache address is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.

Optionally, for the third and subsequent executions of a data access instruction, the next cache address is computed from the current cache address and the data stride, for addressing the data memory the next time the instruction is executed. If the data in the data memory at that next cache address is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.
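The three-phase behavior across executions of one data access instruction (first execution: translate only; second: translate and learn the stride; third and later: predict from the stride alone) can be modeled as a tiny per-instruction state record. This is an illustrative software model under assumed names and interfaces, not the patent's circuit; addresses here are plain integers rather than group/block/offset triples.

```python
class DataPointState:
    """Hypothetical model of one data access instruction's stride state."""
    def __init__(self):
        self.last_addr = None   # address seen at the previous execution
        self.stride = None      # difference between the last two addresses
        self.next_addr = None   # predicted address for the next execution

    def observe(self, addr):
        """Record this execution's address; from the second execution
        onward, return the predicted next address (last stride applied)."""
        if self.last_addr is not None:
            self.stride = addr - self.last_addr   # data stride
            self.next_addr = addr + self.stride   # possible next address
        self.last_addr = addr
        return self.next_addr
```

For an instruction walking an array of 8-byte elements: the first execution at address 100 yields no prediction, the second at 108 learns stride 8 and predicts 116, and the third confirms the stride and predicts 124. In the scheme described above, if the data at the predicted location were invalid, the predicted cache address would be converted back into a data address to trigger a fill.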
The present invention further provides a data caching system, characterized in that the data memory in the data caching system can, depending on its configuration, operate one portion of its storage blocks as a traditional set-associative structure and another portion as a group-allocated structure. The group-allocated structure comprises a plurality of groups, each group containing several storage blocks and one data block address storage unit, with all storage blocks in a group corresponding to the data block address in that unit. Within each group, the difference between the data addresses of adjacent storage blocks is the same value.
Optionally, the data caching system further comprises a masked comparator, which matches part of the block address in a data address against the corresponding bits of the data block address in the data block address storage unit, to determine whether the data corresponding to that data address is stored in the group.
Optionally, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in that group are consecutive; and when the data corresponding to a data address is stored in the group, the masked bits address the storage blocks within the group, locating the data corresponding to that data address.

Optionally, the data caching system further comprises a shifter. When the difference between the data addresses of adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses of all storage blocks in that group are evenly spaced; and when the data corresponding to a data address is stored in the group, the value obtained by shifting the masked bits with the shifter addresses the storage blocks within the group, locating the data corresponding to that data address.
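One way to picture the masked comparator plus shifter is as follows: the high bits of the block address are compared against the stored data block address, the masked-out low bits select the block within the group, and for a spacing of 2^s block lengths the shifter drops s low bits (which must be zero for a hit). The sketch below is a functional illustration, not the patent's circuit, and it assumes power-of-two group sizes and spacings and a group base block address aligned to the group's span.

```python
def group_lookup(data_addr, base_block_addr, offset_bits, idx_bits, shift):
    """Return the intra-group block index holding data_addr, or None.

    base_block_addr : block address (data address >> offset_bits) stored in
                      the group's data block address storage unit
    offset_bits     : log2(block length in bytes)
    idx_bits        : log2(number of storage blocks in the group)
    shift           : log2(spacing between adjacent blocks' addresses, in
                      units of the block length); 0 means consecutive
    """
    blk = data_addr >> offset_bits            # block address of the data
    span = idx_bits + shift                   # masked low bits
    if (blk >> span) != (base_block_addr >> span):
        return None                           # masked compare failed
    low = blk & ((1 << span) - 1)
    if low & ((1 << shift) - 1):
        return None                           # falls between spaced blocks
    return low >> shift                       # shifter output: block index
```

For example, with 64-byte blocks (`offset_bits=6`), four blocks per group (`idx_bits=2`), and a spacing of two block lengths (`shift=1`), a group based at block address 16 holds blocks 16, 18, 20, and 22; an address in block 18 hits at intra-group index 1, while an address in block 17 misses.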
Optionally, the data caching system further comprises a sequence table memory. The rows of the sequence table memory correspond one-to-one with the groups in the data memory, and each row contains a storage unit for storing a compression ratio; the value stored in that unit indicates the spacing between the data block addresses of adjacent storage blocks in the corresponding group.

Optionally, each row of the sequence table memory contains a pointer to the location of the group holding the data block adjacent to the data blocks of the corresponding group. Based on the position of the current data within its group and the data stride, the group in which the next data resides, and its position within that group, can be determined directly.

Optionally, the pointer points to the location of the group holding the several consecutive data blocks adjacent to the first data block of the corresponding group.

Optionally, the pointer points to the location of the group holding the several consecutive data blocks adjacent to the last data block of the corresponding group.
Optionally, by having the comparator match the data address against the data block address in the data block address storage unit, and having the shifter shift the index in the data address according to the value in the compression ratio storage unit, a data address can be converted into a cache address. The cache address consists of a group number, an intra-group block number, and an intra-block offset, where the intra-block offset is identical to the intra-block offset in the data address. The cache address can be used directly to address the data memory in the data cache.

Optionally, using the data block address value in the data block address storage unit corresponding to a cache address, and having the shifter shift the intra-group block number in the cache address according to the value in the compression ratio storage unit, a cache address can be converted back into a data address.
Other aspects of the present invention will be understood and appreciated by those skilled in the art in light of the description, claims, and drawings of the present invention.
Beneficial Effects
The system and method of the present invention can provide a basic solution for the data cache structures used by digital systems. Unlike a traditional data caching system, which fills the cache only after a cache miss, the system and method of the present invention fill the data cache before the processor accesses a piece of data, thereby avoiding or largely hiding compulsory misses. That is, the cache system of the present invention integrates the prefetching process.
The system and method of the present invention further divide the data memory in the data cache into a set-associative portion and a group-allocated portion. Each group in the group-allocated portion contains data blocks whose data addresses are adjacent or close to one another. Thus, the data accessed by data access instructions whose data addresses are adjacent or close (such as data access instructions in loop code) is stored in the group-allocated portion, and other data is stored in the set-associative portion. In addition, while filling data into the data cache, the technical solution of the present invention converts the traditional data address, consisting of a tag, an index, and an intra-block offset, into a group number, an intra-group block number, and an intra-block offset. This address space conversion allows the data caching system to address the data memory directly in the new addressing form and find the corresponding data without tag matching. In particular, when accessing data whose addresses are adjacent or close, a simple computation on the cache address and the data stride yields the cache address of the next data, eliminating tag matching and address conversion and greatly reducing power consumption.
In addition, the system and method of the present invention can read data out of the data memory and send it to the processor core shortly before the processor core executes the corresponding data read instruction, so that the data is immediately available when the processor core needs to read it, hiding the data memory access time.

Other advantages and applications of the present invention will be apparent to those skilled in the art.
Brief Description of the Drawings
FIG. 1 is an embodiment of the cache system of the present invention;
FIG. 2 is a schematic diagram of the track point format of the present invention;
FIG. 3A is another embodiment of the cache system of the present invention;
FIG. 3B is another schematic diagram of the track point format of the present invention;
FIG. 3C is another embodiment of the cache system of the present invention;
FIG. 4A is an embodiment of the improved set-associative cache of the present invention;
FIG. 4B is another embodiment of the improved set-associative cache of the present invention;
FIG. 5 is an embodiment of the grouped data cache of the present invention;
FIG. 6 is an embodiment of the data access engine of the present invention;
FIG. 7A is an embodiment of the sequence table and data cache of the present invention;
FIG. 7B is another embodiment of the sequence table and data cache of the present invention;
FIG. 7C is another embodiment of the sequence table and data cache of the present invention;
FIG. 7D is an embodiment of the data storage arrangement of the present invention in which group boundaries are not aligned;
FIG. 8A is an embodiment of the data access engine of the present invention;
FIG. 8B is a schematic diagram of the various address forms of the present invention;
FIG. 8C is an embodiment of the sequence table operations of the present invention;
FIG. 8D is an embodiment of the controller of the present invention.
Best Mode for Carrying Out the Invention

FIG. 6 shows the best mode for carrying out the invention.

Embodiments of the Invention
The high-performance cache system and method proposed by the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become clearer from the following description and the claims. It should be noted that the drawings are all in a highly simplified form and use imprecise proportions, serving only to illustrate the embodiments of the present invention conveniently and clearly.

It should be noted that, in order to clearly illustrate the content of the present invention, multiple embodiments are presented to further explain its different implementations, where these embodiments are illustrative rather than exhaustive. In addition, for brevity, content already mentioned in an earlier embodiment is often omitted in later embodiments; accordingly, for content not mentioned in a later embodiment, reference may be made to the earlier embodiments.

Although the invention may be extended through various modifications and substitutions, the specification also presents and elaborates several specific illustrative embodiments. It should be understood that the inventor's intent is not to limit the invention to the particular embodiments set forth; on the contrary, the intent is to cover all improvements, equivalent transformations, and modifications made within the spirit or scope defined by the claims. The same component numbers may be used in all drawings to denote the same or similar parts.
Please refer to FIG. 1, which is an embodiment of the cache system of the present invention. As shown in FIG. 1, the data caching system comprises a processor 101, an active table 109, a tag memory 127, a scanner 111, a track table 107, a tracker 119, an instruction memory 103, and a data memory 113. It should be understood that the various components are listed here for ease of description; other components may be included, and certain components may be omitted. The various components may be distributed across multiple systems, may be physical or virtual, and may be implemented in hardware (e.g., integrated circuits), in software, or in a combination of hardware and software.
In the present invention, the processor may be any processing unit that includes an instruction cache and a data cache and is capable of executing instructions and processing data, including but not limited to: a general-purpose processor, a central processing unit (CPU), a microcontroller (MCU), a digital signal processor (DSP), a graphics processing unit (GPU), a system on chip (SoC), an application-specific integrated circuit (ASIC), and so on.
In the present invention, the level of a memory refers to its proximity to the processor 101; the closer to the processor 101, the higher the level. In addition, a higher-level memory (such as the instruction memory 103 and the data memory 113) is generally faster but smaller than a lower-level memory. "The memory closest to the processor" refers to the memory in the storage hierarchy that is nearest to the processor and usually also the fastest, such as the instruction memory 103 and the data memory 113 in this embodiment. Furthermore, the memories at the various levels in the present invention satisfy an inclusion property: a lower-level memory contains all the contents stored in a higher-level memory.
In the present invention, a branch instruction refers to any appropriate form of instruction that can cause the processor 101 to change its execution flow (e.g., to execute an instruction out of sequence). A branch source refers to an instruction that performs a branch operation (i.e., a branch instruction); the branch source address may be the instruction address of the branch instruction itself. A branch target refers to the target instruction to which a taken branch transfers control; the branch target address may be the address to which control is transferred when the branch is taken successfully, namely the instruction address of the branch target instruction. A data read instruction refers to any appropriate form of instruction that can cause the processor 101 to read data from memory; the instruction format of a data read instruction generally includes a base address register number and an address offset. The data required by a data read instruction refers to the data read when the processor 101 executes that instruction; the data address of a data read instruction refers to the address used by the processor 101 for reading or writing data when executing it. When the processor core 101 executes a data read instruction, the data address may be computed as the base address plus the offset. A base address register update instruction refers to an instruction that changes the value of any of the base address registers that may be used by data read instructions. The current instruction may refer to the instruction currently being executed or fetched by the processor core; the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
In the present invention, the term "fill" refers to fetching the corresponding instruction or required data from external memory in advance and storing it into the instruction cache or the data cache before the processor executes the instruction.
In the present invention, the rows of the track table 107 correspond one-to-one with the storage blocks in the instruction memory 103. The track table 107 contains a plurality of track points. Here, a track point is an entry in the track table 107 and may contain information about at least one instruction, such as the type of that instruction. When the information contained in a track point indicates that it corresponds to at least one branch instruction, the track point is a branch point, and the information may include a branch target address and the like. The tracking address of a track point is the track table address of the track point itself and consists of a row address and a column address. The tracking address of a track point corresponds to the instruction address of the instruction it represents. For branch points, each branch point contains the tracking address, in the track table 107, of the branch target instruction of the branch instruction it represents, and that tracking address corresponds to the instruction address of the branch target instruction.
In this embodiment, in addition to storing instructions that may be executed by the processor 101, the instruction memory 103 also stores instruction type information for each instruction, such as whether the instruction is a data read instruction. The instruction type information may further indicate which kind of data read instruction the corresponding instruction is, thereby containing information on how to compute the data address, such as the base address register number and the position of the address offset within the instruction code.
For ease of representation, BNX may denote the row address in a branch point's tracking address; that is, BNX corresponds to the location of the storage block containing the instruction (the row number of the storage block), while the column address in the tracking address corresponds to the position (offset) of the branch instruction within its storage block. Accordingly, each pair of BNX and column address corresponds to one branch point in the track table 107; that is, the corresponding branch point can be found in the track table 107 from a pair of BNX and column address.

Further, a branch point in the track table 107 also stores, in the form of a tracking address, the location in the instruction memory 103 of the branch target instruction of the corresponding branch instruction. From that tracking address, the position of the track point corresponding to the branch target instruction can be found in the track table 107. That is, for a branch point in the track table 107, its track table address is the tracking address corresponding to its branch source address, and its track table content includes the tracking address corresponding to its branch target address.
In this embodiment, the entries in the active table 109 correspond one-to-one with the storage blocks in the instruction memory 103, and thus one-to-one with the rows of the track table 107. Each entry in the active table 109 indicates where the instruction cache storage block corresponding to that active table row is stored in the instruction memory 103, forming a correspondence between BNX and instruction cache storage blocks. Each entry in the active table 109 stores the block address of one instruction cache storage block. Thus, when an instruction address is matched in the active table 109, either the BNX stored in the matching entry is obtained, or the result is an unsuccessful match.

Each storage block in the data memory 113 is denoted by a storage block number DBNX. The entries in the tag memory 127 correspond one-to-one with the storage blocks in the data memory 113; each entry stores the block address of the corresponding storage block in the data memory 113, forming a correspondence between data block addresses and data cache storage block numbers. Thus, when a data address is matched in the tag memory 127, either the storage block number stored in the matching entry is obtained, or the result is an unsuccessful match.
The scanner 111 examines the instructions sent from external memory to the instruction memory 103. Upon finding that an instruction is a branch instruction, it computes the branch target address of that branch instruction. For a direct branch instruction, the branch target address can be obtained by adding the block address of the instruction block containing the instruction, the offset of the instruction within the instruction block, and the branch offset. For an indirect branch instruction, the branch target address can be obtained by adding the corresponding base address register value and the branch offset. The instruction block address may be read out of the active table 109 and sent directly to the adder in the scanner 111. Alternatively, a register storing the current instruction block address may be added to the scanner 111, so that the active table 109 does not need to supply the instruction block address in real time.
In addition, when the scanner 111 finds that an instruction is a data read instruction, it can also compute the data address corresponding to that instruction, for example by adding the data address offset to the value of the base address register used by the instruction. In the present invention, data read instructions are divided into two categories: data read instructions whose data addresses are determinate and data read instructions whose data addresses are indeterminate. For example, for a data read instruction whose data address is obtained by summing the instruction address of the data read instruction itself and a data address offset (an immediate), the computed data address is correct whenever it is computed, so the instruction can be classified as having a determinate data address. As another example, for a data read instruction whose data address is obtained by summing a base address register value and a data address offset (an immediate), if the base address register value has already been fully updated when the data address is computed, the instruction can likewise be classified as having a determinate data address; otherwise it is classified as having an indeterminate data address. According to the technical solution of the present invention, these two kinds of data read instructions can be given different instruction types to be stored in the corresponding track points of the track table 107.
The branch target instruction address computed by the scanner 111 can be matched against the storage block row addresses stored in the active table 109. If the match succeeds, indicating that the branch target instruction is already stored in the instruction memory 103, the active table 109 outputs the corresponding BNX to the track table 107 to be filled into the entry for the branch instruction. If the match fails, the branch target instruction has not yet been stored in the instruction memory 103; in that case, the branch target instruction address is sent to external memory, an entry is allocated in the active table 109 to store the corresponding block address, the BNX is output to the track table 107 to be filled into the entry for the branch instruction, the corresponding instruction block sent from external memory is filled into the storage block of the instruction memory 103 corresponding to that BNX, and the corresponding track is established in the corresponding row of the track table 107.

For a branch instruction in that instruction block, its branch target instruction address is matched in the active table 109 to output a BNX, and the position of the branch target instruction within its instruction block (i.e., the intra-block offset portion of the branch target instruction address) is the corresponding track point column number, yielding the tracking address corresponding to the branch target instruction; this tracking address is stored as branch point content in the branch track point corresponding to the branch instruction. Furthermore, while the scanner 111 examines an instruction block, data read instructions can be found, and the corresponding instruction type information is stored in the corresponding track points (i.e., data points) of the track table 107; the data address of each data read instruction is computed and sent to external memory to fetch the data block containing the corresponding data. Meanwhile, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX and the offset address of the data within the data block (i.e., DBNY) are output and stored as track point content in the data point. In this way, while an instruction block is being filled into the instruction memory 103, a track corresponding to the entire instruction block is established. For ease of description, in this specification an address that can directly address the data memory is called a cache address; that is, a cache address (DBN) consists of DBNX and DBNY.
In the present invention, the read pointer 121 of the tracker 119 can move forward from the track point corresponding to the current instruction in the track table 107 until it points to the first branch point. At that time, the value of the read pointer 121 is the tracking address of the branch source instruction, containing the BNX and the corresponding branch point column number. From that tracking address, the tracking address of the branch source instruction's branch target instruction can be read out of the track table 107. Thus, the read pointer 121 of the tracker 119 starts at the track point in the track table 107 corresponding to the instruction currently being executed by the processor 101, moves ahead to the first branch point after that track point, and the target instruction can be found in the instruction memory 103 from the target instruction's tracking address. During this movement, when the read pointer 121 passes a data point, the cache address DBN stored there is read out and sent to the data memory 113, and the corresponding data is read out and pushed to the processor core 101. In this way, the data corresponding to all data read instructions between the current instruction and the first subsequent branch point is pushed in turn to the processor core for reading.
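The read pointer's scan described above can be sketched as a toy software model. The track-point records, field names, and return convention here are assumptions for illustration, not the patent's actual track table encoding.

```python
def advance_to_branch(track, start):
    """Scan a track (a list of track-point records) from index `start`,
    collecting the cache addresses of any data points passed, and stop at
    the first branch point. Returns (resting index, list of DBNs pushed)."""
    pushed = []
    for i in range(start, len(track)):
        point = track[i]
        if point['type'] == 'data':
            pushed.append(point['dbn'])   # DBN sent to the data memory
        elif point['type'] == 'branch':
            return i, pushed              # read pointer rests on the branch point
    return len(track), pushed
```

For a track containing an ordinary instruction, two data points, and a branch point, the model stops at the branch point with both data points' DBNs collected for pushing to the processor core.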
Please refer to FIG. 2, which is a schematic diagram of the track point format according to the present invention. For a branch point, the format contains the instruction type 151 and the BNX 153 and BNY 155 corresponding to the branch target instruction. For a data point, the format contains the instruction type 161 and the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113.
Returning to FIG. 1, based on the positions of the branch points stored in the track table 107, the read pointer 121 of the tracker 119 moves to and points at the first branch point after the instruction being executed by the processor core 101, and reads out the track point content, i.e., the position information BNX and BNY of the branch target track point. If the branch point corresponds to an indirect branch instruction, the corresponding branch target instruction block address must also be read from the active table 109.
The processor core 101 outputs an instruction offset address (i.e., the offset portion of the instruction address), which selects the required instruction from the storage block in the instruction memory 103 pointed to by the read pointer 121 of the tracker 119. When the processor core executes the branch instruction, if the branch is not taken (TAKEN signal 123 is '0'), it continues to output new instruction offset addresses, reading and executing the instructions following the branch instruction, while the read pointer 121 of the tracker 119 continues to move to the next branch point and the above operations are repeated. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is a direct branch instruction, the processor core 101 can directly execute the branch target instruction that has already been prepared. At the same time the value of the read pointer 121 of the tracker 119 is updated to the stored BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, then moves from that track point to the first branch point after it. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is an indirect branch instruction, the processor core 101 outputs the block address portion of the actual target instruction address, which is matched against the instruction block address previously read from the active table 109. If the match succeeds, the prepared target instruction is correct and can be read and executed directly by the processor core 101; otherwise, the actual target instruction address is sent to the external memory to fetch the instruction block containing the corresponding target instruction, and that target instruction is sent to the processor core 101 for execution. At the same time, an available entry is allocated in the active table 109, the instruction block is filled into the corresponding storage block of the instruction memory 103, and the resulting BNX together with the offset of the target instruction within the instruction block (i.e., BNY) is stored in the branch point as track point content. The value of the read pointer 121 of the tracker 119 is then updated to this BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, moves from that track point to the first branch point, and the above operations repeat. In this way, both the next instruction and the branch target instruction can be prepared for the processor core 101 to choose from before it executes the branch instruction, avoiding the performance loss caused by cache misses.
Similarly, when the read pointer 121 of the tracker 119 passes a data point, the corresponding data is read from the data memory 113 according to the DBN stored in that data point. If the data read instruction is one whose data address is not yet determined, the corresponding data block address must also be read from the tag memory 127. When the processor core 101 executes the data read instruction, if it is a data read instruction whose data address is determined, the processor core 101 can use the data directly. Otherwise, the processor core 101 outputs the block address portion of the actual data address, which is matched against the data block address previously read from the tag memory 127. If the match succeeds, the data is correct and can be used directly by the processor core 101; otherwise, the pipeline in the processor core 101 is stalled, the actual data address is sent to the external memory to fetch the data block containing the corresponding data, and the pipeline is resumed after that data has been sent to the processor core 101. At the same time, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX together with the offset of the data within the data block (i.e., DBNY) is stored in the data point as track point content.
In this way, before the processor core 101 executes a data read instruction for the first time, the likely data for that instruction is already prepared. If the data is correct, the performance loss caused by a data memory 113 miss is avoided entirely, and the time required to read the data memory 113 can be partially or completely hidden. Even if the data is wrong, the processor core 101 can re-fetch the correct data without additional waiting time.
Please refer to FIG. 3A, which shows another embodiment of the cache system according to the present invention. This embodiment is similar to the embodiment of FIG. 1, except that a data address prediction module 301 is added and a stride field is added to the data point format in the track table.
Please refer to FIG. 3B, which is another schematic diagram of the track point format according to the present invention. The format of a branch point still contains the instruction type 151 and the BNX 153 and BNY 155 corresponding to the branch target instruction. The format of a data point contains the instruction type 161, the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113, and the data stride 331. The data stride 331 is the difference between the data addresses of two consecutive executions of the data read instruction corresponding to that data point, i.e., the value obtained by subtracting the previous data address from the current data address. Using the data stride, the likely value of the next data address can be computed speculatively: the current data address plus the data stride gives the likely next data address.
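The stride computation and speculative next-address prediction just described amount to two subtractions and an addition; the following is a minimal sketch, with variable names (prev_addr, curr_addr) that are illustrative rather than taken from the specification.

```python
def update_stride_and_predict(prev_addr, curr_addr):
    """Compute the data stride from two consecutive executions of a data
    read instruction, then speculatively form the next data address.
    Illustrative sketch; names are not from the specification."""
    stride = curr_addr - prev_addr          # data stride 331: current minus previous address
    predicted_next = curr_addr + stride     # speculative next data address
    return stride, predicted_next

# Example: a loop that reads one word every 8 bytes
stride, nxt = update_stride_and_predict(0x1000, 0x1008)
```

With these inputs the stride is 8 and the predicted next address is 0x1010; the prediction is only a guess and must later be verified against the actual data address, as the specification describes.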
Returning to FIG. 3A, in this embodiment the process of building tracks and prefetching instructions and data is similar to that of the FIG. 1 embodiment. The difference is that the track table in this embodiment is a compressed track table. Since only some of the instructions in an instruction block are branch instructions or data read instructions, the track table 107 can be compressed to reduce its storage requirement. For example, the compressed track table may have the same rows as the original track table but fewer columns, with a mapping table storing the correspondence between the entries of the compressed track table and those of the original track table. Every entry in the compressed track table is a branch point or a data point, and the entries appear in the same order as the corresponding branch instructions and data read instructions in the instruction block. The entries in the mapping table correspond one-to-one with the branch points and data points in the compressed track table, and store the intra-block offsets of the corresponding branch instructions and data read instructions within the instruction block. In this way, once the intra-block offset of a branch instruction or data read instruction within its instruction block has been converted into a column address through the mapping table, the corresponding branch point or data point can be found, according to that column address, in the row of the compressed track table pointed to by the BNX of that instruction. Conversely, for any branch point or data point in the compressed track table, the intra-block offset of its branch instruction or data read instruction can be found in the corresponding entry of the mapping table; together with the BNX of the branch point or data point itself, this offset locates the corresponding branch instruction or data read instruction in the instruction memory 103.
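The two-way translation between intra-block offsets and compressed-table columns can be sketched as below. Modeling a mapping-table row as an ordered list of offsets is an assumption made for illustration only; the patent does not prescribe a concrete encoding.

```python
def offset_to_column(mapping_row, intra_block_offset):
    """Convert an instruction's intra-block offset into a column address
    of the compressed track table.  mapping_row lists, in program order,
    the intra-block offsets of this row's branch/data points.
    Illustrative sketch of the mapping-table lookup."""
    return mapping_row.index(intra_block_offset)

def column_to_offset(mapping_row, column):
    """Reverse lookup: recover the intra-block offset of the branch or
    data read instruction stored at a given compressed-table column."""
    return mapping_row[column]

# A row whose branch/data points sit at instruction offsets 1, 4 and 6
mapping_row = [1, 4, 6]
col = offset_to_column(mapping_row, 4)     # data point at offset 4
off = column_to_offset(mapping_row, col)   # back to intra-block offset
```

The forward lookup serves track building (instruction offset to track point), and the reverse lookup, combined with the point's own BNX, re-addresses the instruction memory 103, as the paragraph above describes.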
After the track table compression technique of this embodiment is adopted, every entry in the compressed track table is a branch point or a data point. Therefore, when the branch at the branch point pointed to by the read pointer 121 of the tracker 119 is not taken, the read pointer 121 is incremented by one by the incrementer 134 and points to the next track point. If that track point is a branch point, the branch target instruction is read out as described above and the TAKEN signal from the processor core 101 is awaited. If that track point is a data point, the corresponding data is read out as described above and made ready for use by the processor core 101. Specifically, the data can be stored in a first-in-first-out buffer (FIFO) so that the processor core 101 can fetch the data corresponding to each data read instruction in the correct order. The read pointer 121 then continues to move and the above operations repeat until it points to a branch point, at which point the branch target instruction is read out as described above and the TAKEN signal from the processor core 101 is awaited.
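The ordering property of the FIFO described above can be modeled as follows; `collections.deque` stands in for the hardware FIFO and a dict keyed by DBN stands in for the data memory 113, both of which are modeling assumptions rather than structures named in the specification.

```python
from collections import deque

def prefetch_data_points(dbns, data_memory):
    """As the read pointer passes data points, read each datum from the
    data memory by its DBN and enqueue it, so the processor core can
    later dequeue the data in program order.  Illustrative model only."""
    fifo = deque()
    for dbn in dbns:                       # data points before the next branch point
        fifo.append(data_memory[dbn])
    return fifo

# Two data points, both in storage block "dbnx0", at offsets 0 and 1
data_memory = {("dbnx0", 0): 111, ("dbnx0", 1): 222}
fifo = prefetch_data_points([("dbnx0", 0), ("dbnx0", 1)], data_memory)
first = fifo.popleft()                     # the core consumes in program order
```

Because the tracker enqueues in track order and the core dequeues from the head, each data read instruction receives exactly the datum prefetched for it, which is the correctness requirement stated above.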
In addition, in this embodiment, when a data point in the track table is pointed to by the read pointer 121 of the tracker 119 for the second time, the DBNX read from it is sent to the tag memory 127 to read out the corresponding data block address. This data block address together with the DBNY read by the read pointer 121 forms the data address of the previous execution of the data point, and is sent to the prediction module 301 for temporary storage. When the processor core 101 executes the data point, the current data address is sent to the prediction module 301 and the previous data address is subtracted from it to obtain the data stride. The prediction module 301 stores this data stride back into the corresponding data point, and adds the stride to the current data address to obtain the predicted next data address. The prediction module 301 then sends the next data address to the tag memory 127 for matching. If the match succeeds, the likely data for the next execution of the data point is already stored in the data memory 113; the matched DBNX and the offset portion of the next data address (i.e., DBNY) are stored back into the corresponding data point, completing the update of the data point. If the match fails, the likely data for the next execution of the data point is not yet stored in the data memory 113, and the next data address is sent to the external memory to fetch the data block containing the corresponding data. At the same time, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX together with the offset of the data within the data block (i.e., DBNY) is stored in the data point as track point content, completing the update of the data point. Thus, when the read pointer 121 of the tracker 119 points to that data point again, the corresponding data can be read out of the data memory 113 ahead of time according to the DBN stored in it, ready for the processor core 101 to read. The subsequent operations are the same as described in the previous embodiment.
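The data-point update performed by the prediction module can be sketched as follows. The tag memory is modeled as a dict from block address to DBNX, and the block size as a power of two selected by `block_bits`; both are assumptions made so the sketch is runnable, not details fixed by the specification.

```python
def update_data_point(last_addr, curr_addr, tag_memory, block_bits):
    """On execution of a data point: derive the stride, predict the next
    data address, and try to match its block address in the tag memory.
    Returns (stride, DBNX, DBNY) to store back into the data point; DBNX
    is None on a miss, meaning the block must be fetched from external
    memory and a tag entry allocated.  Illustrative sketch."""
    stride = curr_addr - last_addr
    next_addr = curr_addr + stride
    block_addr = next_addr >> block_bits           # block address portion
    dbny = next_addr & ((1 << block_bits) - 1)     # intra-block offset (DBNY)
    dbnx = tag_memory.get(block_addr)              # match: DBNX; miss: None
    return stride, dbnx, dbny

# 16-byte blocks (block_bits=4); block address 0x2001 resides in storage block 5
tag_memory = {0x2001: 5}
stride, dbnx, dbny = update_data_point(0x20010, 0x20014, tag_memory, 4)
```

Here the stride is 4, the predicted next address 0x20018 falls in block 0x2001, so the match succeeds and (DBNX, DBNY) = (5, 8) would be stored back into the data point.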
In this way, as long as the data read instruction has not been evicted from the instruction memory 103, the likely data is already prepared from the third execution of that data read instruction by the processor core 101 onward. If the data is correct, the performance loss caused by a data cache miss is avoided entirely, and the time required to read the data cache can be partially or completely hidden. Even if the data is wrong, the processor core 101 can re-fetch the correct data without additional waiting time.
It should be noted that in this embodiment, as the read pointer 121 of the tracker 119 moves to the first branch point after the instruction currently being executed by the processor core 101, it may pass several data points and read data in advance from the data memory 113 according to the DBNs in those data points. A FIFO is therefore used to buffer, in order, the data corresponding to each data read instruction for the processor core 101 to use in turn; that is, the FIFO stores the data the processor core 101 is going to use. Alternatively, a FIFO may store the DBNs read from those data points, with data being read from the data memory 113 only for the oldest DBN; after the processor core 101 has taken that data, the then-oldest DBN is read out of the FIFO and the corresponding data read from the data memory 113 for the processor core 101 to use. In that case the FIFO stores the addresses of the data the processor core 101 is going to use. The other operations of the cache system of the present invention are the same as in the previous embodiments and are not repeated here.
Please refer to FIG. 3C, which shows another embodiment of the cache system according to the present invention. This embodiment is similar to the FIG. 3A embodiment, except that a sequence table 361 is added. The entries of the sequence table 361 correspond one-to-one with the entries of the tag memory 127, and store the position information PREV of the data block preceding, and NEXT of the data block following, the data block address held in the corresponding tag memory 127 entry. For example, when two data blocks with consecutive addresses are filled into the data memory 113 in address order, the DBNX of the later data block is stored in the NEXT field of the sequence table 361 entry corresponding to the earlier data block, and the DBNX of the earlier data block is stored in the PREV field of the entry corresponding to the later one. In this way, based on the information recorded in the sequence table 361, the DBNX corresponding to the predicted next data address can be found directly, reducing the number of matches performed in the tag memory 127.
Specifically, if the length of a data block is N, then the block address of the next data block is the current block address plus N, and the block address of the previous data block is the current block address minus N. Since the next data address equals the sum of the current data address and the data stride, dividing the absolute value of the sum of the data stride and the offset within the current data address by N gives the number of data blocks between the next data address and the current data address. At the same time, the sign of the data stride determines whether the next data address points to a data block before or after the current data address.
Specifically, when the sum of the data stride and the offset within the current data address is less than N and greater than or equal to '0', the next data address lies in the same data block as the current data address, i.e., the DBNX of the next data address is the same as the DBNX of the current data address.
When the sum of the data stride and the offset within the current data address is less than '0', the next data address lies in a data block before the current data address; when that sum is greater than or equal to N, the next data address lies in a data block after the current data address. In both cases, the number of data blocks between the next data address and the current data address equals the quotient of the absolute value of that sum divided by N. Thus, as long as sufficient information is recorded in the sequence table 361, one can start from the entry corresponding to the current data address and follow the DBNX given by PREV (or NEXT) through each adjacent data block backward (or forward), one block at a time, until the DBNX corresponding to the next data address is found.
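The walk along the PREV/NEXT links can be sketched as follows. The sequence table is modeled as a dict from DBNX to a (PREV, NEXT) pair, an assumption for illustration; Python's floor division is used so that a negative sum yields a signed backward block distance in one expression.

```python
def find_next_dbnx(seq_table, curr_dbnx, stride, offset, block_len):
    """Walk the sequence table's PREV/NEXT links to locate the DBNX of
    the next data address.  seq_table maps DBNX -> (PREV, NEXT).  Floor
    division of (stride + offset) by the block length gives the signed
    number of blocks to move.  Illustrative sketch."""
    steps = (stride + offset) // block_len   # signed block distance
    dbnx = curr_dbnx
    while steps > 0:                          # forward along NEXT links
        dbnx = seq_table[dbnx][1]
        steps -= 1
    while steps < 0:                          # backward along PREV links
        dbnx = seq_table[dbnx][0]
        steps += 1
    return dbnx

# Three address-consecutive blocks held in storage blocks 7, 2, 9
seq_table = {7: (None, 2), 2: (7, 9), 9: (2, None)}
nxt = find_next_dbnx(seq_table, 2, stride=16, offset=4, block_len=16)   # one block ahead
prv = find_next_dbnx(seq_table, 2, stride=-16, offset=4, block_len=16)  # one block back
```

With 16-byte blocks, a stride of +16 from offset 4 lands one block ahead (storage block 9), while a stride of -16 lands one block back (storage block 7), without any tag memory 127 match.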
In particular, in many kinds of loop code the absolute value of the data stride is small, and the next data address often points to the data block immediately before (or after) the current data address. In that case, the DBNX stored in the PREV (or NEXT) field of the sequence table 361 entry corresponding to the current data address (i.e., the entry of the sequence table 361 pointed to by the DBNX that the read pointer 121 of the tracker 119 read from the data point) is exactly the DBNX corresponding to the next data address. That DBNX can then be read directly from the sequence table 361 and stored back into the track table 107, avoiding any matching of the next data address in the tag memory 127.
In addition, an improved data cache structure can be used to obtain further performance gains. This specification describes the improvement based on a way-set associative cache. A direct-mapped cache can be treated as a single way-set of a set-associative cache and implemented in the same manner, so it is not described separately here. In a fully associative cache, the addresses of the individual storage blocks can be entirely unrelated, so the sequence table of the FIG. 3C embodiment can be used directly to link storage block to storage block, allowing the storage block position (i.e., DBN) corresponding to the next data address to be found directly from the current data address and the data stride.
In a traditional set-associative cache structure, a data address is divided into three parts: the tag (TAG), the index (index), and the intra-block offset (offset), and the index numbers of the storage blocks in each way-set are consecutive, i.e., every index number exists exactly once in any way-set. Here, the method of the present invention can be applied by giving all storage blocks in a way-set the same tag. Since the index numbers of all storage blocks in the way-set are consecutive, the way-set then stores data blocks with consecutive addresses. A positional relationship among storage blocks holding consecutive addresses thus arises naturally: within a way-set, data blocks with consecutive data addresses also occupy consecutive physical positions (or index numbers), so the DBNX corresponding to the predicted next data address can be found directly, reducing the number of matches in the tag memory 127 or the latency of walking the sequence table entry by entry.
However, in some programs (such as loops over arrays) the data addresses used are not consecutive but form an arithmetic progression, so the data corresponding to many index numbers in each way-set may never be accessed. Once the frequently accessed data concentrates on a few index numbers, replacements occur because there are not enough way-sets, degrading the performance of the cache system. According to the technical solution of the present invention, a compression ratio can be set for each way-set, so that the index numbers in the way-set no longer increase by one but by a constant step. In this way most of the data in the whole way-set is data that will actually be accessed, improving the utilization of the way-set as much as possible while still preserving the continuity of the data.
Please refer to FIG. 4A, which is an embodiment of the improved set-associative cache of the present invention. In this embodiment every way-set of the cache corresponds to a feature entry, which stores a compression ratio and several pointers. Here, the value of the compression ratio is defined as the difference between the data block addresses of two consecutive storage blocks in the way-set divided by the data block length. The pointers point to the way-sets in which the next several data blocks, at addresses consecutive with the first data block of this way-set (i.e., the data block with the smallest data address), are located. For a way-set in which all storage blocks correspond to the same tag, as described in the present invention, the difference between the data block addresses of two consecutive storage blocks equals the data block length, so the compression ratio is '1', and the pointers all point to this way-set itself, i.e., the next several data blocks at addresses consecutive with the first data block of the way-set are all in this way-set. Here, the DBNX corresponding to a data address consists of the way-set number and the storage block number within the way-set. Take a way-set containing 4 storage blocks as an example, and assume its way-set number is '3'; the intra-set block numbers of the 4 storage blocks are '0' to '3', so their corresponding DBNX values are '30' to '33'. As shown for way-set 401 in FIG. 4A, all storage blocks correspond to the tag '2001', i.e., the data block addresses of the 4 storage blocks are '20010', '20011', '20012' and '20013'. In this case the index portion of each data address equals the intra-set block number of the corresponding storage block in the way-set. For example, data block address '20010' has index '0' and its storage block has intra-set number '0'; data block address '20011' has index '1' and its storage block has intra-set number '1'; and so on. Now, if the data stride of each access is less than or equal to the length of one data block, it can be computed directly, from the storage block position of the current data address (i.e., DBNX) and the data stride, that the storage block of the next data address is either the same storage block or its adjacent successor. The DBNX of the next data address equals the DBNX of the current data address plus a DBNX increment, where the DBNX increment is the quotient of the data stride divided by the data block length. For example, if the DBNX of the current data address is '32' (corresponding data block address '20012') and the data stride equals the length of one data block, then the DBNX increment equals '1' and the DBNX of the next data address equals '32' plus '1', i.e., '33' (corresponding data block address '20013'), pointing to the correct storage block. The DBNX value of the next data address is thereby obtained without computing the next data address or performing any address matching.
However, if the data step of each access is equal to the length of two data blocks, storing data this way would leave half of the memory blocks in the way group unaccessed, wasting storage space. For this case the compression ratio can be set to '2', meaning that the difference between the data addresses of two adjacent memory blocks in the way group, divided by the data block length, equals '2'. Please refer to FIG. 4B, which is another embodiment of the improved set-associative cache of the present invention. As shown for way group 403 in FIG. 4B, all memory blocks correspond to the tag '2001', but the corresponding data block addresses are '20010', '20012', '20014' and '20016'. Thus the index-number portion of each data address equals the in-group block number of the corresponding memory block multiplied by the compression ratio. For example, the index number of data block address '20010' is '0' and the in-group block number of the corresponding memory block is '0'; the index number of data block address '20012' is '2' and the in-group block number of the corresponding memory block is '1'; and so on, so that the index numbers are compressed by the compression ratio. In this case the DBNX increment equals the data step divided by the data block length, with the quotient then divided by the compression ratio. For example, suppose the DBNX corresponding to the current data address is '31' (the corresponding data block address being '20012') and the data step equals the length of two data blocks. The DBNX increment is then '2' (the step in block lengths) divided by the compression ratio '2', i.e. '1', and the DBNX of the next data address equals '31' plus '1', giving '32' (the corresponding data block address being '20014'), which points to the correct memory block while avoiding the computation and matching of data addresses.
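The increment rule just described can be sketched in a few lines. This is a minimal illustration, not from the patent itself; the function name and the representation of the step in block lengths are our own assumptions.

```python
def next_dbnx(dbnx: int, step_in_blocks: int, compression_ratio: int) -> int:
    """Return the DBNX of the next data address.

    step_in_blocks    -- data step expressed in data-block lengths
    compression_ratio -- spacing (in blocks) between adjacent memory
                         blocks of the way group ('1' = uncompressed)
    """
    increment = step_in_blocks // compression_ratio  # DBNX increment
    return dbnx + increment

# Uncompressed example above: DBNX '32', step of one block -> '33'
assert next_dbnx(32, 1, 1) == 33
# Compressed example above: DBNX '31', step of two blocks, ratio '2' -> '32'
assert next_dbnx(31, 2, 2) == 32
```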
In this embodiment, the feature entry corresponding to each way group stores, in addition to the compression ratio 419, a number of pointers, the number of pointers being equal to the compression-ratio value multiplied by '2'. Taking way group 403 as an example, its feature entry stores four pointers in addition to the compression ratio '2'. Three of the pointers point to the way groups containing the three data blocks whose addresses are adjacent to that of the first data block (data block address '20010') in way group 403, i.e. the data blocks with addresses '2000E', '2000F' and '20011'; the fourth pointer points to the way group adjacent to way group 403 in the following addresses (starting data block address '20018'). In this way, when the data step is small, the memory block corresponding to the next data address can be found in the current way group, or in a way group pointed to by one of the pointers, simply by adding the data step to the DBN of the current data address and shifting according to the compression ratio.
For ease of description, the following assumes the data step is an integer multiple of the data block length, in which case the DBNY corresponding to each data address is unchanged. When the data step is not an integer multiple of the data block length, the remaining portion must be added to DBNY: the sum portion of the result becomes the new DBNY, while the carry portion is added to DBNX. Suppose the DBNX corresponding to the current data address is '31' and the data step is three data block lengths (i.e. the DBNX increment is '3' and the next data address is '20015'). The index number of the data address is first restored from the compression ratio and the in-group block number of the memory block. For this DBNX, the in-group block number of the memory block is '1'; multiplying by the compression ratio gives '2' (the index number of the data block). This '2' is then added to the DBNX increment '3', giving the index number of the next data address, '5'. That index number '5' is then compressed by the compression ratio: '5' divided by '2' gives a quotient of '2' and a remainder of '1'. The data corresponding to the next data address is therefore located in the way group pointed to by the pointer 417 corresponding to the remainder, in the memory block whose in-group block number is the quotient; that is, the data corresponding to the next data address '20015' is in memory block 421, whose in-group block number is '2', within way group 405.
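The restore-add-recompress sequence above can be checked with a short sketch. This is illustrative only; the remainder-to-pointer mapping is shown as a returned value rather than as actual pointer hardware.

```python
def locate_next(in_group_block_no: int, dbnx_increment: int,
                compression_ratio: int) -> tuple:
    """Return (block number in the target group, pointer index).

    The pointer index is the remainder that selects which of the
    feature-entry pointers names the target way group.
    """
    index = in_group_block_no * compression_ratio  # restore the index number
    next_index = index + dbnx_increment            # index of the next address
    return divmod(next_index, compression_ratio)   # (quotient, remainder)

# Example from the text: block '1', increment '3', ratio '2'
# -> block '2' in the way group named by the pointer for remainder '1'
assert locate_next(1, 3, 2) == (2, 1)
```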
Similarly, when the data step (or DBNX increment) is negative, the corresponding memory block can be found by the same method in the way group pointed to by pointer 411 or 413; when the data step (or DBNX increment) is a positive even number that just exceeds the range of way group 403, the corresponding memory block can be found in the way group pointed to by pointer 415. When the next data address falls outside the range of the way groups pointed to by the four pointers, the way group and memory block corresponding to the next data address can be found by following, in turn, the pointer information stored in the feature entries of the intervening way groups. Larger compression ratios can be implemented in the same manner and are not described further here.
According to the technical solution of the present invention, the way groups of a set-associative cache can also be improved so that each way group can be configured as a plurality of groups, each group providing the same function as a way group. This conveniently increases the number of way groups and allows multiple sets of consecutive data blocks corresponding to different tags to be stored.
For example, the data memory in each way group can be divided into a corresponding number of groups, each group corresponding to the same number of rows with consecutive index numbers and to the same tag. That is, each group stores several data blocks with consecutive addresses corresponding to the same tag.
Please refer to FIG. 5, which is an embodiment of the grouped data cache of the present invention. Taking one way group as an example, memory 501 is divided into two groups, each containing one row of content-addressable memory (CAM), i.e. storing one tag (tag 503 and tag 505). Correspondingly, data memory 511 is also divided into two groups, each containing four memory blocks; the data block addresses in these four memory blocks are consecutive and correspond to the same tag. Specifically, group 513 contains memory blocks 521, 523, 525 and 527, whose data block addresses are consecutive and all correspond to tag 503; group 515 contains memory blocks 531, 533, 535 and 537, whose data block addresses are consecutive and all correspond to tag 505. In this embodiment, each tag and its corresponding group of memory blocks also correspond to a register-comparator and a decoder: tag 503 corresponds to register-comparator 517 and decoder 529, and tag 505 corresponds to register-comparator 519 and decoder 539. A register-comparator contains a register and a comparator, the register storing the high-order part of the index number of the starting data block address of that group.
When addressing by data address, the high-order part of the index number in the data address is sent over bus 543 to all register-comparators, where it is compared with the stored high-order index values. Based on the comparison results, only the match line of the CAM row corresponding to a successful comparison is charged and matched against the tag sent over bus 541, and the successfully matched CAM row outputs an enable signal to its decoder. Under the control of the enable signal output by the register-comparator, the decoder decodes the low-order part of the index number of the data address on bus 545 and, according to the decoding result, selects one of the corresponding group's data blocks for output. Thus, through the matching, decoding and addressing performed by the register-comparators and decoders, the data block whose index number equals the index number of the addressing data address can be read from data memory 511. If no comparator matches, or none of the participating CAM rows matches, the data corresponding to the data address has not yet been stored in that way group of the cache. Performing the same operation on all way groups in parallel either finds the required data in the cache or yields a cache-miss result. In this way, each group provides the equivalent of a way group.
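The two-stage lookup just described (high index bits at the register-comparators, the tag at the CAM row, the low index bits at the decoder) can be modeled roughly as follows. The group records and field names are invented for illustration and do not come from the patent.

```python
def way_group_lookup(groups, tag, index, low_bits):
    """groups: list of dicts with 'index_hi', 'tag' and 'blocks' fields."""
    hi = index >> low_bits                 # compared by register-comparators
    lo = index & ((1 << low_bits) - 1)     # decoded to pick one block
    for g in groups:                       # all groups compare in parallel
        if g['index_hi'] == hi and g['tag'] == tag:
            return g['blocks'][lo]
    return None                            # miss in this way group

# Hypothetical contents: two groups as in FIG. 5, four blocks each
groups = [
    {'index_hi': 0, 'tag': 0x503, 'blocks': ['b0', 'b1', 'b2', 'b3']},
    {'index_hi': 1, 'tag': 0x505, 'blocks': ['b4', 'b5', 'b6', 'b7']},
]
assert way_group_lookup(groups, 0x505, 0b110, 2) == 'b6'
assert way_group_lookup(groups, 0x999, 0b110, 2) is None  # miss
```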
In this embodiment, the cache can be regrouped into a different number of groups simply by storing the appropriate high-order index values in the registers of the register-comparators, with each group providing the equivalent of a way group. For example, consecutive high-order index values can be stored in two adjacent register-comparators so that the index ranges corresponding to the two register-comparators are also consecutive. The two adjacent groups are thereby merged into one larger group to accommodate data blocks at consecutive addresses.
Furthermore, in the present invention, the groups can be configured with different sizes to form a cache with a hybrid structure. For example, one way group of the cache can be configured as four groups and another way group as a single group, these two way groups forming the contiguous-location portion of the cache, while the remaining way groups are configured as a conventional set-associative structure forming the random-location portion of the cache. In this case the first way group holds at most four runs of consecutive data blocks, while the second way group holds only one run of consecutive data blocks. The remaining way groups, like an existing set-associative cache, can each hold a number of tags up to the number of their memory blocks (i.e. the number of rows in the way group), and adjacent memory blocks may correspond to different tags. With a cache configured in this way, data with consecutive data addresses (i.e. identical tags) can be stored in the contiguous-location portion according to the characteristics of the program, while data with non-consecutive addresses is stored in the random-location portion. The hybrid cache can thus be configured according to program characteristics, retaining both the flexibility of data placement in the cache and its ease of replacement, while eliminating a large number of tag-comparison operations during accesses to consecutive addresses.
It should be noted that when a cache with the above hybrid structure is actually running, it is sometimes found that data currently being accessed, or about to be accessed, should belong to the contiguous-location portion of the cache, but the data block containing it is already stored in the random-location portion. In that case the data block containing the data should be filled into the contiguous-location portion and the corresponding memory block in the random-location portion invalidated. It is also sometimes found that data about to be accessed should belong to the random-location portion, but the data block containing it is already stored in the contiguous-location portion. In that case the location of the data in the cache is left unchanged, and the data is read directly from the contiguous-location portion by tag comparison.
In the present invention, a data access engine is used to implement the following function: before the processor core computes a data address, the data access engine fills the corresponding data into the data cache and has it ready for use by the processor core. In this specification data reads are used as the example; data stores can be implemented by similar methods, and the description is not repeated here.
The data access engine is described in detail below through several specific examples. Please refer to FIG. 6, which is an embodiment of the data access engine of the present invention. For ease of description, only some modules or components are shown in FIG. 6. In FIG. 6, data memory 113 and processor core 101 are the same as described in the previous embodiments. The data-point format in track table 107 contains an instruction type 621, DBNX, DBNY 627 and data step 629, where DBNX consists of a group number (GN) 623 and an in-group block number 625, and DBNY 627 is the intra-block offset of the data address. Data engine 601 contains sequence table 603, shifters 605, 607 and 609, adder 611, subtractor 613 and selectors 615, 616 and 617.
In this embodiment, the in-group block number 625 in the data-point content read from the track table is sent to shifter 605, left-shifted according to the compression ratio, and then sent to adder 611. Since left-shifting the in-group block number 625 by n bits is equivalent to multiplying it by 2^n, after shifting by shifter 605 the in-group block number 625 is restored to the value of the index number in the corresponding data address. In addition, the DBNY 627 in the data-point content is sent directly to adder 611 and, together with the index number output by shifter 605, forms one input of adder 611; the data step 629 in the data-point content is the other input of adder 611. Their sum is the index number and intra-block offset of the next data address. The intra-block offset serves directly as the DBNY corresponding to the next data address, while the index number is right-shifted by shifter 607 according to the compression ratio to become the in-group block number corresponding to the next data address. Here shifter 607 shifts right by the same number of bits that shifter 605 shifts left; right-shifting the index number by n bits is equivalent to dividing it by 2^n, so after shifting by shifter 607 the index number is again compressed into the corresponding in-group block number and written back to the track table, with the lowest n bits shifted out to the right as portion 631, which does not form part of the in-group block number.
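A functional sketch of this shifter/adder datapath (shifters 605/607, adder 611) might look as follows. The code is an illustration rather than the hardware itself; the field widths, the `fill` parameter (used later when a tag bit supplies the right-hand fill during restoration), and all names are assumptions.

```python
def next_dbn(block_no, dbny, step, n, dbny_bits, index_bits, fill=0):
    """Compute the next (in-group block number, DBNY, shifted-out bits, carry).

    n    -- compression ratio expressed as a shift amount
    fill -- bit(s) appended on the right when restoring the index number
    """
    index = (block_no << n) | fill                  # shifter 605 (restore)
    total = ((index << dbny_bits) | dbny) + step    # adder 611
    new_dbny = total & ((1 << dbny_bits) - 1)       # next DBNY
    idx = total >> dbny_bits
    carry = idx >> index_bits                       # overflow -> selector 615
    idx &= (1 << index_bits) - 1
    shifted_out = idx & ((1 << n) - 1)              # portion 631 -> selector 616
    return idx >> n, new_dbny, shifted_out, carry   # shifter 607 (compress)

# No compression (n = 0): block '11', DBNY '10', step '1' -> block '11',
# DBNY '11', no carry
assert next_dbn(0b11, 0b10, 1, 0, 2, 2) == (0b11, 0b11, 0, 0)
```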
During this process, the portion of the index number shifted out by shifter 607 is sent to selector 616 as a control signal, and the overflow signal (carry or borrow) of adder 611 is sent to selector 615 as a control signal. Each input of the selectors is a group number GN from the row of sequence table 603 pointed to by the group number 623 of the current data address.
Please refer to FIG. 7A, which is an embodiment of the sequence table and data cache of the present invention. The number of rows in sequence table 603 equals the number of groups in data memory 701, with a one-to-one correspondence between them. In this embodiment, data memory 701 is divided into two way groups (way group 0 and way group 1), and each way group can in turn be divided into two groups. Data memory 701 therefore contains four groups in total, whose group numbers are marked on the corresponding groups in FIG. 7A: way group 0 contains groups 00 and 01, and way group 1 contains groups 10 and 11. Furthermore, for ease of explanation, it is assumed that each group contains four memory blocks and each memory block contains four data items (or data words).
Correspondingly, sequence table 603 also has four rows, corresponding from top to bottom to groups 00, 01, 10 and 11. Each row contains a feature entry, a tag entry 715 and an index-number entry 717. The feature entry in turn contains a compression ratio 703 and five pointers (pointers 705, 707, 709, 711 and 713). Like the pointers in the feature entry of the FIG. 4B embodiment, these five pointers point to the groups containing the data blocks whose addresses are adjacent to that of the first data block in the group. In this embodiment the index numbers of the data blocks within each group are not compressed; therefore, apart from one pointer pointing to the group preceding this group in consecutive addresses and another pointer pointing to the group following it, the other three pointers all point to the group itself. As shown in the first row of sequence table 603 in FIG. 7A (corresponding to group '00'), pointers 705, 707 and 709 all point to the group itself (group '00'), pointer 711 points to the group following it in consecutive addresses (group '10'), and pointer 713 points to the group preceding it in consecutive addresses (group '11'). The pointers in the other rows are also as shown; a pointer whose content is empty indicates that the group it points to is not shown in the figure, or has not yet been determined, and is irrelevant to the case described in this embodiment.
In this embodiment the compression ratio of all four groups is '0' (no compression), i.e. the index number of a data address corresponds directly to the in-group block number, and each group corresponds to one complete tag. In this case, when looking up the group corresponding to a data address, the two index-number bits are masked directly (as shown by the underlining in index-number entry 717 in FIG. 7A) and only the tag of the data address is matched, which is sufficient to find the group corresponding to that data address; the two masked bits are then the in-group block number corresponding to the data address within that group.
As the data address increments, four consecutive data items A, B, C and D are accessed in turn as shown in FIG. 7A. That is, data A and B are the last two data items of the last memory block of group '11', while data C and D are the first two data items of the first memory block of group '00'; the difference between the data addresses of successive items is the data step '1'. As described in the previous embodiments, in the process of fetching data A from data memory 701 according to the data-point content in track table 619, the DBNX, DBNY and data step of the data point are all read out. The value of DBNX is '1111' (the fourth memory block in group '11'), where the group number is '11' and the in-group block number is '11'; the value of DBNY is '10' (the third data item in the memory block); and the value of the data step is '1' (i.e. data B, accessed next, immediately follows data A).
According to the technical solution of the present invention, the in-group block number ('11') in this DBNX is sent to shifter 605, and the group number in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (the fourth row of sequence table 603). The compression ratio ('0') is sent to shifters 605 and 607 as the number of bits to shift (i.e. no shift). The output '11' of shifter 605, together with DBNY ('10'), forms '1110', which is added to the data step '1' to give '1111'; the in-group block number '11' remains '11' after passing through shifter 607. This yields the in-group block number ('11') and DBNY ('11') corresponding to the next data address.
Meanwhile, the pointer values of the fourth row of sequence table 603 are output on ports '1', '2', '3', '4' and '-1' respectively and sent to selectors 616 and 615, while port '0' outputs the group number of the row itself, '11', to selector 615 (this group number is simply the row number, so it need not occupy writable memory in the row and can instead be hard-wired as read-only to save storage space). Since adder 611 did not overflow (there was no carry in the addition), the group number '11' output on port '0' is selected as the group number corresponding to the next data address. At this point the DBNX (group number '11' and in-group block number '11') and DBNY ('11') corresponding to the next data address have both been produced and point to data B in data memory 701. This DBN is written back over bus 649 into the data point in track table 619 for use the next time data B is read.
As a further example, in the process of fetching data B according to the technical solution of the present invention, the group number '11', in-group block number '11', DBNY '11' and data step '1' of the data point are read out again. The in-group block number '11', after passing through shifter 605, forms '1111' together with DBNY; adding the data step '1' gives '0000' (the in-group block number '00' and DBNY '00' corresponding to the next data address) with an overflow carry of '1'. As before, the pointer values of the fourth row of sequence table 603 and the row's own group number are sent to selectors 616 and 615. This time, since adder 611 produced a carry '1', the group number '00' output on port '4' is selected as the group number corresponding to the next data address. At this point the DBNX (group number '00' and in-group block number '00') and DBNY ('00') corresponding to the next data address have both been produced and point to data C in data memory 701. This DBN is written back over bus 649 into the data point in track table 619 for use the next time data C is read. By proceeding in this manner, the DBN corresponding to the next data address can be computed from the data step whenever the compression ratio is '0'.
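The carry-driven selection of the next group number in the two walkthroughs above can be replayed with plain integers. This is a sketch: the sequence-table row is shown as a simple mapping, with port '0' being the row's own group and port '4' the pointer to the next sequential group; negative steps and the other ports would follow the same pattern.

```python
row_group_11 = {'own': 0b11, 'ptr4': 0b00}  # fourth row of sequence table 603

def next_group(row, carry):
    # selector 615: no overflow -> keep the current group (port '0');
    # carry '1' -> follow the next-sequential-group pointer (port '4')
    return row['ptr4'] if carry else row['own']

assert next_group(row_group_11, 0) == 0b11  # A -> B stays in group '11'
assert next_group(row_group_11, 1) == 0b00  # B -> C crosses into group '00'
```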
According to the technical solution of the present invention, when the data step is at least twice the data block length, the index number in the data address can be compressed. Table 1 shows some common compression ratios with the corresponding shift amounts and masks. In Table 1, the first column shows the range of the data step; the second column shows which bits of the tag and index number stored in the sequence table are masked during matching, where T denotes a tag bit, I denotes an in-group block-number bit and X denotes a masked bit; the third column shows the corresponding number of shift bits; and the fourth column shows the corresponding compression ratio.
Specifically, in the first row the data step is less than twice the data block length, so no compression is applied: only the index number is masked, the shift amount is '0' and the compression ratio is '1' (no compression). In the second row the data step is at least twice and less than four times the data block length, so compression is possible: the lowest tag bit and the high index bit are masked, the shift amount is '1' and the compression ratio is '2'. In the third row the data step is at least four times and less than eight times the data block length: the lowest two tag bits are masked, the shift amount is '2' and the compression ratio is '4'. In the fourth row the data step is at least eight times and less than sixteen times the data block length: the second- and third-lowest tag bits are masked, the shift amount is '3' and the compression ratio is '8'. Other cases follow by analogy.
Table 1

  Data step size   Masked bits   Shift bits   Compression ratio
  < 2X             TTTTTXX       0            1
  >= 2X, < 4X      TTTTXXI       1            2
  >= 4X, < 8X      TTTXXII       2            4
  >= 8X, < 16X     TTXXTII       3            8

(In the first column X denotes the data block length; in the second column T denotes a tag bit, I an in-group block-number bit, and X a masked bit.)
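The shift amount in Table 1 is just the base-2 logarithm of the compression ratio, growing by one each time the step doubles. A sketch of that mapping (the function name is our own):

```python
def shift_bits_for_step(step_in_blocks: int) -> int:
    """Number of shift bits for a given data step (in block lengths)."""
    n = 0
    while (2 << n) <= abs(step_in_blocks):  # step >= 2**(n+1) block lengths
        n += 1
    return n

# One step value from each row of Table 1
assert [shift_bits_for_step(s) for s in (1, 2, 5, 8)] == [0, 1, 2, 3]
# The compression ratio is 2**shift
assert all(1 << shift_bits_for_step(s) == r
           for s, r in ((1, 1), (3, 2), (7, 4), (15, 8)))
```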
Please refer to FIG. 7B, which is another embodiment of the sequence table and data cache of the present invention. The structure of the groups and the sequence table in the cache of FIG. 7B is the same as in FIG. 7A. In this embodiment, however, the compression ratio is '01' and the data step is an integer multiple of the data block length (the data step is '11000' in two's-complement form, i.e. decimal '-8'). For example, the lowest bit of the data-address index number corresponding to every memory block of groups '00' and '01' is '0', while the lowest bit of the data-address index number corresponding to every memory block of groups '10' and '11' is '1'. In this case, when looking up the group corresponding to a data address, the mask bits are shifted left by one position according to the compression ratio ('1'), masking the high bit of the index number and the lowest bit of the tag (as shown by the underlining in tag entry 715 and index-number entry 717 in FIG. 7B). That is, the tag except its lowest bit, together with the lowest bit of the index number, is matched to find the group corresponding to the data address; the two masked bits are then the in-group block number corresponding to the data address within that group. In this embodiment, the tag value in the row of sequence table 603 corresponding to group '00' is '1000' and the tag value in the row corresponding to group '01' is '1010'; the masked bit is '0' in both, indicating that the group boundaries of the data blocks stored in the two groups are aligned. Further, the tags of groups '00' and '01' are consecutive in all but the lowest bit and share the same lowest index bit; that is, the tag-and-index values of the data blocks stored in group '00' are '100000', '100010', '100100' and '100110', and those of the data blocks stored in group '01' are '101000', '101010', '101100' and '101110'.
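The masked match of FIG. 7B can be sketched as bit arithmetic: concatenate tag and index, mask the in-group-block-number bits, and compare. The function below is illustrative; the field widths are those of the figure (a 4-bit tag and a 2-bit index), and the names are our own.

```python
def match_row(stored, shift, tag, index, index_bits):
    """Match a data address against one sequence-table row.

    stored -- the row's concatenated tag+index value
    shift  -- the compression ratio expressed as a shift amount
    Returns (hit, in-group block number).
    """
    full = (tag << index_bits) | index
    mask = ((1 << index_bits) - 1) << shift  # the in-group block-number bits
    return (full & ~mask) == (stored & ~mask), (full & mask) >> shift

# Group '00' of FIG. 7B stores '100000'; address tag '1001', index '00'
# hits it with in-group block number '10' (the third block).
assert match_row(0b100000, 1, 0b1001, 0b00, 2) == (True, 2)
# An address whose lowest index bit is '1' belongs to a different group.
assert match_row(0b100000, 1, 0b1000, 0b01, 2)[0] is False
```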
Similarly, take as an example the four data items E, F, G and H shown in FIG. 7B, which are accessed in sequence with the same data stride. Data E and F are the second data item in the second and first memory blocks of group '01' respectively, and data G and H are the second data item in the fourth and third memory blocks of group '00' respectively; that is, the difference between the data addresses of these four data items is the data stride '11000'. As described in the previous embodiment, in the process of fetching data E from data memory 701 according to the content of the data point in track table 619, the DBNX, DBNY and data stride in that data point are all read out. The value of DBNX is '0101', in which the group number is '01' and the intra-group block number is '01'; the value of DBNY is '01' (i.e. the second data item in the memory block); and the value of the data stride is '11000'.
According to the technical solution of the present invention, the intra-group block number ('01') in the DBNX is sent to shifter 605. The group number '01' in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (i.e. the second row of sequence table 603). The unmasked bit of index number 717 is used as the rightmost fill-in bit when shifter 605 shifts left, and the compression ratio ('01') is sent to shifters 605 and 607 as the shift amount (i.e. shift by one bit). Thus shifter 605 shifts the input '01' left by one bit and appends the fill-in bit '0', yielding '010', which together with DBNY ('01') forms '01001'. Adding the data stride '11000' gives '00001', whose intra-group block number '000' is shifted right by one bit by shifter 607 to output '00', yielding the intra-group block number ('00') and DBNY ('01') corresponding to the next data address.
At this time, since adder 611 did not overflow (i.e. there was no borrow in the subtraction), the group number '01' output at port '0' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '01' and intra-group block number '00') and DBNY ('01') corresponding to the next data address have both been produced, and they point to data F in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data F is read.
As another example, in the process of fetching data F according to the technical solution of the present invention, the group number '01', intra-group block number '00', DBNY '01' and data stride '11000' in that data point are read out again. The intra-group block number '00' is shifted left by one bit by shifter 605 with the fill-in bit '0' appended, and together with DBNY forms '00001'; adding the data stride '11000' gives '11001' (i.e. the intra-group block number corresponding to the next data address is '11', obtained by shifting '110' right by one bit, and the DBNY is '01'), and a borrow overflow occurs. Therefore the group number '00' output at port '-1' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '00' and intra-group block number '11') and DBNY ('01') corresponding to the next data address have both been produced, and they point to data G in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data G is read. Operating successively in this manner, the DBN corresponding to the next data address can be computed from the data stride when the compression ratio is not '0' but the data stride is an integer multiple of the data block length.
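The FIG. 7B walkthrough above can be condensed into a small numeric sketch. This is purely illustrative: the function name and the neighbor-group table are invented, the 5-bit intra-group position (3-bit block/index field plus 2-bit DBNY) and the port '0'/port '-1' links follow the embodiment, and only the ports exercised by the example are modeled.

```python
SHIFT = 1            # compression ratio '01' -> shift by one bit
FILL = 0             # unmasked index bit read from the sequence-table row
# neighbor-group links of the rows used in the example (port '0' / port '-1')
NEIGHBORS = {'01': {0: '01', -1: '00'},
             '00': {0: '00', -1: '11'}}

def next_dbn(group, block, dbny, stride):
    """Return (group, block, dbny) of the next data address."""
    pos = (((block << SHIFT) | FILL) << 2) | dbny   # shifter 605 output plus DBNY: 5 bits
    total = pos + stride                            # adder 611, two's complement
    carry = 0 if 0 <= total < 32 else (1 if total >= 32 else -1)
    total &= 0x1F                                   # keep 5 bits
    new_dbny = total & 0x3
    new_block = (total >> 2) >> SHIFT               # shifter 607 drops the fill bit
    new_group = NEIGHBORS[group][carry] if carry else group
    return new_group, new_block, new_dbny

# data E -> data F, then data F -> data G, with stride '11000' = -8
print(next_dbn('01', 0b01, 0b01, -8))   # F: group '01', block '00', DBNY '01'
print(next_dbn('01', 0b00, 0b01, -8))   # G: borrow -> port '-1', group '00'
```

The borrow in the second call reproduces the port '-1' selection described above for the step from data F to data G.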
Please refer to FIG. 7C, which is another embodiment of the sequence table and data cache of the present invention. The structure of each group and of the sequence table in the cache of FIG. 7C is the same as in FIG. 7B. However, in this embodiment the data stride is not an integer multiple of the data block length (the data stride is '1001', i.e. decimal '9'). In this embodiment, the tags of group '00' and group '01' are consecutive except for their lowest bit and the lowest bits of their index numbers are the same, while the tags of group '01' and group '11' are identical except for their lowest bit and the lowest bits of their index numbers are consecutive. That is, the tags and index numbers of the data blocks stored in group '00' are '100000', '100010', '100100' and '100110'; those stored in group '01' are '101000', '101010', '101100' and '101110'; and those stored in group '11' are '101001', '101011', '101101' and '101111'.
Take as an example the four data items J, K, L and M shown in FIG. 7C, which are accessed in sequence with the same data stride. Data J is the second data item in the third data block of group '00', data K is the third data item in the fourth data block of group '00', data L is the fourth data item in the first data block of group '01', and data M is the first data item in the second memory block of group '11'; that is, the difference between the data addresses of these four data items is the data stride '1001'. As described in the previous embodiment, in the process of fetching data J from data memory 701 according to the content of the data point in track table 619, the DBNX, DBNY and data stride in that data point are all read out. The value of DBNX is '0010', in which the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; and the value of the data stride is '1001'.
According to the technical solution of the present invention, the intra-group block number ('10') in the DBNX is sent to shifter 605. The group number '00' in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (i.e. the first row of sequence table 603). The unmasked bit of index number 717 is used as the rightmost fill-in bit when shifter 605 shifts left, and the compression ratio ('01') is sent to shifters 605 and 607 as the shift amount (i.e. shift by one bit). Thus shifter 605 shifts the input '10' left by one bit and appends the fill-in bit '0', yielding '100', which together with DBNY ('01') forms '10001'. Adding the data stride '1001' gives '11010', whose intra-group block number '110' is shifted right by one bit by shifter 607 to output '11', yielding the intra-group block number ('11') and DBNY ('10') corresponding to the next data address.
At this time, since adder 611 did not overflow (i.e. there was no carry in the addition), the group number '00' output at port '0' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '00' and intra-group block number '11') and DBNY ('10') corresponding to the next data address have both been produced, and they point to data K in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data K is read.
As another example, in the process of fetching data K according to the technical solution of the present invention, the group number '00', intra-group block number '11', DBNY '10' and data stride '1001' in that data point are read out again. The intra-group block number '11' is shifted left by one bit by shifter 605 with the fill-in bit '0' appended, and together with DBNY forms '11010'; adding the data stride '1001' gives '00011' (i.e. the intra-group block number corresponding to the next data address is '00', obtained by shifting '000' right by one bit, and the DBNY is '11'), and a carry overflow occurs. Therefore the group number '01' output at port '+1' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '01' and intra-group block number '00') and DBNY ('11') corresponding to the next data address have both been produced, and they point to data L in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data L is read.
As a further example, in the process of fetching data L according to the technical solution of the present invention, the group number '01', intra-group block number '00', DBNY '11' and data stride '1001' in that data point are read out again. The intra-group block number '00' is shifted left by one bit by shifter 605 with the fill-in bit '0' appended, and together with DBNY forms '00011'; adding the data stride '1001' gives '01100' (i.e. the intra-group block number corresponding to the next data address is '01', obtained by shifting '011' right by one bit, and the DBNY is '00'). Here, although no carry overflow occurs, the part 631 shifted out on the right of shifter 607 has the value '1', which is inconsistent with the fill-in bit '0' of the index number; selector 616 is therefore steered by this shifted-out part 631. That is, the group number '11' output at port '1' is selected and, after passing through selector 615, serves as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '11' and intra-group block number '01') and DBNY ('00') corresponding to the next data address have both been produced, and they point to data M in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data M is read. Operating successively in this manner, the DBN corresponding to the next data address can be computed from the data stride when the compression ratio is not '0' and the data stride is not an integer multiple of the data block length.
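The FIG. 7C flow can likewise be sketched numerically. Because the stride is not a multiple of the block length, the part shifted out by shifter 607 must additionally be compared with the fill-in bit to steer selector 616. The function name, the key strings and the neighbor-link table are illustrative assumptions, not the patent's notation, and only the links the example exercises are filled in.

```python
SHIFT, FILL = 1, 0

def next_dbn(group, block, dbny, stride, neighbors):
    # shifter 605: shift the block number left with the fill bit, then append DBNY
    pos = (((block << SHIFT) | FILL) << 2) | dbny
    total = pos + stride                      # adder 611
    carry = total >> 5                        # carry (+1) or borrow (-1)
    total &= 0x1F
    new_dbny = total & 0x3
    idx = total >> 2                          # 3-bit block/index field
    new_block = idx >> SHIFT                  # shifter 607
    moved_out = idx & ((1 << SHIFT) - 1)      # shifted-out part 631
    if carry:
        new_group = neighbors[group]['+1' if carry > 0 else '-1']
    elif moved_out != FILL:
        new_group = neighbors[group]['odd']   # selector 616, port '1'
    else:
        new_group = group
    return new_group, new_block, new_dbny

# neighbor links as drawn in FIG. 7C (only those the example needs)
links = {'00': {'+1': '01'}, '01': {'odd': '11'}}
print(next_dbn('00', 0b10, 0b01, 9, links))   # J -> K
print(next_dbn('00', 0b11, 0b10, 9, links))   # K -> L (carry overflow)
print(next_dbn('01', 0b00, 0b11, 9, links))   # L -> M (shifted-out part '1')
```

The three calls reproduce the J-to-K, K-to-L and L-to-M transitions walked through above, including the selector-616 case in the last step.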
Further, in the present invention, the index number corresponding to the first data block of each group need not be '0', thereby implementing a data storage mode in which group boundaries are not aligned; data can be stored flexibly, better saving storage space. Please refer to FIG. 7D, which is an embodiment of the group-boundary-unaligned data storage mode of the present invention. The structure of each group and of the sequence table in the cache of FIG. 7D is the same as in FIG. 7A. However, in this embodiment the compression ratio is '10', and the data stride is not an integer multiple of the data block length (the data stride is '10001', i.e. decimal '17'). For example, the lowest two bits of the data-address index number corresponding to every memory block of group '00' and group '01' are '00', while the lowest two bits of the data-address index number corresponding to every memory block of group '10' and group '11' are '01'. In this case, when looking up the group corresponding to a data address, the mask bits are shifted left by two bits according to the compression ratio ('10'), masking the lowest two bits of the tag in the data address and leaving the index number unmasked (as shown by the underlining in tag 715 in FIG. 7D). That is, the portion of the tag other than its lowest two bits and the index number are matched to find the group corresponding to the data address, while the two masked bits form the intra-group block number of that data address within the group. In this embodiment, the tag value in the row of sequence table 603 corresponding to group '00' is '1000' and the tag value in the row corresponding to group '01' is '1100'; in both rows the two masked bits are '00', indicating that the group boundaries of the data blocks stored in these two groups are aligned. The tag value in the row corresponding to group '11' is '1101', whose two masked bits are '01', indicating that the group boundary of the data blocks stored in that group is not aligned and that the group boundary offset is '01'. Further, the tags of group '00' and group '01' are consecutive except for their lowest two bits and their index numbers are the same, while the tags of group '01' and group '11' are identical except for their lowest two bits and their index numbers are consecutive. That is, the tags and index numbers of the data blocks stored in group '00' are '0100000', '0100100', '0101000' and '0101100'; those stored in group '01' are '0110000', '0110100', '0111000' and '0111100'; and for group '11', since the group boundary is not aligned with an offset of '01', the tags and index numbers of the data blocks stored therein are '0110101', '0111001', '0111101' and '1000001'.
In addition, since the group boundary of group '11' is not aligned, when the portion of the tag other than its lowest two bits together with the index number of a data address arriving over bus 641 matches this group, the lowest two bits of the tag in that data address must further have subtracted from them, in subtractor 613, the lowest two bits of the tag stored in the row of sequence table 603 corresponding to group '11' and delivered over bus 643, in order to determine the intra-group block number corresponding to that data address. For example, if the data address is '011100111' (i.e. the tag is '01110', the index number is '01' and the intra-block offset is '11'), then the portion of the tag other than its lowest two bits ('011') together with the index number '01' matches group '11'. The lowest two bits of the tag ('10') and the lowest two bits of the tag stored in the row of sequence table 603 corresponding to group '11' ('01') are subtracted in subtractor 613, giving '01' (the second data block); that is, this data address corresponds to the last data item of the second data block of group '11'.
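The subtractor-613 correction for an unaligned group reduces to a small modular subtraction; the helper below is a hypothetical illustration using the embodiment's numbers, with an assumed two-bit field width.

```python
def intra_group_block(addr_tag_low, row_tag_low, bits=2):
    """Subtractor 613: low tag bits of the address minus those stored in the row."""
    return (addr_tag_low - row_tag_low) & ((1 << bits) - 1)

# FIG. 7D numbers: the address tag ends in '10', the row of group '11' stores '01'
print(intra_group_block(0b10, 0b01))   # 1, i.e. '01' -> the second data block
```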
Take as an example the four data items P, Q, R and S shown in FIG. 7D, which are accessed in sequence with the same data stride. Data P is the second data item in the third data block of group '00', data Q is the third data item in the fourth data block of group '00', data R is the fourth data item in the first data block of group '01', and data S is the first data item in the first memory block of group '11'; that is, the difference between the data addresses of these four data items is the data stride '10001'. As described in the previous embodiment, in the process of fetching data P from data memory 701 according to the content of the data point in track table 619, the DBNX, DBNY and data stride in that data point are all read out. The value of DBNX is '0010', in which the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; and the value of the data stride is '10001'.
According to the technical solution of the present invention, the intra-group block number ('10') in the DBNX is sent to shifter 605. The group number '00' in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (i.e. the first row of sequence table 603). The two unmasked bits of index number 717 are used as the rightmost fill-in bits when shifter 605 shifts left, and the compression ratio ('10') is sent to shifters 605 and 607 as the shift amount (i.e. shift by two bits). Thus shifter 605 shifts the input '10' left by two bits and appends the fill-in bits '00', yielding '1000', which together with DBNY ('01') forms '100001'. Adding the data stride '10001' gives '110010', whose intra-group block number '1100' is shifted right by two bits by shifter 607 to output '11', yielding the intra-group block number ('11') and DBNY ('10') corresponding to the next data address.
At this time, since adder 611 did not overflow (i.e. there was no carry in the addition), the group number '00' output at port '0' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '00' and intra-group block number '11') and DBNY ('10') corresponding to the next data address have both been produced, and they point to data Q in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data Q is read.
As another example, in the process of fetching data Q according to the technical solution of the present invention, the group number '00', intra-group block number '11', DBNY '10' and data stride '10001' in that data point are read out again. The intra-group block number '11' is shifted left by two bits by shifter 605 with the fill-in bits '00' appended, and together with DBNY forms '110010'; adding the data stride '10001' gives '000011' (i.e. the intra-group block number corresponding to the next data address is '00', obtained by shifting '0000' right by two bits, and the DBNY is '11'), and a carry overflow occurs. Therefore the group number '01' output at port '+1' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '01' and intra-group block number '00') and DBNY ('11') corresponding to the next data address have both been produced, and they point to data R in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data R is read.
As a further example, in the process of fetching data R according to the technical solution of the present invention, the group number '01', intra-group block number '00', DBNY '11' and data stride '10001' in that data point are read out again. The intra-group block number '00' is shifted left by two bits by shifter 605 with the fill-in bits '00' appended, and together with DBNY forms '000011'; adding the data stride '10001' gives '010100', and shifter 607 shifts the index number '0101' output by adder 611 right by two bits to obtain '01'. Here, although no carry overflow occurs, when shifter 607 shifts the index number '0101' output by adder 611 to the right, the part 631 shifted out on the right is '01', which is inconsistent with the fill-in bits '00' of the index number; this shifted-out part 631 therefore steers selector 616. That is, the group number '11' output at port '1' is selected and, after passing through selector 615, serves as the group number corresponding to the next data address. The group boundary offset '01' is then read out from the row of sequence table 603 corresponding to group '11', and this group boundary offset '01' is subtracted from the '01' obtained by shifter 607, giving the true intra-group block number '00'.
At this point, the DBNX (i.e. group number '11' and intra-group block number '00') and DBNY ('00') corresponding to the next data address have both been produced, and they point to data S in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data S is read. Operating successively in this manner, the DBN corresponding to the next data address can be computed from the data stride when the compression ratio is not '0', the data stride is not an integer multiple of the data block length, and group boundaries are not aligned.
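For the unaligned case of FIG. 7D, the final correction described above (subtracting the group boundary offset stored in the sequence-table row from the block number produced by shifter 607) can be sketched as follows; the helper name and the two-bit width are illustrative assumptions.

```python
def true_block(shifted_block, boundary_offset, bits=2):
    """Subtract the group boundary offset from the shifter-607 result."""
    return (shifted_block - boundary_offset) & ((1 << bits) - 1)

# R -> S step: shifter 607 yields '01', the row of group '11' stores offset '01'
print(true_block(0b01, 0b01))   # 0, i.e. the true intra-group block number '00'
```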
According to the technical solutions and concepts of the present invention, and with reference to the embodiments of FIGS. 7A, 7B, 7C and 7D, similar methods can be applied in the same way to various other grouping, compression or data stride situations, which are not described further here.
In the present invention, data that the processor core may load is filled into the cache in advance, and fetched in advance for the processor core to use, by the following method. This embodiment reads, ahead of time, the instructions (or abstractions of the instructions) that the processor core is executing or about to execute, and processes data load or data store instructions in advance (a data load instruction is taken as the example below). When a data load instruction in an instruction loop is processed for the first time, the starting data address of that instruction is determined and recorded according to the data address produced by the processor core. When the same data load instruction in the instruction loop is processed for the second time, the starting data address recorded for that data load instruction is subtracted from the second data address produced by the processor core to obtain the difference between the data addresses of two adjacent executions of the instruction, which is recorded as the data stride. The data stride is then added to the second data address to obtain the next data address, which is recorded, and the next data address is used to query whether the corresponding data is present in the higher-level memory. If the data is not in the higher-level memory, the corresponding data is fetched from the lower-level memory at the next data address and filled into the higher-level memory.
Thereafter, each time the same data load instruction is encountered, the next data address corresponding to that instruction is extracted from the record and supplied to the processor core for use. At the same time, as required, this next data address is compared with the exact data address produced by the processor core. If there is no error, the next data address is added to the data stride to obtain a new next data address, which is recorded; the new next data address is used to query whether the corresponding data is present in the higher-level memory, and if it is not, the corresponding data is fetched from the lower-level memory at the new next data address and filled into the higher-level memory. If the comparison reveals an error, the correct address at the time of the error is taken as the starting data address and the above procedure is executed anew.
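The two paragraphs above describe a stride-based prefetch loop: learn the stride from the first two executions of a load, then predict, verify and pre-fill on every later execution. A minimal software model is sketched below; the record structure, the dictionary-based memories and all names are invented for illustration and merely stand in for the track table and the two memory levels.

```python
records = {}   # per-instruction record: pc -> {'addr': ..., 'stride': ...}

def on_load(pc, actual_addr, hi_mem, lo_mem):
    """Process one execution of the data load instruction at pc."""
    rec = records.get(pc)
    if rec is None:                          # first execution: record start address
        records[pc] = {'addr': actual_addr, 'stride': None}
        return
    if rec['stride'] is None:                # second execution: learn the stride
        rec['stride'] = actual_addr - rec['addr']
    elif rec['addr'] != actual_addr:         # prediction was wrong: restart
        records[pc] = {'addr': actual_addr, 'stride': None}
        return
    nxt = actual_addr + rec['stride']        # next data address
    rec['addr'] = nxt                        # record it for the next check
    if nxt not in hi_mem:                    # pre-fill the higher-level memory
        hi_mem[nxt] = lo_mem[nxt]

lo = {a: a for a in range(64)}               # stand-in for lower-level memory 115
hi = {}                                      # stand-in for data memory 113
for a in (0, 8, 16, 24):                     # one load striding by 8
    on_load(0x40, a, hi, lo)
print(sorted(hi))                            # addresses pre-filled ahead of use
```

From the second execution on, each access pre-fills the data one stride ahead, so the data is already in the higher-level memory before the processor core asks for it.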
Please refer to FIG. 8A, which is an embodiment of the data access engine of the present invention. FIG. 8A shows a more complete embodiment built on the embodiment of FIG. 6. The processor core 101 and the data memory (or level-one data memory) 113 are the same as described in the previous embodiments; the data in data memory 113 is a subset of the data in lower-level memory 115. A first-in-first-out buffer (FIFO) 849 serves as a data buffer between data memory 113 and processor core 101. Tag memory 841 together with data memory 113 forms a conventional way-set cache. The sequence table 603, shifters 605, 607 and 609, adder 611, subtractor 613 and selector 617 in data access engine 801 are the same as the corresponding functional blocks in data access engine 601 of FIG. 6. For ease of description, selector 618 in this embodiment incorporates selectors 615 and 616 of the FIG. 6 embodiment. In addition, sequence table 603 and selector 618 add storage and selection of the group numbers of more neighboring groups, and group valid bits and index-bit valid bits are also added to sequence table 603. Controller 803 controls the operation of the data access engine. Under the control of controller 803, selectors 811, 813 and 815 select the intra-group index number, the intra-block offset and the stride originating from track table 619 or from subtractors 613 and 805, for adder 611 and shifters 605 and 607 to compute the next DBN. Subtractor 613 computes the index number and the intra-block offset from data address 641 and the match result 643 of sequence table 603. Subtractor 805 computes the difference between the memory addresses of two adjacent accesses of the same memory access instruction, i.e. the data stride. Converter 807 converts the stride into a compression shift signal stored in sequence table 603; this compression shift signal 829 is used as the shift amount to control the shifters. Current cache address bus 821 sends the entry content originating from track table 619 to the functional blocks. Intermediate result bus 823 sends the intra-group block number and intra-block offset from subtractor 613 to adder 611 and shifters 605 and 607 to compute the next DBN. Bus 825 sends the data stride computed by subtractor 805 to selector 815. Control signal 827 produced by controller 803 controls selectors 811, 813, 815, 817, 617 and 819. Shift signal 829 output from sequence table 603 controls shifters 605, 607 and 609. Next data address bus 881 sends the next data address to sequence table 603 to produce the corresponding data address for pre-filling data from lower-level memory 115 into data memory 113, and also sends the next DBN to track table 619 for storage.
A conventional cache uses match-based indirect addressing. Taking a set-associative cache as an example, the index bits in the middle of the data address read out a plurality of tags from the cache's tag array, and each is compared against the high bits of the data address. If the tag of some way group matches, it is called a hit, and the content of that way group is what the data address points to. Level-one data memory 113 consists of a plurality of identical memories, each forming a way group; every way group has the same number of rows, i.e. a multi-way organization. Each storage row of each memory is called a level-one data block, and each level-one data block has an index number (INDEX) 802 determined by its row number within level-one data memory 113. The intra-block offset 627 points to one data item within the block. Please refer to FIG. 8B, which is a schematic diagram of the various address formats of the present invention. According to the number of level-one data blocks per way group in level-one data memory 113 and the number of data items per block, the data address 804 can be divided into a high-order tag 801, middle index bits 802, and a low-order intra-block offset 627.
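The tag/index/offset decomposition and the match-based lookup described above can be sketched in Python. The field widths here are assumptions for illustration only (the patent does not fix them): 64-byte blocks and 128 rows per way group.

```python
# Assumed widths (not specified in the text): 64-byte blocks -> 6 offset
# bits; 128 rows per way group -> 7 index bits.
OFFSET_BITS = 6
INDEX_BITS = 7

def split_data_address(addr: int) -> tuple[int, int, int]:
    """Split a data address 804 into (tag 801, index 802, offset 627)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(ways: list, addr: int):
    """Match-based indirect addressing: read the tag stored at `index` in
    every way group and compare it with the tag bits of the address; a
    match is a hit and selects that way group's block."""
    tag, index, offset = split_data_address(addr)
    for way in ways:
        entry = way.get(index)            # entry = (stored_tag, block_data)
        if entry is not None and entry[0] == tag:
            return entry[1][offset]       # hit: data item at the offset
    return None                           # miss in every way group
```

A hit requires both the row selected by the index and an exact tag comparison, which is precisely the per-access matching work that the direct cache addressing of this embodiment later avoids.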
In this embodiment, the cache also begins with match-based indirect addressing; once the relationship between a data address and a cache address has been established, addressing proceeds directly with the cache address. Direct addressing with cache addresses eliminates the tag-matching operation, saving power and increasing memory access speed.
According to the technical solution of the present invention, for the cache allocated by group, the group storage address 808 is divided into a high-order memory group number (GN) 623, a middle intra-group block number (index) 625, and a low-order intra-block offset (offset) 627. For the set-associative cache, the cache address 806 is divided into a high-order way group number 814, a middle intra-way index number 802, and a low-order intra-block offset 627. The intra-block offsets of these two cache address formats are exactly the same as in the data address, but the bit widths of the index number and the intra-group block number are not necessarily the same. Because the groups of the group-allocated cache may be partitioned more finely than the way groups, the intra-group block number of the group storage address may have fewer bits than the index number of the set-associative cache address, while the corresponding tag has correspondingly more bits. Selector 843 selects either the set-associative cache address generated by tag memory 841 or the group storage address generated by the data access engine for storage in track table 619. The two address formats are identical in form; in essence both are addresses of data memory 113.
In the present invention, a data address and a cache address can be converted into each other based on the contents of sequence table 603. When a data address is to be converted into a cache address, the high bits of the data address are sent over bus 641 to sequence table 603 and matched against the tags and index numbers stored there. The group number corresponding to the matching entry can be read out on bus 835, and the tag and index number are also read out on bus 643. Subtractor 613 subtracts the tag and index number on bus 643 from the data address on bus 641, yielding the low tag bits, the index number and the intra-block offset. After the low tag bits and index number are shifted by shifter 609, the intra-group block number of the corresponding cache address is obtained. Combining that group number, intra-group block number and intra-block offset on bus 837 yields the cache address corresponding to the data address.
When a cache address is to be converted into the corresponding data address, the group number 623 in the cache address addresses sequence table 603, from which the tag and index number are read out and sent over bus 643. Adding that tag and index number to the intra-group block number 625 and intra-block offset 627 of the cache address yields the data address as the sum.
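The two conversion directions can be sketched as follows. This is a simplified model assuming aligned storage (shift amount '0') and made-up field widths; the sequence table is modeled as a dictionary from group number to the stored tag/index value.

```python
OFFSET_BITS = 6      # assumed block size: 64 bytes
BLOCK_BITS = 2       # assumed 4 blocks per group (intra-group block number 625)

# Toy sequence table 603: key = group number (GN) 623; value = the stored
# tag/index covering that group's data-address range.
seq_table = {5: 0x1A40 >> (BLOCK_BITS + OFFSET_BITS)}   # group 5 covers 0x1A00..0x1AFF

def data_to_cache_addr(data_addr: int):
    """Match tag/index (bus 641 vs 643), subtract (subtractor 613), and
    combine with the matching group number (bus 835) into a cache address
    (bus 837) as (GN, intra-group block number, offset)."""
    for gn, tag_index in seq_table.items():
        if data_addr >> (BLOCK_BITS + OFFSET_BITS) == tag_index:
            low = data_addr - (tag_index << (BLOCK_BITS + OFFSET_BITS))
            block = low >> OFFSET_BITS
            offset = low & ((1 << OFFSET_BITS) - 1)
            return gn, block, offset
    return None                            # no matching entry: must allocate

def cache_to_data_addr(gn: int, block: int, offset: int) -> int:
    """Address the table with GN 623, read tag/index (bus 643), and add
    back the intra-group block number 625 and offset 627."""
    return (seq_table[gn] << (BLOCK_BITS + OFFSET_BITS)) | (block << OFFSET_BITS) | offset
```

The round trip is lossless: converting a data address to a cache address and back recovers the original address, which is what lets the engine keep only cache addresses in the track table.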
According to the technical solution of the present invention, the data access engine can supply data addresses to access lower-level memory 115, and can also supply cache addresses to access data memory 113. The data access engine also stores the correspondence between data addresses and cache addresses, and can convert one kind of address into the other.
In this embodiment, tracker 845 determines the next track table read address 851 based on the content output by the data point in track table 619. The track table read address 851, after being delayed by delay unit 847, serves as the track table write address 853.
Please refer to FIG. 8C, which shows an embodiment of the operation of the sequence table of the present invention. Sequence table 603 consists of registers, comparators and mask registers; the registers may also be implemented by memory. Each entry contains a shift field 891, an adjacent-group field 892, a group valid signal 893 and block valid signals 894, a tag and index number 895, a mask 896, and a comparator 897. The tag and index number together with the comparator may be implemented by a content-addressable memory (CAM); the tag, index number, comparator and mask may be implemented by a tri-state CAM. The mask acts on the low bits of the tag and on the index number, and can selectively exclude the low tag bits or certain index bits from the comparison (i.e. those bits do not affect the comparison result), implementing the compressed storage of data. The mask is controlled by the shift field: when the shift is '0', the mask masks the lowest bit (i.e. the index number); when the shift is '1', the mask moves one bit to the left, masking the lowest tag bit and the high bit of the index number while leaving the lowest index bit to participate in the comparison. Comparator 897 compares the data address on bus 641 against the tag and index number 895 as masked by mask 896; the comparison result is sent out over bus 888 for controller 803 to use as a basis for decisions. The adjacent-group field 892 stores the group numbers and valid bits of the groups adjacent to this group, to be followed when stepping by the data stride across this group's boundary. The group valid signal 893 is set when data is first written into the group, indicating that at least one data block in the group is valid and that the group corresponds to the data pointed to by the address contained in the tag and index fields. Each bit of the block valid signals 894 represents the validity of one data block in the group. The low tag bits and index number input on bus 641, after being shifted under the control of shift field 891, are decoded (e.g. a 2-bit binary address is decoded into 4 bits of which exactly one is set, i.e. one-hot, each bit representing one data block) to select one block valid signal among the block valid signals 894. If that block valid signal is valid, the corresponding data is already in the corresponding data block of that group in data memory 113; if invalid, the corresponding data must be filled into that data block.
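The masked comparison and the one-hot block selection can be sketched as below. The exact mask geometry is an assumption inferred from the two cases given in the text (shift '0' and shift '1'): a one-bit-wide mask window that slides left with the shift amount.

```python
def masked_match(stored: int, addr_high: int, shift: int) -> bool:
    """Tri-state-CAM-style compare (comparator 897): mask 896 hides a
    shift-dependent bit of the stored tag/index so it does not affect the
    result. Assumption: a 1-bit window at position `shift` (shift=0 masks
    the lowest index bit; shift=1 masks one position further left)."""
    mask = ~(1 << shift)
    return (stored & mask) == (addr_high & mask)

def block_select(addr_low: int, shift: int, n_blocks: int = 4) -> int:
    """Decode the shifted low address bits into a one-hot selector for the
    block valid signals 894 (e.g. a 2-bit value -> one of 4 bits)."""
    idx = (addr_low >> shift) & (n_blocks - 1)
    return 1 << idx          # one-hot: exactly one bit set
```

The one-hot output corresponds directly to picking one bit of signal 894: if the selected bit is set, the block is already resident in data memory 113; otherwise a fill from lower-level memory 115 is required.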
Sequence table 603 can be accessed in two ways. One way is matching the data address sent in over bus 641 of FIG. 8A against the tags in sequence table 603; the other is direct addressing via group number 831 or group number 833 of FIG. 8A. Every data field of the entry in sequence table 603 that is matched by the data address or addressed by the group number can be read or written. For example, the corresponding group number 835 can be read out via a data address match, and the corresponding tag 643 can be read out by addressing with group number 829. The other fields of the matched or addressed entry in sequence table 603, such as the adjacent group numbers, the block valid signals and the group valid signal, can likewise be read or written. Before an entry is written, all of its fields are reset to all '0'.
When the data access engine encounters a new data read instruction, it processes it in several stages. The first stage is the first time the data read instruction is processed. In this stage it is determined whether the data read instruction lies within a loop. If it does not, a block is allocated in one way group of level-one data memory 113 in the set-associative cache region according to the index number in the data address of the data read instruction, the data is written in, and the tag portion of the data address is written into the entry of tag memory 841 corresponding to that index number in that way group. If the instruction does lie within a loop, a group is allocated in the group-allocated cache region for the data that the data read instruction may read. In both cases, the data address is mapped to a cache address and stored in the information area associated with the data read instruction, while the memory is accessed by data address to provide the corresponding data to processor core 101.
In this embodiment, whether a data read instruction is located within a loop may be determined by whether it lies between a backward branch instruction and that branch's target instruction. For example, the tracker may provide a pointer to the first backward branch instruction after the current instruction being executed by the processor core, i.e. a branch instruction whose branch target address is smaller than the address of the branch instruction itself. Then all data read instructions located between the larger of the current instruction's address and the branch target address, and the branch instruction itself, lie within the loop formed by that branch instruction. Of course, the tracker pointer may also point to several further backward branch instructions after the current instruction, and determine which data read instructions are contained in each loop according to the branch target address of each backward branch instruction passed.
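The loop-membership test described above reduces to simple address comparisons. A minimal sketch, using instruction addresses directly (the hardware works on track table coordinates, but the arithmetic is the same):

```python
def in_loop(load_pc: int, current_pc: int, branch_pc: int, target_pc: int) -> bool:
    """A branch is backward when its target address is smaller than its
    own address; data read instructions between max(current_pc, target_pc)
    and the branch lie inside the loop that branch forms."""
    if target_pc >= branch_pc:
        return False                      # forward branch: forms no loop
    lower = max(current_pc, target_pc)
    return lower <= load_pc <= branch_pc
```

A load before the loop body (below both the current instruction and the branch target) is excluded, so it is handled by the conventional set-associative path rather than the group-allocated path.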
The second stage is the second time the same data read instruction is processed. In this stage, besides accessing the memory by data address to provide the corresponding data to processor core 101, the data stride is computed as the difference between the second data address and the first data address (stored when the instruction was first processed). The second data address is added to the data stride to obtain the probable data address of the memory access the third time the instruction is processed, and data is read from lower-level memory 115 at that probable address. The cache address corresponding to that probable data address is also derived, and the data from lower-level memory 115 is filled into data memory 113 accordingly. At the same time, that cache address, together with the stride, is stored in the information area associated with the data read instruction.
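The second-stage arithmetic is a one-step linear prediction. A minimal sketch:

```python
def second_stage(first_addr: int, second_addr: int) -> tuple[int, int]:
    """Compute the data stride (subtractor 805) and the probable address
    of the third access, which is used to prefill data memory 113 from
    lower-level memory 115 before the third execution."""
    stride = second_addr - first_addr
    predicted_third = second_addr + stride
    return stride, predicted_third
```

For a load walking an array of 8-byte elements, the stride is 8 and the prefill lands exactly on the next element's block.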
The third stage covers the third and all subsequent executions of the same data read instruction. In this stage, data is provided to processor core 101 from data memory 113 directly at the cache address stored the previous time. The data access engine also has a mechanism to compare the data address generated by processor core 101 with the previously stored cache address; if they do not agree, the data is refetched at the data address generated by processor core 101 and the cache address is corrected. In addition, the cache address is added to the data stride to obtain the probable cache address of the next load, and data memory 113 is filled at that address. The new cache address is then stored in the information area associated with the data read instruction for the next use. From then on, the data read instruction is handled in the same way as the third time.
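The steady-state check-and-step behavior of the third stage can be sketched as follows (addresses are modeled as plain integers; the hardware compares the equivalent cache address):

```python
def third_stage(predicted_addr: int, actual_addr: int, stride: int) -> tuple[bool, int]:
    """Serve the access at the stored (predicted) address; if the address
    the core actually produced differs, refetch at the actual address and
    correct, then step the prediction by the stride for the next load."""
    mispredicted = actual_addr != predicted_addr
    if mispredicted:
        predicted_addr = actual_addr      # correction on misprediction
    next_addr = predicted_addr + stride   # prefill target for next iteration
    return mispredicted, next_addr
```

On a correct prediction the load is served with no tag match at all; on a misprediction the engine falls back to the core-supplied address and the prediction chain resumes from there.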
The different stages of processing a data read instruction are controlled by controller 803. When a track is established in track table 619, the cache address and data stride of a data read instruction are both initialized to 0. Controller 803 reads the cache address and data stride of the data read instruction as well as that instruction's track table address. Please refer to FIG. 8D, which shows an embodiment of the controller of the present invention. Controller 803 contains a plurality of match counter groups, each consisting of a memory 861, a comparator 862 and a counter 863, where the bit widths of memory 861 and comparator 862 equal the track table address width and the bit width of counter 863 is two bits. Allocator 864 is responsible for assigning a free match counter group to a data read instruction being processed for the first time. Initial value detector 865 detects the instruction type as well as an all-'0' cache address and data stride. Bus 821 delivers the instruction type, cache address and data stride to initial value detector 865, while bus 851 connects the track table address to the memory input and to one comparator port of each match counter group. Bus 866 transfers to control logic 867 the count value of the counter in the group whose stored value (e.g. in memory 861) matches the track table address on bus 851 (the address of the current data read instruction), so that control logic 867 controls the operation of the data access engine according to the stage that instruction is in.
Allocator 864 contains a loop counter whose output is converted by a decoder 872 into a pointer to the match counter groups. The counter value of the group pointed to is read out and returned to allocator 864 over bus 869. If comparator 870 finds that the value is not '0' (the group is in use by a data read instruction), the loop counter in allocator 864 is incremented by '1', moving the pointer to the next match counter group. When the match counter group returns a counter value of '0', the loop counter stops counting and the pointer stays at that group; the track table address of the next not-yet-processed data read instruction will be stored into the memory of that match counter group. In the following it is assumed that the pointer stays at the group containing memory 861.
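The allocator's scan is a round-robin search for a group whose counter reads '0'. A minimal sketch (with an added guard for the all-busy case, which the text does not discuss):

```python
class Allocator:
    """Model of allocator 864: advance the pointer past match counter
    groups whose counter is non-zero (in use) and stop at the first free
    one, which will receive the new instruction's track table address."""
    def __init__(self, counters: list):
        self.counters = counters          # one counter value per group
        self.ptr = 0                      # loop counter / decoded pointer

    def allocate(self):
        for _ in range(len(self.counters)):
            if self.counters[self.ptr] == 0:
                return self.ptr           # free group found: pointer stays here
            self.ptr = (self.ptr + 1) % len(self.counters)
        return None                       # assumption: all groups busy -> no allocation
```

Because a counter returns to '0' only after its instruction has passed through all stages, a group is never reassigned while still tracking a live data read instruction.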
When initial value detector 865 detects a non-data-read instruction, controller 803 operates in mode 0 and does not react to the instruction. When it detects a data read instruction whose cache address and data stride are both '0', it judges that the instruction has not yet been processed and enters first-stage operation. First, initial value detector 865 generates a write enable signal 868 that stores the track table address of the data read instruction on bus 851 into memory 861 of the match counter group pointed to by allocator 864. At this point the value in memory 861 equals the value on bus 851, so the output of comparator 862 is '1', i.e. this group is the current instruction group. The counter 863 of the current instruction group is incremented by '1' to '1', and this count value is placed on bus 866 and transferred to control logic 867, which sets the selectors and functional blocks of the data access engine according to first-stage mode.
When this data read instruction is encountered a second time, initial value detector 865 detects a data read instruction and has the comparators in each group compare against the track table address on bus 851. The value in memory 861 matches it, and comparator 862 causes counter 863 to increment by '1', making its count '2'. The matching group is the current instruction group; the count value of the current instruction group is placed on bus 866 and transferred to control logic 867, which sets the selectors and functional blocks of the data access engine according to second-stage mode.
When this data read instruction is encountered a third time, initial value detector 865 detects a data read instruction and has the comparators in each group compare against the track table address on bus 851. The value in memory 861 matches it, that group is the current instruction group, and comparator 862 causes counter 863 to increment by '1', making its count '3'. This value is placed on bus 866 and transferred to control logic 867, which sets the selectors and functional blocks of the data access engine according to third-stage mode.
When this data read instruction is encountered a fourth time, initial value detector 865 detects a data read instruction and has the comparators in each group compare against the track table address on bus 851. The value in memory 861 matches it, that group is the current instruction group, and comparator 862 causes counter 863 to increment by '1', overflowing its count to '0'. This value is placed on bus 866 and transferred to control logic 867, which sets the selectors of the data engine according to third-stage mode: control logic 867 treats count values '0' and '3' alike, operating in the default third-stage state. Once the counter has counted to '0', its value no longer increases, so comparator 862 no longer participates in comparison. The '0' count also makes the group available for allocator 864 to assign to another data read instruction.
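The four encounters above trace a small state machine per match counter group: the 2-bit counter steps 0→1→2→3→0, with counts '1', '2' selecting stage-1 and stage-2 modes, and '3' and the wrapped '0' both acting as the default third-stage mode. A behavioral sketch:

```python
class MatchCounter:
    """One match counter group (memory 861, comparator 862, 2-bit counter
    863). After wrapping to 0 the counter stops and the comparator drops
    out of comparison, freeing the group."""
    def __init__(self, track_addr: int):
        self.addr = track_addr            # stored track table address
        self.count = 0                    # 2-bit counter 863
        self.freed = False

    def encounter(self, track_addr: int):
        if self.freed or track_addr != self.addr:
            return None                   # comparator 862: no match
        self.count = (self.count + 1) & 0b11    # 2-bit increment, wraps to 0
        if self.count == 0:
            self.freed = True             # counted past '3': group free again
            return "stage3"               # count '0' also runs default mode
        return ("stage1", "stage2", "stage3")[self.count - 1]
```

After the group is freed, the same instruction is still handled in third-stage mode, recognized instead by its non-zero cache address and stride, as described next.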
The next time this data read instruction is encountered, initial value detector 865 detects that the corresponding cache address and data stride delivered over bus 821 are both non-'0', while none of the group comparators matches the track table address on bus 851. From this it is judged that this is a data read instruction that has already entered the third stage, directing control logic 867 to control the operation of the data engine in the default mode, i.e. third-stage mode.
The feedback signals 888 and 889 returned from tag memory 841 and sequence table 603, and the difference 825 produced by subtractor 805, are all sent back to control logic 867 in controller 803. Control logic 867 controls the operation of the data access engine according to these feedback signals together with the stage information of the current instruction delivered over bus 866. In some cases control logic 867 may also feed information back to the match counter groups to change the stage of the current instruction in order to handle exceptional situations. For example, if a data read instruction has already entered the third stage but the predicted data address does not agree with the data address sent from the processor core over bus 641, control logic 867 sends a feedback signal to the match counter group corresponding to the current instruction, setting its count to '1'. Thereafter the instruction begins execution again in the first-stage state, passes through the second and third stages, re-establishes the stride, and stores the next cache address into track table 619.
The operation of the data access engine is further illustrated below by the actual handling of one data read instruction in this embodiment. Tracker 845 moves the track table read address 851 to the next data read instruction, and the type 621, DBN (623, 625, 627) and data stride 629 in the entry corresponding to that instruction are placed on the current data address bus 821. Controller 803 reads type 621 and recognizes the data type; the DBN and data stride are all '0', so it judges that the instruction has not yet been processed, but it still controls selector 617 to send the all-'0' DBN on bus 821 over bus 861 to data memory 113, fetching data into buffer 849 for the processor core (alternatively, no data need be fetched at the all-'0' DBN address, to save power). At the same time, controller 803 enters first-stage mode and controls selector 817 to send the group number 623 on bus 821 to sequence table 603, selecting the tag stored in entry 0 (or the first entry); if that tag is invalid, all '0' is output. The tag is sent over bus 643 to shift-adder 812 and added to the shifted intra-group block number on bus 821. The data address 641 generated by processor core 101 and the output of shift-adder 812 are subtracted in subtractor 613, and the difference is placed on bus 825. Controller 803 takes the difference from bus 825 for analysis and judgment; if the difference is not '0', the controller concludes that the data pointed to by the DBN on bus 821 is not the data required by processor core 101, and it notifies the processor core to disregard the corresponding data in buffer 849 and wait for the correct data (alternatively, this check may be omitted to save power).
Controller 803 also matches the data address on bus 641 against the tags in sequence table 603 and in tag memory 841. If there is a hit in tag memory 841, operation proceeds in the conventional cache manner. If no tag matches, the data address on bus 641 is sent through selector 819 to lower-level memory 115, and the corresponding data block is read from lower-level memory 115. At this point, tracker 845 has already looked ahead to the next branch point and determined that the branch is a backward branch (i.e. the program here is a loop), and has calculated that its range contains the data read instruction being processed; it therefore allocates a replaceable data group and designates data block 0 (or the first data block) of that group to be filled with the corresponding data block read from lower-level memory. The tag and index number portion on bus 641 is stored into the tag and index number field of the corresponding entry in sequence table 603. The group valid bit of that group and the valid bit of the corresponding data block (data block 0) are set valid. The shift field of that entry is all '0' at this time, and the adjacent group number field does not yet hold a value.
The address of that entry (i.e. the group number GN) is output from sequence table 603 over bus 835 and placed on bus 837. At the same time, subtractor 613 subtracts the just-stored tag, sent from sequence table 603 over bus 643, from the data address on bus 641 and places the difference on intermediate result bus 823. Because the high address bits on bus 641 and bus 643 are identical, the difference at this point is just the low tag bits, the index and the intra-block offset. This low part, after passing shifter 609 (the shift amount being '0' at this time), is also placed on bus 837 and, together with the group number there, forms a complete, correct cache address. Controller 803 then controls selector 617 to place the cache address on bus 837 onto bus 855 and send it to data memory 113, designating the correct data block to be filled with the corresponding data block read from lower-level memory 115. Controller 803 also controls this data to be read out from data memory 113, or controls the data to be bypassed directly from the output of lower-level memory 115 to data buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
Controller 803 also controls selectors 811, 813 and 815 to select the intra-group block number and intra-block offset on bus 823, which are added in adder 611 to the all-'0' stride from the track table, as in the embodiment of FIG. 6, and the result is placed on bus 881. At this time, the control line 631 generated by the addition result controls selector 618 to place the current group number output by sequence table 603 onto bus 881 as well. The group number, intra-group block number and intra-block offset are concatenated together on bus 881 into one cache address DBN. Controller 803 then controls selector 843 to select bus 881, and delay unit 847 delays the track table read address 851 onto the track table write address 853, so that the DBN is written into the same entry that was previously read out; at this time the controller does not update the stride (or forces a write of '0'), leaving it '0'. After this operation, the track table entry holds the cache address of the read that the data read instruction has already completed (hereinafter called DBN1 for ease of explanation), with a stride of '0'. At this point the data access engine has completed the first-stage operation for this data read instruction.
As mentioned above, the program in this example is executing a loop. When the same data load instruction is executed again, the type 621, DBN1, and the data stride '0' are read out onto bus 821, and the track table address is on bus 851. Controller 803 reads in track table read address 851, which matches an address stored in the match register group in controller 803, signaling that the second-phase operation should be performed for this instruction; control logic 867 then directs the data access engine accordingly via control bus 827. Controller 803 uses the group number (GN) 623 of DBN1 on bus 821 to select, from sequence table 603, the tag and intra-group block number stored in the corresponding entry, which are sent via bus 643 to selector 810 and added in shift adder 812 to the intra-group block number and intra-block offset of DBN1 from bus 821 (the shift amount is controlled by bus 829, output from sequence table 603). To support the unaligned tags and indexes stored in the tag/index field 895 of sequence table 603, the intra-group block number and intra-block offset (low bits) on bus 821 must be shifted in shift adder 812 before being added to the tag/index (high bits) on bus 643. The sum is the data address corresponding to DBN1 and is sent to one input of subtractor 805. The newly arrived data address on bus 641 is sent to the other input of subtractor 805, the data address corresponding to DBN1 is subtracted from it, and the resulting difference is placed on bus 825 as the data stride. Converter 807 converts this stride into the corresponding shift signal and writes it into the shift field of the entry corresponding to DBN1 in sequence table 603. The shift amount 829 is sent from sequence table 603 to shifters 605, 607, 609, and 812 to control their shift operations.
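The second-phase stride computation — rebuilding DBN1's data address from the stored tag/index and the shifted intra-group position, then subtracting it from the new data address — can be modeled as follows. The widths, the shift encoding (block spacing scaled by 2**shift), and the example values are illustrative assumptions, not taken from the patent.

```python
OFFSET_BITS = 6  # assumed intra-block offset width

def dbn1_data_address(tag_index, block, offset, shift):
    """Shift adder 812 (sketch): scale the intra-group block number by
    2**shift data blocks, then add the stored tag/index and the offset."""
    return tag_index + (block << (OFFSET_BITS + shift)) + offset

def compute_stride(new_data_addr, tag_index, block, offset, shift=0):
    """Subtractor 805: stride = new data address - DBN1's data address."""
    return new_data_addr - dbn1_data_address(tag_index, block, offset, shift)

# Group tag 0x1A00, DBN1 in block 0 at offset 0x3C; the loop's second
# access arrives at 0x1A44, eight bytes higher.
stride = compute_stride(0x1A44, 0x1A00, 0, 0x3C)
```

A negative result falls out of the same subtraction, which is what the negative-stride handling later in the text relies on.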
Controller 803 controls selector 819 to select the data address on bus 641 and read the corresponding data from lower-level memory 115. At the same time, the low bits obtained when subtractor 613 subtracts the tag and intra-group block number of the corresponding DBN1 entry on bus 643 from the data address on bus 641 are also placed on bus 823. Controller 803 controls selectors 811 and 813 to send the low bits on bus 823 (i.e., the tag low bits, index, and intra-block offset of DBN2) to adder 611 and related logic, where they are added to '0' and shifted by the shift amount in the shift field of the DBN1 entry in sequence table 603. The sum, which is the intra-group block number and intra-block offset of DBN2, is placed on bus 881; the result 631 shifted out to the right by shifter 607 controls selector 618 to select an adjacent group number from the DBN1 entry in sequence table 603. If that group number is invalid, a new group is allocated, as in the example of FIG. 7 and the first-phase example, to hold the DBN2 data block, and its valid bit, tag/index field, and so on are set to correspond to DBN2, with its shift field set according to the shift field of DBN1. In the process, the originally invalid adjacent group number in the DBN1 entry is filled with the group number of the newly allocated group, set valid, and read out again. The group number of DBN1 is likewise filled into the corresponding adjacent group number field of DBN2. If the adjacent group number is valid, it is simply read out directly. This group number is also placed on bus 881 and, together with the intra-group block number and intra-block offset on bus 881, is sent via bus 816 to selector 617; after selection it is placed, as the cache address of DBN2, on bus 855 and sent to data memory 113, both to fill in the data from lower-level memory 115 and to read the correct data from that address into buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
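The adjacent-group hand-off described above hinges on the carry shifted out when the stride is added to the intra-group position: a carry of 0 keeps the access in the same group, while +1 or -1 selects the next or previous adjacent group number. A sketch under assumed field widths (Python's floor division conveniently yields -1 on underflow):

```python
BLOCK_BITS, OFFSET_BITS = 2, 6          # illustrative widths
GROUP_SPAN = 1 << (BLOCK_BITS + OFFSET_BITS)  # bytes covered by one group

def next_block_position(block, offset, stride):
    """Add the stride to the intra-group position (adder 611); the carry
    shifted out (result 631 from shifter 607) selects the same group (0)
    or an adjacent group (+1 / -1)."""
    pos = (block << OFFSET_BITS) + offset + stride
    carry = pos // GROUP_SPAN   # floor division: -1 on underflow
    pos %= GROUP_SPAN
    return carry, pos >> OFFSET_BITS, pos & ((1 << OFFSET_BITS) - 1)

# Block 3, offset 0x38, stride 8: spills into the next adjacent group.
result = next_block_position(3, 0x38, 8)
```

This is why the sequence table only needs adjacent group numbers rather than a full tag match: a stride smaller than one group span can move the access by at most one group per step.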
Controller 803 controls selectors 811 and 813 to send the low bits on bus 823 (i.e., the tag low bits, index, and intra-block offset of DBN2) to adder 611 and related logic, where they are added to the data stride on bus 825 and shifted by the shift amount in the shift field of the DBN2 entry in sequence table 603. The sum, which is the new intra-group block number and intra-block offset, is placed on bus 881; the result 631 shifted out to the right by shifter 607 controls selector 618 to select an adjacent group number in sequence table 603. If that adjacent group number is invalid, the data block is not in level-one data memory 113, and a new data group is allocated as in the example above. This group number, together with the intra-group block number and intra-block offset obtained by the addition, forms the cache address for the next execution of this data load instruction, hereinafter called DBN3. Controller 803 controls writing DBN3 and the data stride on bus 825, via bus 881 and selector 843, back into the entry of track table 619 corresponding to the same data load instruction (where DBN1 was previously stored).
The controller fetches data from lower-level memory 115 at the data address corresponding to DBN3 to fill the data block in level-one data memory 113 pointed to by DBN3, ready to be read by the same instruction in the next loop iteration. Specifically, the group number, intra-group block number, and intra-block offset in DBN3 are sent via bus 816 to selector 617 and, after selection, point to the data in level-one data memory 113. At the same time, the corresponding tag/index (high bits) is read out from sequence table 603 according to the group number in DBN3 and output on bus 643, while the intra-group block number and intra-block offset in DBN3 are sent via bus 818 to selector 810; after selection, shift adder 812 shifts the intra-group block number and intra-block offset and adds them to the tag/index on bus 643 to obtain the correct data address. This data address is selected by selector 819 and sent to lower-level memory 115 to fetch the data that fills the data block in level-one data memory 113 pointed to by DBN3.
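The cache-to-data-address conversion used for the DBN3 prefetch — reading the group's tag/index from the sequence table, scaling the intra-group block number by the shift (compression) value, and adding the intra-block offset — can be modeled with a small table. The row layout and names here are illustrative, not the patent's encoding:

```python
OFFSET_BITS = 6  # assumed intra-block offset width

# One sequence-table (603) row per group: stored tag/index, shift value,
# and adjacent group numbers.
seq_table = {7: {"tag_index": 0x2A00, "shift": 0, "prev": None, "next": 9}}

def dbn_to_data_address(group, block, offset):
    """Read the tag/index for the group (bus 643), scale the intra-group
    block number by the shift value (shift adder 812), add the offset."""
    row = seq_table[group]
    return row["tag_index"] + (block << (OFFSET_BITS + row["shift"])) + offset

addr = dbn_to_data_address(7, 2, 0x10)
```

Note this is the exact inverse of the data-to-cache conversion: one lookup plus a shift and add, with no associative tag search.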
In the next iteration, when DBN3 is placed on bus 821, controller 803 determines from the track table address match that the corresponding data load instruction has entered the third phase. Controller 803 controls selector 617 to select DBN3 on bus 821 and, via bus 855, read the corresponding data from level-one data memory 113 into buffer 849 for use by processor core 101. Controller 803 also compares the data address corresponding to DBN3 with the data address generated by processor core 101 and sent via bus 641, adds DBN3 to the data stride to obtain DBN4, and looks up sequence table 603 according to DBN4; if necessary, the corresponding data is fetched from lower-level memory 115 into level-one data memory 113, as in the previous example, ready for the next iteration. All subsequent iterations execute in the same way.
In addition, in some loops the data stride of a data load instruction is negative; that is, reading starts at some data address with the larger value, and each subsequent read is at a data address smaller than the previous one. In this case, the controller cannot tell in the first phase whether the stride is positive or negative, and places the data corresponding to DBN1 in data block number 0 of some group. In the second phase, subtracting DBN1 from DBN2 yields the data stride, which is found to be negative. DBN2 can then be placed in the highest-numbered data block of another data group; the high bits of the data address corresponding to DBN2 (from bus 641) are written into that group's tag/index field, that group's number is written into the 'previous group' slot of the adjacent group numbers of the group holding DBN1, and the group number of DBN1 is written into the 'next group' slot of the adjacent group numbers of the new group. This arrangement conforms to the addressing rules of this embodiment, so the required data can be found correctly whether it is addressed by data address or by cache address.
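The group linking for a negative stride can be pictured as a doubly linked list of groups: the newly allocated group becomes the 'previous' neighbour of DBN1's group, and DBN1's group becomes the new group's 'next' neighbour. A sketch under assumed field names (not the patent's encoding):

```python
def link_groups_for_negative_stride(seq_table, g_dbn1, g_new, new_tag_index):
    """Allocate group g_new for DBN2 and cross-link it with DBN1's group:
    g_new becomes the 'previous' adjacent group of g_dbn1, and g_dbn1 the
    'next' adjacent group of g_new."""
    seq_table[g_new] = {"tag_index": new_tag_index,
                        "prev": None, "next": g_dbn1}
    seq_table[g_dbn1]["prev"] = g_new

# DBN1 lives in group 3 with tag 0x4000; the preceding address region
# (tag 0x3F00, illustrative) is allocated to new group 6.
seq_table = {3: {"tag_index": 0x4000, "prev": None, "next": None}}
link_groups_for_negative_stride(seq_table, g_dbn1=3, g_new=6,
                                new_tag_index=0x3F00)
```

With this linkage in place, the carry-based adjacent-group selection works identically for positive and negative strides.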
Another approach saves cache space by not allocating a new group and instead storing DBN2 directly into the group where DBN1 resides. The method is to invert the intra-group block numbers of that group; take a group with four data blocks as an example. Block 0, where DBN1 was originally stored, is mapped to block 3; the original block 3 is mapped to block 0; the original block 1 is mapped to block 2; and the original block 2 is mapped to block 1. This is implemented by placing an inverter on the path of the intra-group block number: the block number output by the inverter is the bitwise complement of the block number at its input. For this purpose, an inversion (R) bit is added under the feature field of sequence table 603. When the R bit is '0', the inverter has no effect and the output equals the input. When the R bit is '1', the inverter is active and its output is the bitwise complement of its input. In this way, data originally stored in the group in descending order is stored in the group in ascending order. For example, DBN1 (which by index should be number 0) is now actually stored in block 0, but the cache address stored in the track table is labeled block 3; DBN2 (which by index should be number -1) is actually stored in block 1, but the cache address stored in the track table is labeled block 2; DBN3 (which by index should be number -2) is actually stored in block 2, but the cache address stored in the track table is labeled block 1; and DBN4 (which by index should be number -3) is actually stored in block 3, but the cache address stored in the track table is labeled block 0. One problem remains, however: the group's tag/index field was set on the assumption that DBN1 is placed in block 0. Therefore, in the second phase, when the stride is found to be negative and the DBN2 data block is filled, the group's R bit is set to '1', the tag/index field written in the first phase is read out via bus 643, a constant is subtracted from it, and the result is written back to the tag/index field. This constant can be obtained by table lookup or by computation. Suppose a data group has n data blocks, and the shift field in the sequence table 603 entry to be adjusted is s (read out together with the tag/index and sent onto bus 829); then the constant equals (n-1) * (s+1). For example, with the 4 data blocks of the example above and a shift value of '0', the constant equals '3'. The tag/index value of DBN1 (which, at the mapped address, now corresponds to block 3) minus 3 is then exactly the tag/index value of DBN4 (which, at the mapped address, corresponds to block 0). As another example, with a shift value of '1', the constant is '6'. Other cases follow by analogy and are not described further.
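The inversion (R) bit and the tag adjustment constant can be checked with a few lines. The bitwise complement reproduces the 0↔3, 1↔2 mapping of the four-block example, and the constant (n-1)*(s+1) gives 3 for shift '0' and 6 for shift '1', as stated above (the helper names are illustrative):

```python
BLOCK_BITS = 2
N_BLOCKS = 1 << BLOCK_BITS  # four data blocks per group, as in the example

def physical_block(logical_block, r_bit):
    """With R = 1 the inverter returns the bitwise complement of the
    intra-group block number; with R = 0 it passes it through."""
    mask = N_BLOCKS - 1
    return (~logical_block & mask) if r_bit else logical_block

def tag_adjust_constant(n, s):
    """Constant subtracted from the group's tag/index field: (n-1)*(s+1)."""
    return (n - 1) * (s + 1)

mapping = [physical_block(b, r_bit=1) for b in range(N_BLOCKS)]
```

The complement is its own inverse, which is why a single inverter on the block-number path suffices for both writing and reading the reversed group.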
Both the data address and the DBNs stored in the track table use the correct pre-mapping address; the mapped address is needed only when a cache address is sent to level-one data memory 113. The inverter described above can therefore be placed after selector 617 in FIG. 8A and made to invert only the intra-group block number. In the third phase of this embodiment, when the DBN sent from track table 619 passes through selector 617 to fetch data from level-one data memory 113, the R bit must also be read from sequence table 603 using group number 623 in order to control the inverter. If an R bit is added to the data entries of track table 619, this lookup of sequence table 603 can be avoided. Usually, however, sequence table 603 must be queried with group number 623 at this point anyway, in order to obtain from bus 643 the tag/index field corresponding to the DBN for comparison with the data address sent on bus 641.
Any other suitable modifications may be made in accordance with the technical solutions and concepts of the present invention. For those of ordinary skill in the art, all such substitutions, adjustments, and improvements shall fall within the scope of protection of the claims appended to the present invention.
Industrial Applicability
The apparatus and method proposed by the present invention can be used in a variety of applications related to data caching and can improve the efficiency of a processor system.
Sequence Listing Free Text

Claims (22)

  1. A data caching method, characterized in that the data memory in a cache is configured such that one portion of its storage blocks implements a conventional set-associative structure and another portion implements a group-allocated structure; and
    the group-allocated cache is composed of a plurality of groups, each group storing a number of data blocks corresponding to the same starting data block address, and the difference between the data addresses corresponding to adjacent storage blocks within a group is the same value.
  2. The method according to claim 1, characterized in that the data addresses corresponding to the data blocks in each group share an identical portion;
    the identical portion consists of the tag in the data address, or of a part of the tag and a part of the index in the data address; and
    data blocks with adjacent or nearby addresses are stored in the same group.
  3. The method according to claim 2, characterized in that, when the difference between the data addresses corresponding to adjacent storage blocks in a group equals the data block length, the data block addresses in all storage blocks of that group are consecutive; and
    when the difference between the data addresses corresponding to adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses in all storage blocks of that group are equally spaced; and
    whether the next data also resides in that group, and where it resides when it does, can be determined directly from the current data's position in the group and the data stride.
  4. The method according to claim 3, characterized in that a sequence table is provided; the rows of the sequence table correspond one-to-one with the groups in the data memory; and
    each row of the sequence table contains a compression ratio; the compression ratio indicates the spacing between the data block addresses corresponding to adjacent storage blocks in the corresponding group.
  5. The method according to claim 4, characterized in that each row of the sequence table contains the location of the group holding the data blocks adjacent to the data blocks of the corresponding group; and
    the group holding the next data, and its location within that group, can be determined directly from the current data's position in the group and the data stride.
  6. The method according to claim 5, characterized in that each row of the sequence table contains the location of the group holding the consecutive data blocks adjacent to the first data block of the corresponding group.
  7. The method according to claim 5, characterized in that each row of the sequence table contains the location of the group holding the consecutive data blocks adjacent to the last data block of the corresponding group.
  8. The method according to claim 5, characterized in that a data address is converted into a cache address;
    the cache address consists of a group number, an intra-group block number, and an intra-block offset, wherein the intra-block offset is the same as the intra-block offset in the data address; and
    the cache address can be used directly to address the data memory in the data cache.
  9. The method according to claim 8, characterized in that data corresponding to data access instructions in loop code is stored in the group-allocated structure, while data corresponding to other data access instructions is stored in the set-associative structure.
  10. The method according to claim 9, characterized in that, for a data access instruction executed for the first time, its data address is converted into a cache address once the data address has been generated.
  11. The method according to claim 10, characterized in that, for a data access instruction executed for the second time, its data address is converted into a cache address once generated, and a data stride is computed; the data stride is the difference between the two data addresses; and
    a possible next cache address for the next execution of the data access instruction is computed from the current cache address and the data stride, for addressing the data memory the next time the data access instruction is executed; and
    when the data in the data memory corresponding to the next cache address is invalid, the next cache address is converted into the corresponding data address, and the corresponding data is filled into the data memory.
  12. The method according to claim 11, characterized in that, for a data access instruction executed for the third and subsequent times, the next cache address is computed from the current cache address and the data stride, for addressing the data memory the next time the data access instruction is executed; and
    when the data in the data memory corresponding to the next cache address is invalid, the next cache address is converted into the corresponding data address, and the corresponding data is filled into the data memory.
  13. A data caching system, characterized in that, depending on configuration, the data memory in the data caching system can operate one portion of its storage blocks as a conventional set-associative structure and another portion as a group-allocated structure; and
    the group-allocated structure contains a plurality of groups, each group containing a number of storage blocks and one data block address storage unit, all storage blocks in a group corresponding to the data block address in that data block address storage unit; and
    the difference between the data addresses corresponding to adjacent storage blocks within each group is the same value.
  14. The system according to claim 13, characterized by further comprising a masked comparator for matching a portion of the block address in a data address against the corresponding bits of the data block address in the data block address storage unit, to determine whether the data corresponding to that data address is stored in the group.
  15. The system according to claim 14, characterized in that, when the difference between the data addresses corresponding to adjacent storage blocks in a group equals the data block length, the data block addresses in all storage blocks of that group are consecutive; and
    when the data corresponding to the data address is stored in the group, that data can be found by addressing the storage blocks in the group with the masked bits.
  16. The system according to claim 14, characterized by further comprising a shifter; when the difference between the data addresses corresponding to adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses in all storage blocks of that group are equally spaced; and
    when the data corresponding to the data address is stored in the group, that data can be found by addressing the storage blocks in the group with the value obtained after the shifter shifts the masked bits.
  17. The system according to claim 14, characterized by further comprising a sequence table memory; the rows of the sequence table memory correspond one-to-one with the groups in the data memory; and
    each row of the sequence table memory contains a storage unit for storing a compression ratio; the value stored in that unit indicates the spacing between the data block addresses corresponding to adjacent storage blocks in the corresponding group.
  18. The system according to claim 14, characterized in that each row of the sequence table memory contains a pointer to the location of the group holding the data blocks adjacent to the data blocks of the corresponding group; and
    the group holding the next data, and its location within that group, can be determined directly from the current data's position in the group and the data stride.
  19. The system according to claim 18, characterized in that the pointer points to the location of the group holding the consecutive data blocks adjacent to the first data block of the corresponding group.
  20. The system according to claim 18, characterized in that the pointer points to the location of the group holding the consecutive data blocks adjacent to the last data block of the corresponding group.
  21. The system according to claim 18, characterized in that a data address can be converted into a cache address by the comparator matching the data address against the data block address in the data block address storage unit, and by the shifter shifting the index in the data address according to the value in the compression ratio storage unit;
    the cache address consists of a group number, an intra-group block number, and an intra-block offset, wherein the intra-block offset is the same as the intra-block offset in the data address; and
    the cache address can be used directly to address the data memory in the data cache.
  22. The system according to claim 18, characterized in that a cache address can be converted into a data address from the data block address value in the data block address storage unit corresponding to the cache address, with the shifter shifting the intra-group block number in the cache address according to the value in the compression ratio storage unit.
PCT/CN2014/090972 2013-11-16 2014-11-13 Data caching system and method WO2015070771A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310576787.1A CN104657285B (en) 2013-11-16 2013-11-16 Data caching system and method
CN201310576787.1 2013-11-16

Publications (1)

Publication Number Publication Date
WO2015070771A1 true WO2015070771A1 (en) 2015-05-21

Family

ID=53056780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/090972 WO2015070771A1 (en) 2013-11-16 2014-11-13 Data caching system and method

Country Status (2)

Country Link
CN (1) CN104657285B (en)
WO (1) WO2015070771A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016188392A1 (en) * 2015-05-23 2016-12-01 上海芯豪微电子有限公司 Generation system and method of data address
CN117478626A (en) * 2023-12-27 2024-01-30 天津光电聚能通信股份有限公司 Quick matching searching system, method, equipment and medium based on group connection cache

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN106933749B (en) * 2015-12-31 2020-10-13 北京国睿中数科技股份有限公司 Address random method and device applied to cache verification system
CN112380148B (en) * 2020-11-30 2022-10-25 海光信息技术股份有限公司 Data transmission method and data transmission device
CN112948173A (en) * 2021-02-02 2021-06-11 湖南国科微电子股份有限公司 Data recovery method, device, equipment and medium
CN113741976A (en) * 2021-08-25 2021-12-03 武汉大学 Cache bump elimination method, device, equipment and storage medium
CN113656330B (en) * 2021-10-20 2022-02-15 北京微核芯科技有限公司 Method and device for determining access address

Citations (4)

Publication number Priority date Publication date Assignee Title
US6157980A (en) * 1998-03-23 2000-12-05 International Business Machines Corporation Cache directory addressing scheme for variable cache sizes
CN101178690A (en) * 2007-12-03 2008-05-14 浙江大学 Design method of low-power consumption high performance high speed scratch memory
CN101876945A (en) * 2009-11-24 2010-11-03 西安奇维测控科技有限公司 Method for automatically configuring virtual block aiming at different data of logical addresses
CN102110058A (en) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Low-deficiency rate and low-deficiency punishment caching method and device

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20070266228A1 (en) * 2006-05-10 2007-11-15 Smith Rodney W Block-based branch target address cache
GB2458295B (en) * 2008-03-12 2012-01-11 Advanced Risc Mach Ltd Cache accessing using a micro tag
JP2010097557A (en) * 2008-10-20 2010-04-30 Toshiba Corp Set associative cache apparatus and cache method
CN102662868B (en) * 2012-05-02 2015-08-19 中国科学院计算技术研究所 For the treatment of dynamic group associative cache device and the access method thereof of device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US6157980A (en) * 1998-03-23 2000-12-05 International Business Machines Corporation Cache directory addressing scheme for variable cache sizes
CN101178690A (en) * 2007-12-03 2008-05-14 浙江大学 Design method of low-power consumption high performance high speed scratch memory
CN101876945A (en) * 2009-11-24 2010-11-03 西安奇维测控科技有限公司 Method for automatically configuring virtual block aiming at different data of logical addresses
CN102110058A (en) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Low-deficiency rate and low-deficiency punishment caching method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016188392A1 (en) * 2015-05-23 2016-12-01 上海芯豪微电子有限公司 Generation system and method of data address
CN106293624A (en) * 2015-05-23 2017-01-04 上海芯豪微电子有限公司 Data address generation system and method
CN117478626A (en) * 2023-12-27 2024-01-30 天津光电聚能通信股份有限公司 Fast match lookup system, method, device and medium based on set-associative cache
CN117478626B (en) * 2023-12-27 2024-04-05 天津光电聚能通信股份有限公司 Fast match lookup system, method, device and medium based on set-associative cache

Also Published As

Publication number Publication date
CN104657285B (en) 2020-05-05
CN104657285A (en) 2015-05-27

Similar Documents

Publication Publication Date Title
WO2015070771A1 (en) Data caching system and method
WO2011076120A1 (en) High-performance cache system and method
WO2014121737A1 (en) Instruction processing system and method
WO2014000641A1 (en) High-performance cache system and method
WO2014000624A1 (en) High-performance instruction cache system and method
WO2012175058A1 (en) High-performance cache system and method
WO2015024492A1 (en) High-performance processor system and method based on a common unit
WO2015024493A1 (en) Buffering system and method based on instruction cache
JP6796468B2 (en) Branch predictor
WO2016131428A1 (en) Multi-issue processor system and method
US5606682A (en) Data processor with branch target address cache and subroutine return address cache and method of operation
KR100333470B1 (en) Method and apparatus for reducing latency in set-associative caches using set prediction
WO2015024482A1 (en) Processor system and method using variable length instruction word
WO2015096688A1 (en) Caching system and method
KR20100032441A (en) A method and system for expanding a conditional instruction into a unconditional instruction and a select instruction
US6175897B1 (en) Synchronization of branch cache searches and allocation/modification/deletion of branch cache
WO2015103864A1 (en) Method for memory management and Linux terminal
WO2015016640A1 (en) Neural network computing device, system and method
WO2015005636A1 (en) Memory system and data processing method for memory
WO2019056733A1 (en) Concurrent volume control method, application server, system and storage medium
WO2020246836A1 (en) Data management device for supporting high speed artificial neural network operation by using data caching based on data locality of artificial neural network
EP4320472A1 (en) Device and method for predicted autofocus on an object
WO2015024532A1 (en) High-performance instruction caching system and method
WO2001042927A1 (en) Memory access device and method using address translation history table
WO2014000626A1 (en) High-performance data cache system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 14862072
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 14862072
Country of ref document: EP
Kind code of ref document: A1