WO2015070771A1 - Data caching system and method - Google Patents

Data caching system and method

Info

Publication number
WO2015070771A1
WO2015070771A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
group
address
block
memory
Prior art date
Application number
PCT/CN2014/090972
Other languages
French (fr)
Chinese (zh)
Inventor
林正浩
Original Assignee
上海芯豪微电子有限公司
Priority date
Filing date
Publication date
Application filed by 上海芯豪微电子有限公司
Publication of WO2015070771A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 — Addressing or allocation; Relocation
    • G06F 12/08 — Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 — Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/60 — Details of cache memory
    • G06F 2212/6026 — Prefetching based on access pattern detection, e.g. stride based prefetch

Definitions

  • the invention relates to the field of computers, communications and integrated circuits.
  • The role of the cache is to hold a copy of a portion of the memory contents, so that this content can be accessed quickly by the processor core, keeping the pipeline running continuously.
  • Current caches are addressed as follows. First, the tag is read from the tag memory using the index segment of the address; at the same time, the cache contents are read using the index segment and the intra-block offset segment of the address. The tag read from the tag memory is then compared with the tag segment of the address. If they are the same, the content read from the cache is valid: a cache hit. Otherwise, if the tag read from the tag memory differs from the tag segment of the address, the access is a cache miss and the content read from the cache is invalid. For a way-group (set-associative) cache, the above operations are performed in parallel for every way group to detect which way group hits; the content read from the hitting way group is valid. If every way group misses, all of the read content is invalid. After a miss, the cache control logic fills the contents from a lower-level storage medium into the cache.
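The conventional lookup described above can be sketched as follows. This is an illustrative software model, not the patent's circuit: the 2-way organization, 16-byte blocks, and 64 sets are assumptions chosen for the example.

```python
# Conventional set-associative cache lookup: split the address into
# (tag, index, offset), read every way's tag at that index in parallel,
# and declare a hit when a stored tag equals the address tag.

BLOCK_BITS = 4   # intra-block offset width (16-byte blocks, assumed)
INDEX_BITS = 6   # index width (64 sets, assumed)

def split_address(addr):
    """Split a data address into (tag, index, intra-block offset)."""
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(tag_ways, data_ways, addr):
    """Compare the stored tag of every way group; a match is a cache hit."""
    tag, index, offset = split_address(addr)
    for way in range(len(tag_ways)):
        if tag_ways[way][index] == tag:           # tag match -> hit
            return data_ways[way][index][offset]  # read content is valid
    return None  # all way groups miss: fill from lower-level storage
```

The hardware performs all way comparisons simultaneously; the loop here is only a sequential stand-in for that parallel match.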
  • The method and system proposed by the present invention can directly address one or more of the above or other difficulties.
  • The invention provides a data caching method, characterized in that the data memory in a cache is configurable: one part of its storage blocks implements a traditional group-associative structure, while another part implements a structure allocated by group.
  • In the structure allocated by group, the cache is composed of a plurality of groups; each group stores a plurality of data blocks that share the same starting data block address, and the difference between the data addresses of adjacent storage blocks within a group is a constant value.
  • Further, the data addresses of the data blocks in each group share a common part; this common part is formed by the tag of the data address, or by the tag together with part of the index number. In this way, data blocks with adjacent or similar addresses are stored in the same group.
  • Further, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in the group are consecutive; when the difference equals an integer multiple of the data block length, the data block addresses of all storage blocks in the group are evenly spaced. From the position of the current data in the group and the data step size, it can be determined directly whether the next data is also in the group and, if so, where in the group it is located.
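The in-group position rule above can be sketched as follows. Describing a group by a flat (start address, interval, block count) triple, and the 16-byte block length, are illustrative assumptions, not the patent's exact encoding.

```python
# Given a group's starting block address, the (constant) interval between
# adjacent blocks, and the block count, decide whether the datum at
# current address + stride still falls inside the same group, and where.

BLOCK_LEN = 16  # bytes per data block (assumed)

def locate_in_group(start_block_addr, interval, n_blocks, data_addr):
    """Return (block number in group, intra-block offset), or None if the
    datum at data_addr is not held by this group."""
    block_addr = data_addr - (data_addr % BLOCK_LEN)
    delta = block_addr - start_block_addr
    if delta < 0 or delta % interval != 0:
        return None                       # between or before stored blocks
    k = delta // interval                 # which stored block
    if k >= n_blocks:
        return None                       # past the end of the group
    return k, data_addr % BLOCK_LEN

def next_location(start, interval, n_blocks, cur_addr, stride):
    """Position of the next datum, determined directly from the current
    address and the data step size."""
    return locate_in_group(start, interval, n_blocks, cur_addr + stride)
```

With interval equal to BLOCK_LEN the group holds consecutive blocks; with an integer multiple it holds evenly spaced (compressed) blocks, matching the two cases in the text.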
  • Further, a sequence table is provided. The rows of the sequence table correspond one-to-one to the groups in the data memory, and each row includes a compression ratio, which indicates the interval between the data block addresses of adjacent storage blocks in the corresponding group.
  • Further, each row of the sequence table includes the location of the group holding the data blocks adjacent to those in the corresponding group, so that the group holding the next data, and its position within that group, can be determined directly from the current data's position and the data step size.
  • each row of the sequence table includes a location of a group of consecutive data blocks adjacent to the first data block in the corresponding group.
  • each row of the sequence table includes a location of a group of consecutive data blocks adjacent to a last data block in the corresponding group.
  • the data address is converted into a cache address;
  • Further, the cache address is composed of a group number, a block number within the group, and an intra-block offset, where the intra-block offset is identical to the intra-block offset of the data address; the cache address can be used directly to address the data memory in the data cache.
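The cache address layout just described can be modeled as simple bit fields. The field widths below (8 blocks per group, 16-byte blocks) are assumptions for illustration; the patent does not fix them.

```python
# Cache address (DBN) = group number | block number in group | offset.
# The offset field is identical to the intra-block offset of the data
# address, so no translation is needed for that part.

OFFSET_BITS = 4    # intra-block offset width (assumed)
BLOCK_NO_BITS = 3  # 8 storage blocks per group (assumed)

def pack_dbn(group_no, block_no, offset):
    """Assemble a cache address from its three fields."""
    return ((group_no << (BLOCK_NO_BITS + OFFSET_BITS))
            | (block_no << OFFSET_BITS)
            | offset)

def unpack_dbn(dbn):
    """Split a cache address back into (group number, block number, offset)."""
    offset = dbn & ((1 << OFFSET_BITS) - 1)
    block_no = (dbn >> OFFSET_BITS) & ((1 << BLOCK_NO_BITS) - 1)
    group_no = dbn >> (BLOCK_NO_BITS + OFFSET_BITS)
    return group_no, block_no, offset
```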
  • Further, the data corresponding to data access instructions in loop code is stored in the structure allocated by group, and the data corresponding to other data access instructions is stored in the group-associative structure.
  • Further, when a data access instruction is executed for the first time, its data address is converted into a cache address as soon as the data address is generated.
  • Further, when the data access instruction is executed for the second time, the data address is again converted into a cache address when it is generated, and the data step size is calculated as the difference between the two data addresses. From the current cache address and the data step size, the cache address likely to be used the next time the data access instruction is executed is calculated in advance and used to address the data memory at that next execution; when the data at that next cache address in the data memory is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.
  • Further, on each subsequent execution, the next cache address is calculated from the current cache address and the data step size and used to address the data memory the next time the data access instruction is executed; if the corresponding data is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.
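The per-instruction stride mechanism above can be sketched as a small state machine. The class and field names are illustrative; the patent stores this state in a track-table data point rather than a software object.

```python
# Per data access instruction: the first execution only records its address;
# the second execution computes the data step size as the difference of the
# two addresses; every execution thereafter predicts the next address as
# current + stride, which can be used to pre-address (and pre-fill) the
# data memory before the instruction runs again.

class DataPoint:
    def __init__(self):
        self.last_pos = None  # address of the previous access
        self.stride = None    # data step size, known from the 2nd access

    def access(self, pos):
        """Record this access; return the predicted NEXT address, or None
        if no stride has been established yet."""
        if self.last_pos is not None:
            self.stride = pos - self.last_pos  # difference of two addresses
        self.last_pos = pos
        if self.stride is None:
            return None
        return pos + self.stride  # candidate next cache address
```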
  • The present invention also provides a data caching system in which, according to configuration, one part of the storage blocks of the data memory operates as a traditional group-associative structure and another part operates as a structure allocated by group. The structure allocated by group comprises a plurality of groups; each group comprises a plurality of storage blocks and a data block address storage unit, all storage blocks in a group correspond to the data block address in that unit, and the difference between the data addresses of adjacent storage blocks in each group is a constant value.
  • Further, the data cache system includes a maskable comparator, which matches part of the block address in a data address against the corresponding bits of the data block address in the data block address storage unit, to determine whether the data corresponding to that data address is stored in the group.
  • Further, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in the group are consecutive; when the data corresponding to a data address is stored in the group, the masked-out bits address the storage blocks in the group to find that data.
  • Further, the data cache system includes a shifter. When the difference between the data addresses of adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses of all storage blocks in the group are evenly spaced; when the data corresponding to a data address is stored in the group, the value obtained by shifting the masked-out bits with the shifter addresses the storage blocks in the group to find that data.
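The masked compare-and-shift addressing described above can be modeled as follows. Assumptions: addresses are in units of whole blocks, each group holds 2**BLOCK_NO_BITS blocks spaced 2**c blocks apart (c being the compression ratio), and the group's first block address is aligned so the indexed bit field is zero; the unaligned case (see Figure 7D) is not modeled here.

```python
# Maskable comparator: compare block addresses while ignoring the bits
# that select a block within the group. Shifter: those masked-out bits,
# shifted right by the compression ratio c, index the block in the group.

BLOCK_NO_BITS = 3  # 8 storage blocks per group (assumption)

def masked_match(group_block_addr, block_addr, c):
    """True if block_addr falls on this group's address pattern, comparing
    all bits except the in-group index field, bits [c, c + BLOCK_NO_BITS)."""
    mask = ~(((1 << BLOCK_NO_BITS) - 1) << c)
    return (block_addr & mask) == (group_block_addr & mask)

def block_no_in_group(block_addr, c):
    """The masked-out bits, shifted by the compression ratio, address the
    storage block within the group."""
    return (block_addr >> c) & ((1 << BLOCK_NO_BITS) - 1)
```

With c = 0 the group holds consecutive blocks and the masked bits index directly; with c = 1 the group holds every second block, and a block address with its low bit set correctly fails to match.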
  • Further, the data caching system includes a sequence table memory whose rows correspond one-to-one to the groups in the data memory. Each row includes a storage unit holding a compression ratio; the stored value represents the interval between the data block addresses of adjacent storage blocks in the corresponding group.
  • Further, each row of the sequence table memory includes a pointer to the location of the group holding the data blocks adjacent to those in the corresponding group, so that the group holding the next data, and its position within that group, can be determined directly from the current data's position and the data step size.
  • the pointer points to a location of a group of consecutive data blocks adjacent to the first data block in the corresponding group.
  • the pointer points to a location of a group of consecutive data blocks adjacent to a last data block in the corresponding group.
  • Further, by matching the data address against the data block address in the data block address storage unit with the comparator, and shifting the index with the shifter according to the value in the compression ratio storage unit, the data address can be converted into a cache address. The cache address is composed of a group number, a block number within the group, and an intra-block offset, where the intra-block offset is identical to the intra-block offset of the data address.
  • The cache address can be used directly to address the data memory in the data cache.
  • Further, the shifter shifts the block number in the cache address according to the value in the compression ratio storage unit, so that the cache address is converted back into a data address.
  • The system and method of the present invention can provide a basic solution for the data cache structure used by digital systems. Unlike a traditional data caching system, which fills only after a cache miss, the system and method of the present invention fill the data cache before the processor accesses a datum, and can thereby avoid or largely hide compulsory misses. That is, the cache system of the present invention integrates a prefetch process.
  • the system and method of the present invention also divides the data store in the data cache into a group association portion and a group assignment portion.
  • Each group in the group-allocated portion contains data blocks whose data addresses are adjacent or close to one another, so the data corresponding to data access instructions with adjacent or close data addresses (e.g., the data access instructions in loop code) is stored there.
  • The technical solution of the present invention converts the data address, consisting of tag, index number, and intra-block offset, into a group number, in-group block number, and intra-block offset at the time the data is filled into the data cache.
  • This conversion of the address space allows the data caching system to address the data memory directly in the new address mode, without performing tag matching, and to find the corresponding data directly, especially when accessing data whose addresses are adjacent or close.
  • In that case the cache address of the next datum can be obtained by a simple calculation from the current cache address and the data step size, without tag matching or address conversion, which greatly reduces power consumption.
  • The system and method of the present invention can read data out of the data memory and send it to the processor core before the processor core executes the data read instruction, so that when the processor core needs the data it can take it directly, masking the time of accessing the data memory.
  • Figure 1 is an embodiment of the cache system of the present invention.
  • Figure 2 is a schematic diagram of the track point format of the present invention.
  • Figure 3A is another embodiment of the cache system of the present invention.
  • Figure 3B is another schematic diagram of the track point format of the present invention.
  • Figure 3C is another embodiment of the cache system of the present invention.
  • Figure 4A is an embodiment of the improved group-associative cache of the present invention.
  • Figure 4B is another embodiment of the improved group-associative cache of the present invention.
  • Figure 5 is an embodiment of the grouped data cache of the present invention.
  • Figure 6 is an embodiment of the data access engine of the present invention.
  • Figure 7A is an embodiment of the sequence table and data cache of the present invention.
  • Figure 7B is another embodiment of the sequence table and data cache of the present invention.
  • Figure 7C is another embodiment of the sequence table and data cache of the present invention.
  • Figure 7D is an embodiment of a data storage arrangement with unaligned group boundaries according to the present invention.
  • Figure 8A is an embodiment of the data access engine of the present invention.
  • Figure 8B is a schematic diagram of the various address forms of the present invention.
  • Figure 8C is an embodiment of the sequence table operation of the present invention.
  • Figure 8D is an embodiment of the controller of the present invention.
  • Figure 6 shows a preferred embodiment of the invention.
  • FIG. 1 is an embodiment of a cache system according to the present invention.
  • The data cache system includes a processor core 101, an active table 109, a tag memory 127, a scanner 111, a track table 107, a tracker 119, an instruction memory 103, and a data memory 113.
  • The components listed here are for ease of description; the system may include other components, and some components may be omitted.
  • the various components herein may be distributed across multiple systems, either physically or virtually, and may be hardware implemented (eg, integrated circuits), implemented in software, or implemented in a combination of hardware and software.
  • The processor may be any processing unit that includes an instruction cache and a data cache and is capable of executing instructions and processing data, including but not limited to: a general processor, a central processing unit (CPU), a microcontroller (MCU), a digital signal processor (DSP), a graphics processor (GPU), a system on chip (SoC), an application-specific integrated circuit (ASIC), etc.
  • The level of a memory refers to its degree of proximity to the processor core 101: the closer to the processor core 101, the higher the level. A higher-level memory (such as the instruction memory 103 and the data memory 113) is usually faster but smaller than a lower-level memory. 'The memory closest to the processor' refers to the memory highest in the storage hierarchy, and usually the fastest, such as the instruction memory 103 and the data memory 113 in this embodiment. Furthermore, the memories at the various levels in the present invention have an inclusion relationship, that is, a lower-level memory contains all of the contents stored in a higher-level memory.
  • In the present invention, a branch instruction refers to any suitable instruction form that can cause the processor core 101 to change its execution flow (e.g., execute an instruction out of order). The branch source refers to the instruction that performs the branch operation (i.e., the branch instruction), and the branch source address can be the instruction address of the branch instruction itself; the branch target refers to the target instruction to which the branch instruction transfers, and the branch target address is the address transferred to when the branch is taken, that is, the instruction address of the branch target instruction.
  • A data read instruction refers to any suitable instruction form that can cause the processor core 101 to read data from memory; its instruction format generally includes a base address register number and an address offset, and the data required by a data read instruction refers to the data that the processor core 101 reads when executing it. The current instruction may refer to the instruction currently being executed or fetched by the processor core; the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
  • In the present invention, the term 'fill' refers to prefetching instructions or required data from an external memory in advance and storing them in the instruction cache or data cache before the processor executes the corresponding instruction.
  • The track table 107 contains a plurality of track points. A track point is an entry in the track table 107 and may contain information about at least one instruction, such as the instruction's type; if the track point is a branch point, the information may include the branch target address, and so on. The tracking address of a track point is the track table address of the track point itself, composed of a row address and a column address, and corresponds to the instruction address of the instruction the track point represents. For a branch point, its content includes the tracking address, in the track table 107, of the branch target instruction of the branch instruction it represents, and that tracking address corresponds to the instruction address of the branch target instruction.
  • In addition to the instructions executed by the processor 101, the instruction memory 103 stores the instruction type information corresponding to each instruction, such as whether the instruction is a data read instruction. The instruction type information may further indicate which type of data read instruction the corresponding instruction is, including information on how to calculate the data address, such as the base address register number and the location of the address offset in the instruction code.
  • BNX can be used to represent the row address in a branch point's tracking address; that is, BNX corresponds to the location of the memory block in which the instruction is located (the row number of the memory block), while the column address in the tracking address corresponds to the position (offset) of the branch instruction within its memory block. Each pair of BNX and column address corresponds to one branch point in the track table 107, and the corresponding branch point can be found in the track table 107 from a given BNX and column address. A branch point in the track table 107 also stores, expressed in the form of a tracking address, the location in the instruction memory 103 of the branch target instruction of its branch instruction; based on this tracking address, the position of the track point corresponding to the branch target instruction can be found in the track table 107. That is, for a branch point in the track table 107, the track table address is the tracking address corresponding to its branch source address, and the track table content contains the tracking address corresponding to its branch target address.
  • The entries in the active table 109 correspond one-to-one to the storage blocks in the instruction memory 103, and thus to the rows in the track table 107. Each entry in the active table 109 stores the block address of an instruction block and indicates where the corresponding instruction cache storage block is located in the instruction memory 103, forming the correspondence between BNX and the instruction cache storage block.
  • Each memory block in data memory 113 is represented by a memory block number DBNX.
  • The entries in the tag memory 127 correspond one-to-one to the storage blocks in the data memory 113; each entry stores the data block address corresponding to a storage block of the data memory 113, forming the correspondence between data block addresses and data cache storage block numbers. Thus, when a data address is matched in the tag memory 127, either the storage block number stored in the matching entry is obtained, or the match fails.
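The tag memory 127 association just described can be sketched as a direct mapping from data block address to storage block number. A real tag memory matches all entries in parallel; the loop below is only a software stand-in for that, and the example values are arbitrary.

```python
# Entry i of the tag memory holds the data block address stored in
# data-memory storage block i, so a successful match directly yields the
# storage block number DBNX; an unsuccessful match yields nothing.

def match_tag_memory(tag_entries, block_addr):
    """Return the DBNX of the matching entry, or None if the match fails."""
    for dbnx, stored in enumerate(tag_entries):
        if stored == block_addr:
            return dbnx
    return None
```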
  • The scanner 111 examines the instructions sent from the external memory to the instruction memory 103; once an instruction is found to be a branch instruction, the branch target address of that branch instruction is calculated. For a direct branch instruction, the branch target address can be obtained by adding three terms: the block address of the instruction block in which the instruction is located, the offset of the instruction within the instruction block, and the branch increment (branch offset). Alternatively, the branch target address can be obtained by adding the corresponding base address register value and the branch increment. The instruction block address may be read from the active table 109 and sent directly to the adder in the scanner 111; it is also possible to add a register storing the current instruction block address to the scanner 111, so that the active table 109 need not send the instruction block address in real time.
  • When the scanner 111 finds that an instruction is a data read instruction, it can also calculate the data address corresponding to that instruction, for example by adding the base address register value used by the instruction to the data address offset. In the present invention, data read instructions are divided into two categories: data read instructions whose data address is determined, and data read instructions whose data address is undetermined. For example, a data read instruction that obtains its data address by adding its own instruction address to a data address offset (an immediate) can always be classified as a data read instruction whose data address is determined, since the calculated data address is always correct. A data read instruction that obtains its data address by adding a base address register value to a data address offset (an immediate) can be classified as a data read instruction whose data address is determined only if the base address register value has already been updated when the data address is calculated; otherwise it is classified as a data read instruction whose data address is undetermined. The two kinds of data read instruction can be given different data types, stored in the corresponding track points of the track table 107.
  • The branch target instruction address calculated by the scanner 111 can be matched against the storage block addresses stored in the active table 109. If the match succeeds, indicating that the branch target instruction is already stored in the instruction memory 103, the active table 109 outputs the corresponding BNX to be filled into the track table 107 entry of that branch instruction. If the match fails, the branch target instruction has not yet been stored in the instruction memory 103; in that case the branch target instruction address is sent to the external memory, and an available entry is allocated in the active table 109, so that matching the branch target instruction address in the active table 109 outputs a BNX. Combined with the position of the branch target instruction within its instruction block (i.e., the intra-block offset portion of the branch target instruction address), this forms a tracking address, which is stored as branch point content in the branch track point corresponding to the branch instruction.
  • Similarly, a data read instruction can be identified and its instruction type information stored in the corresponding track point (i.e., data point) of the track table; the data address of the data read instruction is calculated and sent to the external memory to obtain the data block containing the corresponding data. An available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX together with the offset of the data within the data block (i.e., DBNY) is stored as track point content in the data point.
  • In this way, an instruction block can be filled into the instruction memory 103 and a track corresponding to the entire instruction block is established. In the present invention, the address that can directly address the data memory is called the cache address; that is, the cache address (DBN) is composed of DBNX and DBNY.
  • The read pointer 121 of the tracker 119 can move from the track point corresponding to the current instruction in the track table 107 until it points to the first branch point. The value of the read pointer 121 is then the tracking address of the branch source instruction, containing the BNX and the column number of the corresponding branch point, and the tracking address of the branch target instruction of that branch source instruction can be read from the track table 107.
  • That is, the read pointer 121 of the tracker 119 advances from the track point corresponding to the instruction currently being executed by the processor 101 to the first branch point after that track point, and the target instruction can be found in the instruction memory 103 according to the target instruction's tracking address. When the read pointer 121 passes a data point, the cache address stored in it is read out and sent to the data memory 113; the corresponding data is read and pushed to the processor core 101. In this way, the data corresponding to all data read instructions between the current instruction and the first branch point after it is pushed in sequence to the processor core for reading.
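The read-pointer behavior above can be sketched as a scan along one track-table row. The tuple encoding of track points is an assumption for illustration; the patent's track points are hardware entries, not Python tuples.

```python
# Advance the read pointer from the current instruction's track point:
# push the stored cache address (DBN) of every data point passed, and
# stop at the first branch point encountered.

def advance(track_row, start_col):
    """Return (list of DBNs pushed to the core, column of the first
    branch point, or None if the row ends without one)."""
    pushed = []
    for col in range(start_col, len(track_row)):
        kind, payload = track_row[col]
        if kind == 'data':        # data point: push its DBN to the core
            pushed.append(payload)
        elif kind == 'branch':    # first branch point: pointer stops here
            return pushed, col
    return pushed, None
```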
  • FIG. 2 is a schematic diagram of a track point format according to the present invention.
  • For a branch point, the format contains the instruction type 151, and the BNX 153 and BNY 155 corresponding to the branch target instruction. For a data point, the format contains the instruction type 161, and the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113.
  • The read pointer 121 of the tracker 119 moves according to the positions of the branch points stored in the track table 107 and points to the first branch point after the instruction being executed by the processor core 101, reading the track point content from that branch point, namely the position information BNX and BNY of the branch target track point. If the branch point corresponds to an indirect branch instruction, the corresponding branch target instruction block address also needs to be read from the active table 109.
  • The processor core 101 outputs an instruction offset address (i.e., the offset portion of the instruction address) to select the required instruction from the memory block in the instruction memory 103 pointed to by the read pointer 121 of the tracker 119.
  • When the processor core executes the branch instruction, if the branch is not taken (TAKEN signal 123 is '0'), it continues to output new instruction offset addresses, reading and executing the instructions after the branch instruction, while the read pointer 121 of the tracker 119 continues moving to the next branch point and the above process repeats. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is a direct branch instruction, the processor core 101 can directly execute the branch target instruction that has already been prepared; the value of the read pointer 121 of the tracker 119 is updated to that BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, starts moving from that track point, and points to the next branch point. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is an indirect branch instruction, the processor core 101 outputs the block address portion of the actual target instruction address, which is matched against the instruction block address previously read from the active table 109. If the match succeeds, the prepared target instruction is correct and the processor core can use it; if the match fails, the actual target instruction address is sent to the external memory to acquire the instruction block containing the corresponding target instruction, and the target instruction is sent to the processor core 101 for execution.
  • At the same time, the active table 109 allocates an available entry, the instruction block is filled into the corresponding storage block of the instruction memory 103, and the BNX together with the offset of the target instruction within the instruction block (i.e., BNY) is stored as track point content in the branch point. The value of the read pointer 121 of the tracker 119 is updated to that BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, starts moving from that track point, points to the next branch point, and the above operations repeat. In this way, both the next instruction and the branch target instruction can be prepared before the processor core executes the branch instruction, for the processor core 101 to choose between, avoiding the performance loss caused by cache misses.
  • When the read pointer 121 of the tracker 119 passes a data point, the corresponding data is read from the data memory 113 according to the DBN stored in that data point. If the data read instruction is one whose data address is undetermined, the corresponding data block address also needs to be read from the tag memory 127.
  • When the processor core 101 executes the data read instruction, if the instruction is one whose data address is already determined, the processor core 101 can use the data directly. Otherwise, the block address in the actual data address output by the processor core 101 is matched against the data block address previously read from the tag memory 127. If the match is successful, the data is correct and can be used by the processor core.
  • Otherwise, the pipeline in the processor core 101 is suspended, the actual data address is sent to the external memory to obtain the data block containing the corresponding data, the data is sent to the processor core, and the pipeline is then resumed. At the same time, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the resulting DBNX together with the offset address of the data within the data block (i.e., DBNY) is stored as track point content in the data point.
  • In this way, the data possibly needed by the instruction is ready before the instruction executes. If the data is correct, the performance loss due to a data memory 113 miss is completely avoided, and the time required to read the data memory 113 can be partially or completely masked. Even if the data is wrong, the processor core 101 can reacquire the correct data without additional waiting time.
  • FIG. 3A is another embodiment of the cache system according to the present invention.
  • This embodiment is similar to the embodiment of FIG. 1, the difference being that a data address prediction module 301 is added and a data step size field is added to the data point format in the track table.
  • FIG. 3B is another schematic diagram of the track point format according to the present invention.
  • In this embodiment, the format of a branch point still contains the instruction type 151 and the BNX 153 and BNY 155 corresponding to the branch target instruction.
  • The format of a data point contains the instruction type 161, the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113, and the data step size 331.
  • The data step size 331 refers to the difference between the data addresses of two successive data operations of the data read instruction corresponding to the data point, that is, the value obtained by subtracting the previous data address from the current data address. Based on the data step size, the likely value of the next data address can be guessed: the current data address plus the data step size gives the likely value of the next data address.
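The step-size arithmetic described above can be sketched in a few lines of Python (an illustration only; the function names are chosen for this sketch, not taken from the patent):

```python
# Minimal sketch of data step size (stride) prediction: after two
# executions of the same data read instruction the step is known,
# and the next data address can be guessed.

def data_step(current_addr, previous_addr):
    """Step size 331: current data address minus the previous one."""
    return current_addr - previous_addr

def predict_next(current_addr, step):
    """Predicted next data address: current address plus the step size."""
    return current_addr + step

# Example: a load walking an array of 8-byte elements.
step = data_step(0x1010, 0x1008)    # step of 8 bytes
guess = predict_next(0x1010, step)  # predicted address 0x1018
```

The same two operations are what the prediction module 301 performs in hardware: one subtraction to learn the step, one addition to guess the next address.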
  • In this embodiment, the process of establishing a track and prefetching instructions and data is the same as in the embodiment of FIG. 1.
  • In addition, the track table in this embodiment is a compressed track table. Since only some of the instructions in an instruction block are branch instructions or data read instructions, the track table 107 can be compressed to reduce its storage space requirement.
  • The compressed track table may have the same number of rows as the original track table, but fewer columns, and a mapping table stores the correspondence between positions in the compressed track table and positions in the original track table.
  • Each entry in the compressed track table is a branch point or a data point, arranged in the order of the corresponding branch instructions and data read instructions within the instruction block.
  • The entries in the mapping table correspond one-to-one to the branch points and data points in the compressed track table, and store the offsets of the corresponding branch instructions and data read instructions within the instruction block. In this way, after the intra-block offset of a branch instruction or data read instruction is converted into a column address through the mapping table of the instruction block in which it is located, the corresponding branch point or data point can be found according to that column address in the row of the compressed track table pointed to by the BNX.
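As a minimal sketch of this offset-to-column conversion (the list-based encoding of a mapping table row is an assumption made for illustration):

```python
# Sketch of one compressed track table row plus its mapping table row.
# Only branch/data read instructions get a column; the mapping table row
# stores each one's intra-block offset, in instruction order.

mapping_row = [2, 5, 9]              # offsets of the branch/data instructions
compressed_row = ['BP', 'DP', 'BP']  # the corresponding track points

def column_for_offset(offset):
    """Convert an intra-block instruction offset into a column address."""
    return mapping_row.index(offset)

# The data read instruction at offset 5 is found in column 1 of this row.
col = column_for_offset(5)
point = compressed_row[col]
```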
  • In this embodiment, each entry in the compressed track table is a branch point or a data point. Therefore, when the branch transfer of the branch point pointed to by the tracker 119 read pointer 121 does not occur, the read pointer 121 is incremented by one by the incrementer 134 and then points to the next track point. If that track point is a branch point, the branch target instruction is read out as described above and the TAKEN signal sent by the processor core 101 is awaited. If that track point is a data point, the corresponding data is read out as described above and made ready for use by the processor core 101. Specifically, the data can be stored into a first-in-first-out buffer (FIFO) so that the processor core 101 acquires the data corresponding to each data read instruction in the correct order. The read pointer 121 then continues to move and the above operations are repeated until it points to a branch point, whereupon the branch target instruction is read out as described above and the TAKEN signal sent by the processor core 101 is awaited.
  • In addition, when the read pointer 121 points to a data point and the readout is performed, the DBNX therein is sent to the tag memory 127 to read the corresponding data block address.
  • That data block address, together with the DBNY read out by the read pointer 121, constitutes the data address used when the data point was last executed, and is sent to the prediction module 301 for temporary storage.
  • When the processor core 101 executes the data point, the current data address is sent to the prediction module 301, and the last data address is subtracted from it to obtain the data step size.
  • The prediction module 301 stores the data step size back into the corresponding data point, and adds the data step size to the current data address to obtain the predicted next data address.
  • The prediction module 301 sends the next data address to the tag memory 127 for matching. If the match is successful, it means that the data likely needed the next time the data point is executed is already stored in the data memory 113, and the obtained DBNX together with the offset address portion of the next data address (i.e., DBNY) is stored back into the corresponding data point to complete the update of the data point. If the match is unsuccessful, it means that the data likely needed the next time the data point is executed is not yet stored in the data memory 113.
  • At this time, the next data address is sent to the external memory to obtain the data block containing the corresponding data. At the same time, an available entry is allocated in the tag memory 127, and the data block is filled into the data memory 113.
  • In this way, starting from the third execution of the data read instruction by the processor core 101, the data is likely to be ready in advance. If the data is correct, the performance loss caused by a data cache miss is completely avoided, and the time required to read the data cache can be partially or completely masked. Even if the data is wrong, the processor core 101 can reacquire the correct data without additional waiting time.
  • In addition, while the tracker 119 read pointer 121 moves to the first branch point after the instruction currently being executed by the processor core 101, it may pass multiple data points, and data is read out in advance from the data memory 113 based on the DBNs in these data points. Therefore, a FIFO can be used to temporarily store, in order, the data corresponding to each data read instruction for the processor core 101 to use sequentially; that is, the FIFO is used to store the data to be used by the processor core 101.
  • Alternatively, a FIFO can be used to store the DBNs read from these data points, and the corresponding data is read from the data memory 113 based only on the earliest-stored DBN; after the processor core 101 has acquired that data, the corresponding data of the then-earliest DBN in the FIFO is read from the data memory 113 for use by the processor core 101; that is, the FIFO is used to store the DBNs to be used by the processor core.
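The second variant, in which the FIFO holds DBNs rather than data, can be sketched as follows (a toy model; the dictionary standing in for the data memory 113 and all values are illustrative):

```python
from collections import deque

# Sketch of the DBN FIFO: the tracker queues DBNs as the read pointer
# passes data points; data memory is read only for the oldest DBN,
# keeping the prefetch exactly one load ahead of the processor core.

data_memory = {('11', '10'): 'A', ('11', '11'): 'B', ('00', '00'): 'C'}

fifo = deque()                          # DBNs queued by the tracker
for dbn in [('11', '10'), ('11', '11'), ('00', '00')]:
    fifo.append(dbn)

served = []
while fifo:
    oldest = fifo.popleft()             # only the earliest-stored DBN is used
    served.append(data_memory[oldest])  # read data memory for that DBN
```

Because the deque preserves insertion order, the core receives 'A', 'B', 'C' in exactly the order the data read instructions will execute.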
  • The other operations of the cache system of the present invention are the same as those described in the previous embodiments, and are not described again here.
  • FIG. 3C is another embodiment of the cache system according to the present invention.
  • This embodiment is similar to the embodiment of FIG. 3A, the difference being that a sequence table 361 is added.
  • The entries of the sequence table 361 correspond one-to-one to the entries of the tag memory 127, and each entry stores the position information PREV of the previous data block and the position information NEXT of the next data block relative to the data block whose address is stored in the corresponding tag memory 127 entry. For example, when two data blocks with consecutive addresses are filled into the data memory 113, the DBNX of the next data block is stored in the NEXT field of the sequence table 361 entry corresponding to the previous data block.
  • Here, assuming the data block length is N, the block address of the next data block is the block address of the current data block plus N, and the block address of the previous data block is the block address of the current data block minus N. Since the next data address is equal to the sum of the current data address and the data step size, dividing the absolute value of the sum of the data step size and the intra-block offset of the current data address by N gives the number of data blocks between the next data address and the current data address.
  • Further, according to the sign of the sum of the data step size and the intra-block offset of the current data address, it can be determined whether the next data address is located in a data block before or after the current data address.
  • When the sum is non-negative and less than N, the next data address is in the same data block as the current data address, that is, the DBNX of the next data address is the same as the DBNX of the current data address; when the sum is negative, the next data address is located in a data block before the current data address; and when the sum is greater than or equal to N, the next data address is located in a data block after the current data address.
  • In the latter two cases, the number of blocks between the next data address and the current data address is equal to the quotient obtained by dividing the absolute value of the sum of the data step size and the intra-block offset of the current data address by N.
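The direction and distance rules above can be captured in a short sketch (illustrative only; `N` is the data block length):

```python
# Given the data step size, the intra-block offset of the current address,
# and the block length N, decide where the next data address falls and how
# many blocks away it is.

def next_block_relation(step, offset, N):
    s = step + offset
    if 0 <= s < N:
        return ('same', 0)          # same data block: DBNX unchanged
    blocks = abs(s) // N            # quotient of |step + offset| / N
    return ('after', blocks) if s >= N else ('before', blocks)

# With N = 64: stepping +8 from offset 60 lands one block after the
# current block; stepping -68 from offset 4 lands one block before it.
```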
  • Usually, the absolute value of the data step size is small, and the next data address tends to point to the previous (or next) data block of the current data address.
  • In this case, the DBNX stored in the PREV (or NEXT) field of the sequence table 361 entry pointed to by the DBNX read out from the data point by the tracker 119 read pointer 121 is exactly the DBNX corresponding to the next data address.
  • Therefore, that DBNX can be read directly from the sequence table 361 and stored back into the track table 107, thereby avoiding the matching of the next data address in the tag memory 127.
  • Further, an improved data cache structure can be used to obtain a better performance gain.
  • The improvement is illustrated below taking a way-set associative cache as an example.
  • A direct-mapped cache can be regarded as a way-set associative cache with only one way group and is implemented in the same manner, so it is not described separately here.
  • In a fully associative cache, the addresses of the individual memory blocks can be completely unrelated, so the sequence table of the FIG. 3C embodiment can be used directly to establish the linkage between memory blocks, and the memory block position (i.e., DBN) corresponding to the next data address can be found directly from the current data address and the data step size.
  • In a way-set associative cache, the data address is divided into three parts: the tag (TAG), the index number (index), and the intra-block offset (offset), and the index numbers of the memory blocks in each way group are continuous, that is, each index number exists in every way group and exists only once.
  • In this case, the method of the present invention can be used to give the same tag to all memory blocks in a way group.
  • Since the index numbers of all the memory blocks in the way group are continuous, data blocks with consecutive addresses are stored in them, and the positional relationship between the memory blocks corresponding to consecutive addresses is formed naturally; that is, within the range of one way group, the physical positions (or index numbers) corresponding to data blocks with consecutive data addresses are also continuous. Therefore, the DBNX corresponding to the predicted next data address can be found directly, reducing the number of matches in the tag memory 127 or the latency of looking up the sequence table entry by entry.
  • However, in an actual program, the data addresses used are often not contiguous but form an arithmetic progression, so the data corresponding to many index numbers in each way group may never be accessed. Once the frequently accessed data is concentrated in a few index numbers, replacements occur due to an insufficient number of way groups, which reduces the performance of the cache system.
  • To this end, a compression ratio can be set for each way group, so that the index numbers in the way group are no longer incremented by one but by a constant; in this way, the vast majority of the data in the entire way group is data that will be accessed, improving the utilization of the way group as much as possible while still retaining data continuity.
  • Specifically, each way group of the cache in this embodiment corresponds to a feature entry in which a compression ratio and a number of pointers are stored.
  • The value of the compression ratio is defined as the difference between the data block addresses corresponding to two adjacent memory blocks in the way group divided by the data block length.
  • The pointers point to the way groups in which the several data blocks adjacent in address to the first data block in the way group (i.e., the data block with the smallest data address) are respectively located.
  • In the embodiment of FIG. 4A, the compression ratio is '1'.
  • In this case, the pointers all point to the way group itself, that is, the data blocks whose addresses are consecutive with the first data block in the way group are located in the way group itself.
  • In this embodiment, the DBNX corresponding to a data address consists of the way group number and the intra-group block number. For example, if a way group contains 4 memory blocks and its way group number is '3', the block numbers of the 4 memory blocks are '0' to '3' respectively, and their corresponding DBNX are '30' to '33'.
  • As shown for way group 401 in FIG. 4A, all the memory blocks correspond to the tag '2001', that is, the data block addresses corresponding to the 4 memory blocks are '20010', '20011', '20012', and '20013'.
  • At this time, the index number portion of each data address is equal to the intra-group block number of the corresponding memory block in the way group.
  • For example, the index number of data block address '20010' is '0', and the intra-group block number of the corresponding memory block is also '0'; the index number of data block address '20011' is '1', and the intra-group block number of the corresponding memory block is also '1'; and so on.
  • Therefore, from the memory block position corresponding to the current data address (i.e., its DBNX) and the data step size, it can be calculated directly whether the memory block corresponding to the next data address is this memory block or a nearby memory block.
  • Specifically, the DBNX corresponding to the next data address is equal to the DBNX of the current data address plus a DBNX increment, where the DBNX increment is the quotient of the data step size divided by the data block length.
  • For example, if the DBNX corresponding to the current data address is '32' (the corresponding data block address is '20012') and the data step size is equal to the length of one data block, then the DBNX increment is equal to '1', and the DBNX of the next data address is equal to '32' plus '1', i.e., '33' (the corresponding data block address is '20013'), thus pointing to the correct memory block. In this way, the DBNX corresponding to the next data address can be obtained without calculating the next data address and performing address matching.
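A minimal sketch of this update, treating DBNX as a small integer and ignoring overflow past the end of the way group:

```python
# DBNX update for a way group with compression ratio '1':
# the increment is simply the data step size divided by the block length.

def next_dbnx(dbnx, step, block_len):
    """Next DBNX = current DBNX + (step / block length)."""
    return dbnx + step // block_len

# The example above: DBNX '32' (way group 3, block 2), a step of one
# 64-byte block length, increment '1' -> DBNX '33'.
nxt = next_dbnx(32, 64, 64)
```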
  • FIG. 4B is another embodiment of the improved way-set associative cache of the present invention. As shown for way group 403 in FIG. 4B, all memory blocks correspond to the tag '2001', but the corresponding data block addresses are '20010', '20012', '20014', and '20016'.
  • At this time, the index number portion of each data address is equal to the intra-group block number of the corresponding memory block in the way group multiplied by the compression ratio.
  • For example, the index number of data block address '20010' is '0', and the intra-group block number of the corresponding memory block is '0'; the index number of data block address '20012' is '2', and the intra-group block number of the corresponding memory block is '1'; and so on. In this way, the index numbers are compressed according to the compression ratio.
  • Accordingly, the DBNX increment is equal to the quotient of the data step size divided by the data block length and then divided by the compression ratio.
  • For example, suppose the DBNX corresponding to the current data address is '31' (the corresponding data block address is '20012') and the data step size is equal to twice the data block length; then the DBNX increment is equal to '2' divided by '1' and then divided by the compression ratio '2' (i.e., equal to '1'), and the DBNX of the next data address is equal to '31' plus '1', which gives '32' (the corresponding data block address is '20014'). This points to the correct memory block while avoiding the calculation and matching of data addresses.
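The compressed-group increment can be sketched the same way (illustrative; integer division stands in for the exact-quotient arithmetic of the embodiment):

```python
# DBNX increment for a compressed way group:
# increment = (step / block_length) / compression_ratio.

def dbnx_increment(step, block_len, ratio):
    return (step // block_len) // ratio

# FIG. 4B numbers: a step of two 64-byte block lengths with compression
# ratio '2' yields an increment of '1', so DBNX '31' advances to '32'
# (data block address '20014').
inc = dbnx_increment(2 * 64, 64, 2)
```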
  • In this embodiment, the corresponding feature entry stores, in addition to the compression ratio '2', four pointers.
  • Three of the pointers point to the way groups in which the three data blocks adjacent in address to the first data block (data block address '20010') of way group 403 are located (i.e., the way groups containing the data blocks with addresses '2000E', '2000F', and '20011'), and the other pointer points to the next way group whose addresses follow those of way group 403 (its starting data block address being '20018').
  • In this way, as long as the data step size is small, the memory block corresponding to the next data address can be found in the current way group or in a way group pointed to by one of the pointers, using only the DBN corresponding to the current data address.
  • In the above embodiments, the data step size is an integer multiple of the data block length.
  • If the data step size is not an integer multiple of the data block length, the extra part needs to be added to the DBNY; the resulting sum becomes the new DBNY, and the carry part is added to the DBNX.
  • Returning to the embodiment of FIG. 4B, suppose the data step size is 3 data block lengths (that is, the DBNX increment is '3') and the next data address is '20015'.
  • At this time, the index number of the data address is first restored from the compression ratio and the intra-group block number of the memory block.
  • In this example, the intra-group block number is '1', which multiplied by the compression ratio gives '2' (i.e., the index number of the data block). Adding this '2' to the DBNX increment '3' gives the next data address index number '5'. After that, the next data address index number '5' is compressed by the compression ratio, i.e., '5' divided by '2' gives the quotient '2' and the remainder '1'.
  • This means that the data corresponding to the next data address is located, in the way group pointed to by the pointer 417 corresponding to the remainder, in the memory block whose intra-group block number is the quotient; that is, the data corresponding to the next data address '20015' is in the memory block 421 whose intra-group block number is '2' in way group 405.
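The restore-add-recompress procedure of this example can be sketched as follows (group names and the pointer encoding are illustrative; the sketch assumes the new index does not run past the pointed-to group):

```python
# Given the current intra-group block number, the DBNX increment, the
# compression ratio, and the way group's pointers, locate the memory
# block holding the next data address.

def locate_next(block_no, dbnx_inc, ratio, pointers, this_group):
    index = block_no * ratio              # restore the index number
    next_index = index + dbnx_inc         # index of the next data address
    quotient, remainder = divmod(next_index, ratio)
    # remainder 0: the block is in this way group; otherwise the pointer
    # selected by the remainder names the way group that holds it.
    group = this_group if remainder == 0 else pointers[remainder]
    return group, quotient                # (way group, intra-group block no.)

# FIG. 4B walk-through: block no. '1', increment '3', ratio '2';
# remainder '1' selects the pointed-to group, quotient '2' is the block.
group, block = locate_next(1, 3, 2, {1: 'group405'}, 'group403')
```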
  • In addition, each way group of the cache can be configured as a plurality of groups, each of which can provide the same function as a way group, thereby conveniently increasing the number of way groups and storing multiple sets of consecutive data blocks corresponding to different tags.
  • Specifically, the data memory in each way group can be divided into corresponding groups, each group corresponding to the same number of rows with consecutive index numbers and corresponding to the same tag; that is, each group stores several data blocks with consecutive addresses corresponding to the same tag.
  • FIG. 5 is an embodiment of the grouped data cache according to the present invention.
  • In this embodiment, the tag memory 501 is divided into two groups, each of which contains one row of content addressable memory (CAM) storing a tag (such as tag 503 and tag 505).
  • The data memory 511 is also divided into two groups, each group containing four memory blocks; the data block addresses in the four memory blocks are consecutive and correspond to the same tag.
  • For example, group 513 includes memory blocks 521, 523, 525 and 527, whose data block addresses are consecutive and all correspond to tag 503; group 515 includes memory blocks 531, 533, 535 and 537, whose data block addresses are consecutive and all correspond to tag 505.
  • In addition, each tag and its corresponding group of memory blocks also correspond to a register comparator and a decoder.
  • For example, tag 505 corresponds to register comparator 519 and decoder 539.
  • The register comparator includes a register and a comparator. The register stores the upper part of the index number in the start address of the data blocks stored in the group.
  • When addressing, the upper part of the index number in the data address is sent via bus 543 to all register comparators to be compared with the stored upper-part values; according to the comparison results, only the compare line of the CAM row corresponding to the matching entry is charged. The tag sent via bus 541 is then matched, and the successfully matched CAM row outputs an enable signal to the decoder.
  • The decoder, under the control of the enable signal output by the register comparator, decodes the lower part of the index number in the data address on bus 545, and according to the decoding result selects one memory block output from the corresponding group.
  • In this way, each group can provide a function equivalent to one way group.
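One group's lookup path can be modeled as a toy function (the dictionary encoding of a group and all values are illustrative, not the hardware of FIG. 5):

```python
# Toy model of one group's lookup in FIG. 5: the register comparator
# screens on the upper index bits, the CAM row then matches the tag,
# and the decoder selects a memory block with the lower index bits.

def group_lookup(tag, index_hi, index_lo, group):
    if group['index_hi'] != index_hi:   # register comparator: no charge
        return None
    if group['tag'] != tag:             # CAM row: tag mismatch
        return None
    return group['blocks'][index_lo]    # decoder selects the block

group515 = {'tag': 0x2001, 'index_hi': 0,
            'blocks': ['b531', 'b533', 'b535', 'b537']}
hit = group_lookup(0x2001, 0, 2, group515)   # selects block 535
```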
  • In addition, consecutive upper-part index values can be stored in two adjacent register comparators, so that the index numbers corresponding to the two register comparators are also continuous.
  • This is equivalent to merging the two adjacent groups into one larger group to accommodate more data blocks of consecutive addresses.
  • The groups can also be configured with different sizes to form a cache of hybrid structure.
  • For example, one way group in the cache may be configured as four groups and another way group as one group, these two way groups constituting the cache portion for continuous-location storage, while the remaining way groups are organized in the traditional set-associative structure and constitute the cache portion for random-location storage.
  • In this case, the first way group contains at most four sets of consecutive data blocks, and the second way group contains only one set of consecutive data blocks.
  • The remaining way groups, like an existing set-associative cache, may each contain a maximum number of tags equal to the number of corresponding memory blocks (i.e., the number of rows of the way group itself), and adjacent memory blocks may correspond to different tags.
  • In this way, data with consecutive data addresses (i.e., the same tag) can be stored in the cache portion for continuous-location storage according to the characteristics of the program, while data with discontinuous data addresses is stored in the cache portion for random-location storage.
  • The cache of hybrid structure can be configured according to the characteristics of the program, which gives both flexibility of data storage in the cache and convenience of replacement, and can save a large number of tag comparison operations when performing data accesses to consecutive addresses.
  • It is possible that the data currently accessed or about to be accessed should belong to the cache portion for continuous-location storage, but the data block in which it resides has already been stored in the cache portion for random-location storage.
  • At this time, the data block in which the data resides should be filled into the cache portion for continuous-location storage, and the corresponding memory block in the cache portion for random-location storage is invalidated.
  • It is also possible that the data to be accessed should belong to the cache portion for random-location storage, but the data block in which it resides is already stored in the cache portion for continuous-location storage. At this time, the data is accessed directly from the cache portion for continuous-location storage without changing the position where the data is stored in the cache.
  • In the cache system of the present invention, a data access engine is used to implement the following function: before the processor core calculates the data address, the data access engine fills the corresponding data into the data cache and prepares the data for use by the processor core.
  • Here, data reading is taken as an example for description; data storing can be implemented by a similar method, and the description is not repeated here.
  • The data access engine is described in detail below through several specific embodiments. Please refer to FIG. 6, which is an embodiment of the data access engine of the present invention. For ease of description, only some of the modules or components are shown in FIG. 6.
  • The data memory 113 and the processor core 101 are the same as described in the previous embodiments.
  • In this embodiment, the data point format in the track table 107 contains the instruction type 621, DBNX, DBNY 627, and the data step size 629.
  • The DBNX consists of a group number (GN) 623 and an intra-group block number 625, and DBNY 627 is the intra-block offset (offset) in the data address.
  • The data engine 601 contains a sequence table 603, shifters 605, 607 and 609, an adder 611, a subtractor 613, and selectors 615, 616 and 617.
  • When in use, the intra-group block number 625 in the data point content read from the track table is sent to the shifter 605, shifted left according to the compression ratio, and then sent to the adder 611. Since shifting the intra-group block number 625 left by n bits is equivalent to multiplying it by 2^n, the intra-group block number 625, after being shifted by the shifter 605, is restored to the value of the index number in the corresponding data address.
  • The DBNY 627 in the data point content is sent directly to the adder 611 and, together with the index number output by the shifter 605, constitutes one input of the adder 611; the data step size 629 in the data point content is the other input of the adder 611. The sum of the two is the index number and intra-block offset of the next data address.
  • Among them, the intra-block offset is used directly as the DBNY corresponding to the next data address, and the index number is shifted right by the shifter 607 according to the compression ratio to obtain the intra-group block number corresponding to the next data address.
  • The number of bits shifted right by the shifter 607 is the same as the number of bits shifted left by the shifter 605. Since shifting the index number right by n bits is equivalent to dividing it by 2^n, the index number, after being shifted by the shifter 607, is compressed again into the corresponding intra-group block number and sent back for storage in the track table, while its lowest n bits 631, shifted out to the right, are not part of the intra-group block number.
  • The portion 631 of the index number shifted out by the shifter 607 is sent to the selector 616 as a control signal, and the overflow signal (carry or borrow) of the adder 611 is sent to the selector 615 as a control signal.
  • The inputs of the selectors are derived from the group numbers (GN) stored in the row of the sequence table 603 pointed to by the group number 623 in the current data address.
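The shift-add-shift arithmetic of shifters 605/607 and adder 611 can be sketched as follows (bit widths are illustrative; the selector logic is omitted here):

```python
# Arithmetic sketch of the FIG. 6 datapath: the intra-group block number
# is expanded to an index number (shift left by n), concatenated with
# DBNY, the step is added, and the result is split back apart.

DBNY_BITS = 2   # 4 data items per block, as in the FIG. 7A examples

def engine_add(block_no, dbny, step, n):
    index = block_no << n                         # shifter 605: * 2**n
    total = ((index << DBNY_BITS) | dbny) + step  # adder 611
    new_dbny = total & 0b11                       # next DBNY
    new_index = total >> DBNY_BITS                # next index number
    new_block = new_index >> n                    # shifter 607: / 2**n
    low_out = new_index & ((1 << n) - 1)          # bits 631, to selector 616
    return new_block, new_dbny, low_out

# Ratio '0' (n = 0): block '11', DBNY '10', step '1' -> block '11', DBNY '11'.
```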
  • FIG. 7A is an embodiment of the sequence table and data cache of the present invention.
  • The number of rows of the sequence table 603 is the same as the number of groups in the data memory 701, and the two correspond one-to-one.
  • In this embodiment, the data memory 701 is divided into two way groups (i.e., way group 0 and way group 1), and each way group is further divided into two groups, so there are four groups in the data memory 701.
  • The group numbers are marked on the corresponding groups as shown in FIG. 7A, that is, way group 0 contains groups '00' and '01', and way group 1 contains groups '10' and '11'. Further, for convenience of explanation, it is assumed that each group contains four memory blocks, each of which contains four data items (or data words).
  • Each row of the sequence table 603 contains a feature entry, a tag entry 715, and an index entry 717.
  • The feature entry further includes a compression ratio 703 and five pointers (i.e., pointers 705, 707, 709, 711 and 713). As in the embodiment of FIG. 4B, each of the five pointers points to the group in which a data block adjacent in address to the first data block of the group is located.
  • In this embodiment, the index numbers of the data blocks in each group are not compressed; therefore, apart from one pointer pointing to the group preceding this group in consecutive addresses and another pointer pointing to the group following this group in consecutive addresses, the other three pointers point to this group itself.
  • For example, in the first row of the sequence table 603, the pointers 705, 707 and 709 point to the group itself (i.e., group '00'), the pointer 711 points to the group following this group in consecutive addresses (i.e., group '10'), and the pointer 713 points to the group preceding this group in consecutive addresses (i.e., group '11').
  • The pointers in the other rows are likewise as shown, wherein a pointer whose content is empty indicates that the group to which it points is not displayed in the figure, or has not yet been determined, which does not affect the case described in this embodiment.
  • In this embodiment, the compression ratios of the four groups are all '0', that is, the index number of the data address corresponds directly to the intra-group block number, and each group corresponds to a complete tag.
  • In this way, with the lowest two bits of the block address masked, the data address can be matched to find the group corresponding to it, and the two masked bits are the intra-group block number within the corresponding group.
  • Take the sequential access of four consecutive data items A, B, C and D as shown in FIG. 7A as an example.
  • Data A and B are the last two data items of the last memory block of group '11', and data C and D are the first two data items of the first memory block of group '00'; that is, the difference between the data addresses of adjacent ones of the four data items is the data step size '1'.
  • When data A is read from the data memory 701 according to the data point content in the track table 619, the DBNX, DBNY and data step size in the data point are read out: the value of DBNX is '1111' (i.e., the 4th memory block in group '11'), in which the group number is '11' and the intra-group block number is '11'; the value of DBNY is '10' (i.e., the third data item in the memory block); and the value of the data step size is '1' (i.e., the next accessed data B is the data immediately after data A).
  • At this time, the intra-group block number ('11') in the DBNX is sent to the shifter 605, and the group number ('11') in the DBNX is sent to the sequence table 603 to read the contents of the corresponding row (i.e., the fourth row in the sequence table 603).
  • Among them, the compression ratio ('0') is sent to the shifters 605 and 607 as the number of shift bits (i.e., no shift).
  • the output '11' of shifter 605 is combined with DBNY (' 10 ') to form '1110' and data step '1' to get '1111 ', where the block number '11' in the group is still '11' after the output of the shifter 607, that is, the block number ('11') and DBNY ('11) corresponding to the next data address are obtained. ').
  • Meanwhile, the pointer values of the fourth row in sequence table 603, corresponding to ports '1', '2', '3', '4' and '-1', are output to selectors 616 and 615, and port '0' outputs the group number of the row itself, '11' (this group number equals the row number, so it need not occupy writable memory in the row and can be hard-wired read-only to save storage space), to selector 615.
  • Since no carry or borrow occurred in the addition, the group number '11' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '11' and intra-group block number '11') and DBNY ('11') are generated, pointing to data B in data memory 701. The DBN is written back via bus 649 to this data point in track table 619 for use the next time data B is read.
  • When data B is accessed, the group number '11', intra-group block number '11', DBNY '11' and data step '1' in the data point are read out again. As before, the intra-group block number '11' output by shifter 605 is concatenated with DBNY to form '1111'; adding the data step '1' yields '0000' (i.e., the intra-group block number '00' and DBNY '00' corresponding to the next data address) and overflows, producing the carry '1'.
  • The pointer values of the fourth row of sequence table 603 and the group number of the row itself are again sent to selectors 616 and 615. Because of the carry, the group number '00' output by port '4' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '00') and DBNY ('00') are generated, pointing to data C in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data C is read.
  • According to the above method, when the compression ratio is '0', the DBN corresponding to the next data address can be computed directly from the data step size.
  • When the data step size is large, the index number in the data address can be compressed.
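The ratio-'0' case above reduces to fixed-point arithmetic: concatenate the intra-group block number with DBNY, add the data step, and let a carry or borrow select an adjacent-group pointer instead of the group's own number. The following is a minimal software sketch of that arithmetic, not the patent's hardware; the field widths and parameter names are illustrative assumptions (4 blocks per group, 4 data items per block):

```python
BLOCK_BITS = 2   # intra-group block number width (assumed)
DBNY_BITS = 2    # intra-block data item index width (assumed)

def next_dbn_ratio0(group, block, dbny, step, next_group, prev_group):
    """Return (group, block, dbny) of the next data address.

    next_group/prev_group stand for the adjacent-group pointers held in
    the sequence table row; a carry or borrow out of the block+DBNY field
    selects one of them instead of the group's own number (port '0').
    """
    width = BLOCK_BITS + DBNY_BITS
    value = (block << DBNY_BITS) | dbny      # e.g. '11' ++ '10' -> '1110'
    total = value + step
    if total >= (1 << width):                # carry: cross into next group
        group = next_group
    elif total < 0:                          # borrow: cross into previous group
        group = prev_group
    total &= (1 << width) - 1
    return group, total >> DBNY_BITS, total & ((1 << DBNY_BITS) - 1)

# Data A of FIG. 7A: group '11', block '11', DBNY '10', step '1' -> data B.
assert next_dbn_ratio0(0b11, 0b11, 0b10, 1, 0b00, 0b10) == (0b11, 0b11, 0b11)
# Data B: adding '1' overflows, so the next-group pointer ('00') is taken.
assert next_dbn_ratio0(0b11, 0b11, 0b11, 1, 0b00, 0b10) == (0b00, 0b00, 0b00)
```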
  • Table 1 shows some commonly used compression ratios and the corresponding numbers of shift bits and masks. The first column gives the range of the data step size; the second column shows how the tag and index number stored in the sequence table are masked for matching, where T denotes a tag bit, I denotes an intra-group block number (index) bit, and the underlined part marks the masked bits; the third column gives the corresponding number of shift bits; the fourth column gives the corresponding compression ratio.

| Data step size range | Masked bits | Shift bits | Compression ratio |
|---|---|---|---|
| step < 2 × block length | the two index bits | 0 | 1 (not compressed) |
| 2 × block length ≤ step < 4 × block length | lowest tag bit and high index bit | 1 | 2 |
| 4 × block length ≤ step < 8 × block length | lowest two tag bits | 2 | 4 |
| 8 × block length ≤ step < 16 × block length | second and third lowest tag bits | 3 | 8 |

  • Other cases can be handled in the same manner.
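As a sanity check on Table 1, the step-size ranges map to shift counts by successive doublings of the data block length. A small illustrative sketch; the function name, the `max_shift` limit and the block length are assumptions, not from the patent:

```python
def compression_shift(step, block_len, max_shift=3):
    """Shift bits per Table 1: 0 while |step| < 2x block length, then one
    more for each further doubling of the step size."""
    s = abs(step)
    shift = 0
    while shift < max_shift and s >= (block_len << (shift + 1)):
        shift += 1
    return shift

block_len = 4  # data items per block, assumed for illustration
assert compression_shift(3, block_len) == 0     # < 2x block: ratio 1
assert compression_shift(-8, block_len) == 1    # 2x..4x: ratio 2 (FIG. 7B)
assert compression_shift(9, block_len) == 1     # 2x..4x: ratio 2 (FIG. 7C)
assert compression_shift(17, block_len) == 2    # 4x..8x: ratio 4 (FIG. 7D)
assert compression_shift(40, block_len) == 3    # 8x..16x: ratio 8
```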
  • FIG. 7B shows another embodiment of the sequence table and data cache of the present invention. The structure of each group and of the sequence table in the FIG. 7B cache is the same as in FIG. 7A. In this embodiment the compression ratio is '01' and the data step size is an integer multiple of the data block length (the data step is '11000' in two's-complement form, i.e., decimal '-8').
  • The lowest bit of the data address index number corresponding to each memory block of group '00' and group '01' is '0', and the lowest bit of the data address index number corresponding to each memory block of group '10' and group '11' is '1'. According to the compression ratio ('1'), the mask is shifted one bit to the left, masking the high bit of the index number in the data address and the lowest bit of the tag (shown underlined in tag 715 and index number 717 in FIG. 7B). That is, the portion of the data address consisting of the tag except its lowest bit, together with the lowest bit of the index number, is matched to find the group corresponding to the data address, and the two masked bits give the intra-group block number of the corresponding data block.
  • In sequence table 603 the tag value in the row corresponding to group '00' is '1000' and the tag value in the row corresponding to group '01' is '1010'; the masked bit of each is '0', indicating that the group boundaries of the data blocks stored in the two groups are aligned. The tags of group '00' and group '01' are consecutive except for the lowest bit, and the lowest bits of their index numbers are the same. That is, the tags and index numbers of the data blocks stored in group '00' are '100000', '100010', '100100' and '100110', and the tags and index numbers of the data blocks stored in group '01' are '101000', '101010', '101100' and '101110', respectively.
  • When the data is accessed, the intra-group block number ('01') in DBNX is sent to shifter 605, and the group number '01' in DBNX is sent to sequence table 603 to read out the contents of the corresponding row (i.e., the second row in sequence table 603). The unmasked bit of index number 717 is used as the fill bit appended at the right when shifter 605 shifts left; the compression ratio ('01') is sent to shifters 605 and 607 as the number of shift bits (i.e., shift by one).
  • Shifter 605 shifts its input '01' one bit to the left and fills '0', obtaining '010'; concatenated with DBNY ('01') this forms '01001', and adding the data step '11000' yields '00001', where the intra-group block number portion '000' is shifted one bit to the right by shifter 607 and output as '00'; that is, the intra-group block number ('00') and DBNY ('01') corresponding to the next data address are obtained.
  • Since no borrow occurred, the group number '01' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '01' and intra-group block number '00') and DBNY ('01') are generated, pointing to data F in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data F is read.
  • When data F is accessed, the group number '01', intra-group block number '00', DBNY '01' and data step '11000' in the data point are read out again. The intra-group block number '00' is shifted one bit to the left by shifter 605 and filled with '0'; concatenated with DBNY this forms '00001', and adding the data step '11000' yields '11001' (i.e., the intra-group block number portion '110', which shifter 607 shifts one bit to the right to give '11', and DBNY '01'), and a borrow overflow occurs.
  • Because of the borrow, the group number '00' output by port '-1' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '11') and DBNY ('01') are generated, pointing to data G in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data G is read.
  • According to the above method, when the compression ratio is not '0' but the data step size is an integer multiple of the data block length, the DBN corresponding to the next data address can likewise be computed from the data step size.
  • FIG. 7C shows another embodiment of the sequence table and data cache of the present invention. The structure of each group and of the sequence table in the FIG. 7C cache is the same as in FIG. 7B, except that here the data step size is not an integer multiple of the data block length (the data step is '1001', i.e., decimal '9').
  • The tags of group '00' and group '01' are consecutive except for the lowest bit and the lowest bits of their index numbers are the same; the tags of group '01' and group '11' are identical except for the masked lowest bit, and the lowest bits of their index numbers are consecutive. The tags and index numbers of the data blocks stored in group '11' are '101001', '101011', '101101' and '101111', respectively.
  • When the data is accessed, the DBNX read from the data point is '0010', where the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; the value of the data step is '1001'.
  • The intra-group block number ('10') in DBNX is sent to shifter 605, and the group number '00' in DBNX is sent to sequence table 603 to read out the contents of the corresponding row (i.e., the first row in sequence table 603). The unmasked bit of index number 717 is used as the fill bit appended at the right when shifter 605 shifts left; the compression ratio ('01') is sent to shifters 605 and 607 as the number of shift bits (i.e., shift by one).
  • Shifter 605 shifts its input '10' one bit to the left and fills '0', obtaining '100'; concatenated with DBNY ('01') this forms '10001', and adding the data step '1001' yields '11010', where the intra-group block number portion '110' is shifted one bit to the right by shifter 607 and output as '11'; that is, the intra-group block number ('11') and DBNY ('10') corresponding to the next data address are obtained.
  • Since the bit shifted out coincides with the fill bit and no carry occurred, the group number '00' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '11') and DBNY ('10') are generated, pointing to data K in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data K is read.
  • When data K is accessed, the group number '00', intra-group block number '11', DBNY '10' and data step '1001' in the data point are read out again. The intra-group block number '11' is shifted one bit to the left by shifter 605 and filled with '0'; concatenated with DBNY this forms '11010', and adding the data step '1001' yields '00011' (i.e., the intra-group block number portion '000', which shifter 607 shifts one bit to the right to give '00', and DBNY '11'), with a carry overflow. Because of the carry, group '01' is selected as the group corresponding to the next data address, and the resulting DBN (group number '01', intra-group block number '00', DBNY '11') is written back to the track table.
  • When that data item is accessed, the group number '01', intra-group block number '00', DBNY '11' and data step '1001' in the data point are read out again. The intra-group block number '00' is shifted one bit to the left by shifter 605 and filled with '0'; concatenated with DBNY this forms '00011', and adding the data step '1001' yields '01100' (i.e., the intra-group block number portion '011', which shifter 607 shifts one bit to the right to give '01', and DBNY '00'). Here the bit shifted out (removed portion 631) controls selector 616: the group number '11' output by port '1' is selected and, after passing selector 615, becomes the group number corresponding to the next data address. At this point the DBNX corresponding to the next data address (i.e., group number '11' and intra-group block number '01') and DBNY ('00') are generated, pointing to data M in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data M is read.
  • According to the above method, when the compression ratio is not '0' and the data step is not an integer multiple of the data block length, the DBN corresponding to the next data address can still be computed from the data step.
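The shifted computations of FIGS. 7B and 7C can be modeled in software as arithmetic on a linear value built from the block number, the unmasked low index bits and DBNY. This is an illustration of the idea only, not the hardware datapath; the widths are assumptions, and the returned displacement pair stands for which adjacent-group pointer port would be selected:

```python
DBNY_BITS = 2    # data items per block (assumed)
BLOCK_BITS = 2   # blocks per group (assumed)

def next_dbn_shifted(block, dbny, fill, step, k):
    """Return ((d_high, d_low), new_block, new_dbny, new_fill).

    fill is the group's k unmasked low index bits.  (d_high, d_low) says
    how far the matched address bits moved: d_high is the carry/borrow out
    of the block field, d_low the change of the unmasked low bits.  (0, 0)
    keeps the same group (port '0'); anything else is resolved through the
    sequence table's adjacent-group pointers.
    """
    width = BLOCK_BITS + k + DBNY_BITS
    linear = (block << (k + DBNY_BITS)) | (fill << DBNY_BITS) | dbny
    total = linear + step
    d_high = total >> width                  # +1 carry, -1 borrow, 0 neither
    total &= (1 << width) - 1
    new_fill = (total >> DBNY_BITS) & ((1 << k) - 1)
    new_block = total >> (k + DBNY_BITS)
    new_dbny = total & ((1 << DBNY_BITS) - 1)
    return (d_high, new_fill - fill), new_block, new_dbny, new_fill

# FIG. 7C (k=1, step '1001' = 9): from block '11', DBNY '10' of group '00'
# (fill '0'), a carry steps into the next higher group:
assert next_dbn_shifted(0b11, 0b10, 0, 9, 1) == ((1, 0), 0b00, 0b11, 0)
# From block '00', DBNY '11' (fill '0'), the shifted-out bit selects the
# group whose unmasked low bit is '1' (port '1'):
assert next_dbn_shifted(0b00, 0b11, 0, 9, 1) == ((0, 1), 0b01, 0b00, 1)
```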
  • In addition, the index number corresponding to the first data block of each group need not be '0'; a storage arrangement in which group boundaries are not aligned can thus be realized, storing data more flexibly and saving storage space.
  • FIG. 7D shows an embodiment of such a group-boundary-unaligned data storage arrangement according to the present invention. The structure of each group and of the sequence table in the FIG. 7D cache is the same as in FIG. 7A. In this embodiment the compression ratio is '10' and the data step size is not an integer multiple of the data block length (the data step is '10001', i.e., decimal '17').
  • The index numbers of the data addresses corresponding to the memory blocks of group '00' and group '01' are '00', and the index numbers of the data addresses corresponding to the memory blocks of group '10' and group '11' are '01'. According to the compression ratio ('10'), the mask is shifted two bits to the left, masking the lowest two bits of the tag in the data address and leaving the index number unmasked (shown underlined in tag 715 in FIG. 7D). That is, the portion of the data address consisting of the tag except its lowest two bits, together with the index number, is matched to find the group corresponding to the data address, and the two masked bits give the intra-group block number of the corresponding data block.
  • In sequence table 603 the tag value in the row corresponding to group '00' is '01000' and the tag value in the row corresponding to group '01' is '01100'; the two masked bits of each are '00'. The tags of group '00' and group '01' are consecutive except for the lowest two bits and their index numbers are the same, while the tags of group '01' and group '11' are identical except for the lowest two bits and their index numbers are consecutive. The tags and index numbers of the data blocks stored in group '00' are '0100000', '0100100', '0101000' and '0101100'; those stored in group '01' are '0110000', '0110100', '0111000' and '0111100'; for group '11', because its group boundary is not aligned and its offset is '01', the tags and index numbers of the data blocks stored in it are '0110101', '0111001', '0111101' and '1000001', respectively.
  • Since the group boundary of group '11' is not aligned, in addition to the matching performed via bus 641, the lowest two bits of the tag in the data address also need to be sent via bus 643 and have the lowest two bits of the tag stored in the corresponding row subtracted from them by subtractor 613, to determine the intra-group block number corresponding to the data address.
  • For example, for the data address '011100111' (i.e., tag '01110', index number '01', intra-block offset '11'), matching the tag except its lowest two bits ('011') together with the index number '01' finds group '11'. The lowest two bits of the tag ('10') minus the lowest two bits of the tag stored in the row corresponding to group '11' in sequence table 603 ('01'), computed by subtractor 613, give '01' (the second data block); that is, this data address corresponds to the last data item of the second data block of group '11'.
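Subtractor 613's role for unaligned boundaries reduces to a small modular subtraction; a hedged sketch (the function name and widths are illustrative):

```python
def intra_group_block(tag_low_bits, boundary_offset, block_bits=2):
    """Masked tag bits of the address minus the stored group-boundary
    offset, modulo the number of blocks per group."""
    return (tag_low_bits - boundary_offset) & ((1 << block_bits) - 1)

# FIG. 7D example: tag low bits '10' minus stored offset '01' -> block '01'.
assert intra_group_block(0b10, 0b01) == 0b01
# A wrapped block: masked bits '00' minus offset '01' -> block '11'.
assert intra_group_block(0b00, 0b01) == 0b11
```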
  • When the data is accessed, the DBNX read from the data point is '0010', where the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; the value of the data step is '10001'. The intra-group block number ('10') in DBNX is sent to shifter 605, and the group number '00' in DBNX is sent to sequence table 603 to read out the contents of the corresponding row (i.e., the first row in sequence table 603). The two unmasked bits of index number 717 are used as the fill bits appended at the right when shifter 605 shifts left; the compression ratio ('10') is sent to shifters 605 and 607 as the number of shift bits (i.e., shift by two).
  • Shifter 605 shifts its input '10' two bits to the left and fills '00', obtaining '1000'; concatenated with DBNY ('01') this forms '100001', and adding the data step '10001' yields '110010', where the intra-group block number portion '1100' is shifted two bits to the right by shifter 607 and output as '11'; that is, the intra-group block number ('11') and DBNY ('10') corresponding to the next data address are obtained.
  • Since the bits shifted out coincide with the fill bits and no carry occurred, the group number '00' output by port '0' is selected as the group number corresponding to the next data address. Thus the DBNX corresponding to the next data address (i.e., group number '00' and intra-group block number '11') and DBNY ('10') are generated, pointing to data Q in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data Q is read.
  • When data Q is accessed, the group number '00', intra-group block number '11', DBNY '10' and data step '10001' in the data point are read out again. The intra-group block number '11' is shifted two bits to the left by shifter 605 and filled with '00'; concatenated with DBNY this forms '110010', and adding the data step '10001' yields '000011' (i.e., the intra-group block number portion '0000', which shifter 607 shifts two bits to the right to give '00', and DBNY '11'), with a carry overflow. Because of the carry, group '01' is selected as the group corresponding to the next data address, and the resulting DBN (group number '01', intra-group block number '00', DBNY '11') is written back to the track table.
  • When that data item is accessed, the group number '01', intra-group block number '00', DBNY '11' and data step '10001' in the data point are read out again. The intra-group block number '00' is shifted two bits to the left by shifter 605 and filled with '00'; concatenated with DBNY this forms '000011', and adding the data step '10001' yields '010100'. Shifter 607 shifts the index-number portion '0101' output by adder 611 two bits to the right; the removed portion 631 is '01', which does not coincide with the '00' fill bits of the index number, so removed portion 631 controls selector 616 to select the group number '11' output by port '1', which after selector 615 becomes the group number corresponding to the next data address. The group boundary offset '01' is then read from the row corresponding to group '11' in sequence table 603 and subtracted from shifter 607's result '01', giving the true intra-group block number '00'.
  • At this point the DBNX corresponding to the next data address (i.e., group number '11' and intra-group block number '00') and DBNY ('00') are generated, pointing to data S in data memory 701. The DBN is written back via bus 649 to the data point in track table 619 for use the next time data S is read.
  • According to the above method, even when the compression ratio is not '0', the data step is not an integer multiple of the data block length, and the group boundaries are not aligned, the DBN corresponding to the next data address can still be computed from the data step size.
  • In the data caching method of the present invention, data that the processor core may load is pre-filled into the cache in advance in the following manner, so that it is ready for the processor core ahead of time.
  • In this embodiment, the data read or data store instructions that the processor core is executing or is about to execute are processed in advance (below, a data read instruction is taken as the example). When the processor core executes the data read instruction for the first time, the start data address of the instruction is determined and recorded according to the data address generated by the processor core. When the processor core executes the same data read instruction for the second time, the recorded start data address of that instruction is subtracted from the second data address generated by the processor core, giving the difference between the data addresses of two adjacent executions of the instruction, which is recorded as the data step size. The data step is then added to the second data address to obtain the next data address, which is recorded, and the next data address is used to query whether the data is already in the higher-level memory; if not, the data is fetched from the lower-level memory with the next data address and filled into the higher-level memory.
  • When the processor core executes the data read instruction for the third time, the next data address corresponding to the instruction is taken from the record and the corresponding data is provided to the processor core. As needed, the next data address can be compared with the exact data address generated by the processor core. If there is no error, the next data address is added to the data step to obtain a new next data address, which is recorded, and the new next data address is used to query whether the data is in the higher-level memory; if not, the corresponding data is fetched from the lower-level memory with the new next data address and filled into the higher-level memory. If the comparison finds an error, the correct address at the time of the error is used as the start data address and the above process is executed again.
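The three executions described above amount to a per-instruction stride prefetcher. A minimal software sketch under assumed names; the patent's hardware keeps this state in the track table rather than in an object:

```python
class StridePrefetchEntry:
    """Per-data-read-instruction state: start address, learned step,
    predicted next address (names are illustrative)."""

    def __init__(self):
        self.last_addr = None   # most recently seen data address
        self.stride = None      # learned data step
        self.next_addr = None   # predicted next data address

    def access(self, addr, prefill):
        """Process one execution of the data read instruction.

        prefill(addr) stands for 'query the higher-level memory and, on a
        miss, fetch from the lower-level memory and fill it in'.
        """
        if self.last_addr is None:          # first execution: record start
            self.last_addr = addr
            return
        if self.stride is None:             # second execution: learn the step
            self.stride = addr - self.last_addr
        elif addr != self.next_addr:        # misprediction: restart learning
            self.stride = None
            self.next_addr = None
            self.last_addr = addr
            return
        self.next_addr = addr + self.stride
        prefill(self.next_addr)             # pre-fill the predicted address
        self.last_addr = addr

prefetched = []
entry = StridePrefetchEntry()
for a in (100, 108, 116, 124):
    entry.access(a, prefetched.append)
assert prefetched == [116, 124, 132]
```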
  • FIG. 8A shows an embodiment of the data access engine of the present invention; it is a more complete embodiment based on the FIG. 6 embodiment. The processor core 101 and the data memory (i.e., primary data memory) 113 are the same as described in the previous embodiments, and the data in data memory 113 is a subset of the data in lower-level memory 115. A first-in-first-out buffer (FIFO) 849 serves as a data buffer between data memory 113 and processor core 101, and tag unit 841 together with data memory 113 forms a traditional set-associative cache.
  • In data access engine 801, the sequence table 603, shifters 605, 607 and 609, adder 611, subtractor 613 and selector 617 are the same as the corresponding functional blocks of data access engine 601 in FIG. 6. The selector 618 in this embodiment combines the selectors 615 and 616 of the FIG. 6 embodiment. In addition, sequence table 603 and selector 618 add storage and selection of the group numbers of more adjacent groups, and group valid bits and index-bit valid bits are also added to sequence table 603.
  • Controller 803 controls the operation of the data access engine. Under the control of controller 803, selectors 811, 813 and 815 select, from track table 619 or from subtractors 613 and 805, the intra-group index number, intra-block offset and step size with which adder 611 and shifters 605, 607 compute the next DBN.
  • Subtractor 613 computes the index number and intra-block offset from data address 641 and the result 643 of matching in sequence table 603. Subtractor 805 computes the difference between the memory addresses of two accesses by the same memory access instruction, i.e., the data step (stride). Converter 807 converts the step size into the compression shift signal stored in sequence table 603, which serves as the shift bit count controlling the shifters.
  • The current cache address bus 821 sends the contents of the entry from track table 619 to each functional block. The intermediate result bus 823 sends the intra-group block number and intra-block offset from subtractor 613 to adder 611 and shifters 605, 607 for computing the next DBN. Bus 825 sends the data step computed by subtractor 805 to selector 815. The control signals 827 generated by controller 803 control selectors 811, 813, 815, 817, 617 and 819. The shift signal 829 output by sequence table 603 controls shifters 605, 607 and 609. The next data address bus 881 sends the next data address to sequence table 603; the corresponding data address is used to pre-fill data from lower-level memory 115 into data memory 113, and the DBN is sent to track table 619 for storage.
  • A traditional cache uses a kind of match-based indirect addressing: the index bits in the middle of the data address select tags read out of the cache, which are then matched against the high bits of the data address; if the tag of some way matches, it is called a hit, and the content of that way at that index is the data pointed to by the data address.
  • Primary data memory 113 is composed of a plurality of identical memories, each of which constitutes a way; the ways all have the same number of rows, forming a multi-way set-associative organization. Each storage row of each memory is called a primary data block, and each primary data block has an index number (INDEX) 802 determined by the row number at which the primary data block is located in primary data memory 113. The intra-block offset 627 points to a data item within the block. Accordingly, the data address 804 can be divided, based on the number of primary data blocks per way and the number of data items per block in primary data memory 113, into a high-order tag 801, middle index bits 802 and a low-order intra-block offset 627.
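The three-field split of data address 804 can be sketched as plain bit slicing; the field widths below follow the small examples of the earlier figures and are assumptions, not fixed by the embodiment:

```python
INDEX_BITS = 2    # primary data blocks per way (assumed: 4)
OFFSET_BITS = 2   # data items per block (assumed: 4)

def split_data_address(addr):
    """Split a data address into (tag, index, intra-block offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# An address in the style of FIG. 7D: tag '01110', index '01', offset '11'.
assert split_data_address(0b011100111) == (0b01110, 0b01, 0b11)
```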
  • The cache of the present invention likewise begins with match-based indirect addressing, but once the relationship between a data address and a cache address has been established, the cache address is used for direct addressing. Direct addressing with cache addresses avoids tag-matching operations, saves power and increases memory access speed.
  • The group storage address 808 is divided into a high-order memory group number (GN) 623, a middle intra-group block number (index) 625 and a low-order intra-block offset 627. For the set-associative cache, the cache address 806 is divided into a high-order way number 814, a middle intra-way index number 802 and a low-order intra-block offset 627. Selector 843 selects either the set-associative cache address generated by tag unit 841 or the group storage address generated by the data access engine for storage in track table 619. The two addresses have the same form; in essence both are addresses of data memory 113.
  • The contents of sequence table 603 allow the data address and the cache address to be converted into each other. For conversion from data address to cache address, the data address is sent from bus 641 to sequence table 603 to be matched against the tags and index numbers; the group number of the matching entry is read out on bus 835, and its tag and index number are read out on bus 643. The tag and index number on bus 643 are subtracted from the data address on bus 641 by subtractor 613, yielding the tag low bits, index number and intra-block offset; the tag low bits and index number, after being shifted by shifter 609, give the intra-group block number of the corresponding cache address. Combining the above group number, intra-group block number and intra-block offset on bus 837 yields the cache address corresponding to the data address.
  • For conversion from cache address to data address, sequence table 603 is addressed by the group number 623 in the cache address, and the tag and index number read out are sent via bus 643; the tag and index number are added to the intra-group block number 625 and intra-block offset 627 in the cache address, and the sum is the data address.
  • Therefore the data access engine can provide data addresses to access the lower-level memory, can provide cache addresses to access data memory 113, and, because it stores the correspondence between data addresses and cache addresses, can convert either address into the other.
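Both conversion directions can be modeled with a few lines of arithmetic. The following sketch treats each sequence-table row as just the tag+index of its group's first block; the widths, the compression shift `k` and the linear search are simplifying assumptions, not the CAM-based hardware:

```python
OFFSET_BITS = 2   # data items per block = 4 (assumed)
BLOCK_BITS = 2    # blocks per group = 4 (assumed)

def data_to_cache(addr, rows, k):
    """Data address -> (group number, intra-group block number, offset)."""
    block_addr = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    for gn, base in enumerate(rows):
        diff = block_addr - base              # role of subtractor 613
        blk = diff >> k                       # role of shifter 609
        if 0 <= blk < (1 << BLOCK_BITS) and diff == (blk << k):
            return gn, blk, offset            # combined cache address
    return None                               # no group holds this address

def cache_to_data(gn, blk, offset, rows, k):
    """Cache address -> data address (the adder path)."""
    return ((rows[gn] + (blk << k)) << OFFSET_BITS) | offset

# Groups like FIG. 7B's '00' and '01': first blocks '100000', '101000', k=1.
rows = [0b100000, 0b101000]
assert data_to_cache(0b10110011, rows, 1) == (1, 2, 3)   # block '101100'
assert cache_to_data(1, 2, 3, rows, 1) == 0b10110011
```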
  • The tracker 845 determines the next track table read address from the contents of the data point output 851 of track table 619. The track table read address 851, after passing through delay unit 847, serves as the track table write address 853.
  • FIG. 8C shows an embodiment of the operation of the sequence table of the present invention. The sequence table 603 consists of registers, comparators and mask registers; each register can also be implemented by a memory. The tag and index-number fields together with the comparators can be implemented by content-addressable memory (CAM); the tags, index numbers, comparators and masks can be implemented by tri-state CAM.
  • The mask acts on the low bits of the tag and on the index number, and can selectively exclude the low tag bits or certain index bits from the comparison (i.e., those tag or index bits do not affect the comparison result), implementing compressed data storage. The mask is controlled by the shift field: when the shift is '0', the mask masks the lowest bits (i.e., the index number); when the shift is '1', the mask moves one bit to the left, masking the lowest tag bit and the high index bit and leaving the lowest index bit to participate in the comparison.
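The shift-controlled mask can be sketched as sliding a two-bit "don't care" window across the tag low bits and index number; a software illustration (widths are assumptions):

```python
BLOCK_BITS = 2   # width of the masked window (assumed)

def masked_match(stored, incoming, shift, width):
    """Compare two tag+index values of 'width' bits, ignoring BLOCK_BITS
    bits starting at bit position 'shift' (the mask window)."""
    mask = ((1 << width) - 1) & ~(((1 << BLOCK_BITS) - 1) << shift)
    return (stored & mask) == (incoming & mask)

# Shift '0': the index number (lowest two bits) is masked.
assert masked_match(0b101100, 0b101110, 0, 6)
# Shift '1': lowest tag bit and high index bit masked; lowest index bit compared.
assert masked_match(0b101000, 0b101110, 1, 6)
assert not masked_match(0b101000, 0b101111, 1, 6)
```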
  • Comparator 897 compares the data address on bus 641 with the tag and index number 895 as masked by mask 896, and the comparison result is sent via bus 888 to controller 803 as a basis for its decisions. The adjacent-group field 892 stores the group numbers and valid bits of the groups adjacent to this group, for tracking when the data step crosses the boundary of the group. The group valid signal 893 is set when data is written into the group for the first time; it indicates that at least one data block in the group is valid, and also that the group corresponds to the data pointed to by the addresses covered by its tag and index-number fields.
  • Each bit of the block valid signal 894 represents the validity of one data block in the group. The tag low bits and index number input on bus 641, after being shifted under the control of shift field 891, can be decoded (for example, a 2-bit binary address decoded into 4 bits of which only one is valid, i.e., one-hot, each representing one data block) to select the corresponding bit of block valid signal 894. If that block valid bit is valid, the corresponding data is already in the corresponding data block of the group in data memory 113; if it is invalid, the corresponding data must first be filled into the data block.
  • The sequence table 603 can be accessed in two ways: one is matching the data address entered via bus 641 in FIG. 8A against the tags in sequence table 603; the other is direct addressing via group number 831 or group number 833 in FIG. 8A. Each data field in the sequence table 603 entry selected by data-address matching or group-number addressing can be read or written. For example, the corresponding group number 835 can be read out via data-address matching, or the corresponding tag 643 can be read out via group-number addressing 829. Other fields of the matched or addressed entry, such as the adjacent group numbers, the block valid signals and the group valid signal, can likewise be read or written. All fields of an entry are reset to all '0' before it is written.
  • The operation of this embodiment can be divided into three stages. The first stage is processing a data read instruction for the first time. This stage determines whether the data read instruction is in a loop. If it is not in a loop, a way of the set-associative buffer area is allocated according to the index number in the data address: the data is written into the corresponding data block of that way in primary data memory 113, and the tag portion of the data address is written into the entry in tag unit 841 corresponding to that index number in that way. If it is in a loop, a group is allocated in the group-organized buffer area for the data the data read instruction may read. In both cases, the data address is mapped to a cache address and stored in the information area associated with the data read instruction, and the memory is accessed with the data address to provide the corresponding data to processor core 101.
  • Whether the data read instruction is located in a loop may be determined by whether it lies between a backward branch instruction and that branch's target instruction. Specifically, the tracker can provide a pointer to the first backward branch instruction after the current instruction being executed by the processor core, i.e., a branch instruction whose branch target address is smaller than the address of the branch instruction itself. The tracker pointer can also point to further backward branch instructions after the current instruction and determine, from the branch target address of each backward branch instruction, which data read instructions are contained in each loop.
  • The second stage is the second processing of the same data read instruction. The corresponding data is provided to the processor core.
  • The data step size (stride) is calculated as the difference between the second data address and the first data address (stored when the instruction was processed the first time). Adding the data step size to the second data address yields the probable data address of the memory access when the instruction is processed the third time, and that address is used to read data from lower-layer memory 115 in advance.
  • The probable data address is further converted into the corresponding buffer address, and the data from lower-layer memory 115 is filled into data memory 113 accordingly. The buffer address is stored in the relevant information area of the data read instruction.
  • The third stage covers the third and all subsequent processings of the same data read instruction. Data is supplied to the processor core directly from data memory 113 using the cache address stored the previous time.
  • The data access engine also has a mechanism that compares the data address generated by processor core 101 with the stored cache address; if they do not match, the data is re-fetched using the data address generated by the processor core and the cache address is corrected.
  • The cache address is then added to the data step size to obtain the probable cache address of the next access, and the memory is filled through this address. The new cache address is stored back into the relevant information area of the data read instruction for future use. Thereafter, the instruction is processed in the same manner as the third time.
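The three stages above can be sketched as a small stride-prefetch model. This is an illustrative software analogy, not the patented hardware: `DataAccessEngine`, `lower_mem`, and the per-instruction `info` table are hypothetical names, and the cache is modeled as a simple address-to-data map rather than the group-allocated data memory 113.

```python
class DataAccessEngine:
    """Toy model of the three-stage handling of a data read instruction."""

    def __init__(self, lower_mem):
        self.lower_mem = lower_mem   # models lower-layer memory 115
        self.cache = {}              # models data memory 113: addr -> data
        self.info = {}               # per-instruction info area: pc -> (stage, last_addr, stride)

    def access(self, pc, addr):
        stage, last_addr, stride = self.info.get(pc, (0, None, 0))
        data = self.cache.get(addr)
        if data is None:                       # miss: fill from lower memory
            data = self.cache[addr] = self.lower_mem[addr]
        if stage == 0:                         # first pass: just record the address
            self.info[pc] = (1, addr, 0)
        elif stage == 1:                       # second pass: learn the stride, prefetch
            stride = addr - last_addr
            self.prefetch(addr + stride)
            self.info[pc] = (2, addr, stride)
        else:                                  # third pass onward: verify and prefetch
            if addr != last_addr + stride:     # prediction missed: re-learn the stride
                stride = addr - last_addr
            self.prefetch(addr + stride)
            self.info[pc] = (2, addr, stride)
        return data

    def prefetch(self, addr):
        if addr in self.lower_mem and addr not in self.cache:
            self.cache[addr] = self.lower_mem[addr]
```

On the second access the stride is learned and the third access's data is already resident, mirroring how the engine fills data memory 113 ahead of the processor core.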
  • The different processing stages of data read instructions are controlled by controller 803. In track table 619, the cache address and the data step size of a data read instruction are both initialized to 0 when the track is established. The controller 803 reads the cache address and data step size of the data read instruction together with its track table address. Please refer to Figure 8D, which shows an embodiment of the controller of the present invention.
  • The controller 803 contains an array of matching counters. Each unit groups a memory 861, a comparator 862 and a counter 863; the bit width of memory 861 and comparator 862 equals that of the track table address, and counter 863 is two bits wide.
  • The initial value detector 865 detects the instruction type and an all-'0' cache address and data step size. Bus 821 carries the instruction type, cache address and data step size into initial value detector 865; bus 851 carries the track table address to the memory input and one comparator port of every matching counter group. Bus 866 transfers the count value of the group whose stored value (e.g., in 861) matches the track table address (the address of the current data read instruction) on bus 851 to control logic 867, which controls the operation of the data access engine in the stage the instruction is in.
  • When initial value detector 865 detects a non-data-read instruction, controller 803 operates in the mode 0 state and does not respond to the instruction. When a data read instruction with an all-'0' cache address and data step size is detected, the instruction is judged not yet processed, and the first-stage mode is entered. First, initial value detector 865 generates a write enable signal 868 that stores the track table address of the data read instruction on bus 851 into memory 861 of the matching counter unit pointed to by allocator 864. At this point the value in memory 861 equals the value on bus 851, the output of comparator 862 is '1', and the group becomes the current instruction group.
  • The counter 863 of the current instruction group is incremented by '1' to reach '1', and the count value is placed on bus 866 and transferred to control logic 867, which sets each selector and function block in the data access engine to first-stage mode.
  • The next time the instruction arrives, initial value detector 865 detects a data read instruction and controls the comparators in each group to compare against the track table address on bus 851. In the group whose memory 861 matches, comparator 862 causes counter 863 to increment by '1', making its count value '2'.
  • The matching group is the current instruction group, and the value of its counter is placed on bus 866 and transferred to the control logic, which sets each selector and function block in the data access engine to second-stage mode.
  • On the third arrival, initial value detector 865 again detects a data read instruction and controls the comparators in each group to compare against the track table address on bus 851. The group whose memory 861 matches is the current instruction group, and comparator 862 causes counter 863 to increment to '3'. This value is placed on bus 866 and transferred to control logic 867, which sets each selector and function block in the data access engine to third-stage mode.
  • On the fourth arrival the same match occurs, and comparator 862 causes the two-bit counter 863 to wrap from '3' to '0'. This value is placed on bus 866 and transferred to control logic 867, which still sets each selector in the data engine to third-stage mode.
  • Control logic 867 treats count values '0' and '3' as the default third-stage state.
  • Once the counter reaches '0', its count no longer increases, and comparator 862 no longer participates in comparison. A count value of '0' also allows the unit to be selected by allocator 864 for use by other data read instructions.
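A minimal software sketch of the matching-counter array follows. It is a simplification, not the circuit: the counter saturates at 3 instead of wrapping back to '0' to free the unit, and the allocator simply picks the first unit whose count is 0. The class names (`MatchCounterGroup`, `Controller`) are illustrative.

```python
class MatchCounterGroup:
    """One unit: a stored track-table address plus a 2-bit counter."""
    def __init__(self):
        self.addr = None   # models memory 861
        self.count = 0     # models 2-bit counter 863; 0 means free

class Controller:
    """Toy model of the matching-counter array in controller 803."""
    def __init__(self, n_groups=4):
        self.groups = [MatchCounterGroup() for _ in range(n_groups)]

    def stage_for(self, track_addr):
        """Return the stage mode (1, 2, or 3) for a data read instruction."""
        for g in self.groups:
            if g.addr == track_addr:        # comparator 862 matched
                if g.count < 3:
                    g.count += 1            # advance 1 -> 2 -> 3, then saturate
                return g.count
        # no match: allocate a free unit (allocator 864) and enter stage 1
        free = next(g for g in self.groups if g.count == 0)
        free.addr, free.count = track_addr, 1
        return 1
```

Each time the tracker revisits the same track-table address, the count advances the instruction through the first, second, and third stage modes, as described above.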
  • Meanwhile, if initial value detector 865 detects that the cache address and data step size arriving from bus 821 are not all '0', while no comparator in any group matches the track table address on bus 851, it judges that this is a data read instruction that has already entered the third stage, and control logic 867 operates the data engine in the default mode, i.e., third-stage mode.
  • The feedback signals 888, 889 returned from tag 841 and sequence table 603, and the difference 825 from subtractor 805, are sent back to control logic 867 in controller 803; control logic 867 uses these feedback signals together with the stage information of the current instruction from bus 866 to control the operation of the data access engine.
  • Control logic 867 can also feed information back to the matching counter groups to change the stage of the current instruction in order to handle anomalies. For example, if a data read instruction has entered the third stage but the predicted data address does not match the data address sent from the processor core via bus 641, control logic 867 sends a feedback signal to the matching counter group corresponding to the current instruction, setting its count value to '1'. Thereafter the instruction restarts in the first-stage state, passes through the second and third stages to re-establish the step size, and the next cache address is stored into track table 619.
  • Tracker 845 moves track table read address 851 to the next data read instruction, whose type 621, DBN (623, 625, 627) and data step size 629 are placed on the current data address bus 821.
  • Controller 803 recognizes type 621 as a data type; since the DBN and data step size are all '0', it judges the instruction unprocessed, but still controls selector 617 to send the all-'0' DBN on bus 821 to data memory 113 so that data is fetched into buffer 849 as a spare for the processor core (alternatively, fetching with an all-'0' DBN may be skipped to save power). At the same time, controller 803 enters first-stage mode and controls selector 817 to send the group number 623 on bus 821 to sequence table 603, reading out the tag stored in the corresponding entry.
  • Shift adder 812 adds that tag to the shifted intra-group block number on bus 821.
  • The data address 641 generated by processor core 101 and the output of shift adder 812 are subtracted in subtractor 613, and the difference is placed on bus 825.
  • Controller 803 obtains the difference from bus 825 for analysis; finding the difference is not '0', the controller judges that the data pointed to by the DBN on bus 821 is not the data required by processor core 101, notifies the processor core to ignore the corresponding data in buffer 849, and waits for the correct data (alternatively, this judgment may be omitted to save power).
  • Controller 803 controls the data address on bus 641 to be matched against the tags in sequence table 603 and tag 841. If a match is found in tag 841, operation proceeds as in a traditional cache. If no tag matches, the data address on bus 641 is sent to lower-layer memory 115 via selector 819, and the corresponding data block is read from lower-layer memory 115. At this time tracker 845 has already looked ahead to the next branch point and judged the branch to be a backward branch (that is, the program is in a loop here) whose range contains the data read instruction being processed, so a replaceable data group is allocated.
  • The tag and index number portion on bus 641 is stored into the tag and index number fields of the corresponding entry in sequence table 603.
  • The group valid bit of this group and the valid bit of the corresponding data block (data block 0) are asserted. The shift field in this entry is all '0' at this point, and the adjacent group number fields do not yet hold values.
  • The address of the entry (i.e., the group number GN) is output from sequence table 603 via bus 835 onto bus 837.
  • The data address on bus 641 and the just-written tag sent from sequence table 603 via bus 643 are subtracted in subtractor 613, and the difference is placed on intermediate result bus 823. Because the address high bits on bus 641 and bus 643 are the same, the difference consists of the tag low bits, the index and the intra-block offset. These low bits are also placed on the bus by shifter 609 (the shift amount is '0' at this time).
  • Together with the group number, they form a complete and correct cache address; controller 803 then controls selector 617 to place the cache address on bus 837 onto bus 855 and send it to data memory 113, pointing out the correct data block to be filled with the corresponding data block read from lower-layer memory 115.
  • Controller 803 also controls this data to be read out of data memory 113, or controls the output of lower-layer memory 115 to be bypassed directly to data buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
  • Controller 803 also controls selectors 811, 813 and 815 to select the intra-group block number and intra-block offset on bus 823 together with the all-'0' step size from the track table; these are added in adder 611 as in the earlier embodiment, and the result is placed on bus 881.
  • The control line 631 generated from the addition result controls selector 618 to select the current group number output by sequence table 603, which is also placed on bus 881. The group number, intra-group block number and intra-block offset are spliced together on bus 881 into a cache address DBN.
  • Controller 803 controls selector 843 to select bus 881, and delay 847 delays track table read address 851 onto track table write address 853, so that the DBN is written into the same entry read earlier. Controller 803 does not update the step size (or forces a write of '0'), leaving it '0'.
  • The track table entry now holds the cache address of the access just completed by the data read instruction (hereinafter DBN1 for explanation), with a step size of '0'. The data access engine has completed the first-stage processing of the data read instruction.
  • The program in this example is executing a loop, so the same data read instruction is reached again. Its type 621, DBN1 and data step size '0' are read out onto bus 821, and its track table address is on bus 851.
  • Controller 803 matches track table read address 851 against the addresses stored in its matching counter groups and obtains the indication to perform second-stage operation on the instruction; control logic 867 directs the data access engine via control bus 827 to operate accordingly.
  • Controller 803 uses the group number (GN) 623 of DBN1 on bus 821 to select the tag stored in the corresponding entry of sequence table 603; the tag is sent via bus 643 to shift adder 812, together with the intra-group block number and intra-block offset of DBN1 from bus 821 via selector 810.
  • In shift adder 812, the intra-group block number and intra-block offset (low bits) on bus 821 are shifted (the shift amount is controlled by bus 829, output from sequence table 603) and then added to the tag and index number (high bits) on bus 643.
  • The result, the data address corresponding to DBN1, is sent to one input of subtractor 805. The new data address on bus 641 is sent to the other input of subtractor 805 and the data address of DBN1 is subtracted from it; the resulting difference is placed on bus 825 as the data step size (stride).
  • Converter 807 converts the step size into the corresponding shift signal (shift) and writes it into the shift signal field of the DBN1 entry in sequence table 603. The shift amount 829 is sent from sequence table 603 to shifters 605, 607, 609 and 812 to control their shift operations.
  • Controller 803 controls selector 819 to select the data address on bus 641 so that the corresponding data is read from lower-layer memory 115. At the same time, the low bits of the data address on bus 641 and the tag of the DBN1 entry on bus 643 are subtracted in subtractor 613 and the result is placed on bus 823.
  • Controller 803 controls selectors 811, 813 to send the low bits on bus 823 (i.e., the tag low bits, index and intra-block offset of DBN2) to adder 611, where '0' is added, shifted by the shift amount in the shift field of the DBN1 entry in sequence table 603. The resulting intra-group block number and intra-block offset of DBN2 are placed on bus 881, and the shifted-out result 631 is produced by shifter 607.
  • The shifted-out result 631 controls selector 618 to select the adjacent group number from the DBN1 entry in sequence table 603. If that group number is invalid, a new group is allocated as in the example of Figure 7 and the first stage to hold the DBN2 data block, and its valid bit and its tag and index number corresponding to DBN2 are set, with its shift field copied from the DBN1 shift field. In the process, the invalid adjacent group number field in the DBN1 entry is filled with the group number of the newly assigned group and set valid, then read again; the group number of DBN1 is likewise filled into the corresponding adjacent group number field of DBN2. If the adjacent group number is already valid, it is read out directly.
  • This group number is placed on bus 881; together with the intra-group block number and intra-block offset already on bus 881, it is sent via bus 816 to selector 617, selected, and placed on bus 855 as the cache address of DBN2 for data memory 113.
  • The data from lower-layer memory 115 is filled at that address, and the correct data is read from the address and sent to buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
  • Next, controller 803 controls selectors 811, 813 to send the low bits on bus 823 (i.e., the tag low bits, index and intra-block offset of DBN2) to adder 611, where the data step size on bus 825 is added, shifted by the shift amount in the shift field of the DBN2 entry in sequence table 603.
  • The new intra-group block number and intra-block offset from the sum are placed on bus 881, and the shifted-out result 631 controls selector 618 to select an adjacent group number in sequence table 603. If the adjacent group number is invalid, the data block is not yet in primary data memory 113, and a new data group is allocated as in the above example.
  • The group number, together with the summed intra-group block number and intra-block offset, forms the cache address for the next execution of the data read instruction, hereinafter DBN3. Controller 803 sends DBN3 and the data step size on bus 825 via bus 881 and selector 843 to be written back into the corresponding entry of the same data read instruction in track table 619 (where DBN1 was previously stored).
  • Controller 803 also fills the data block pointed to by DBN3 in data memory 113 with the corresponding data from lower-layer memory 115, in preparation for the next execution of the same instruction. The group number, intra-group block number and intra-block offset in DBN3 are sent via bus 816 to selector 617, selected, and used to point into primary data memory 113.
  • The intra-group block number and intra-block offset are shifted in shift adder 812 and added to the tag and index number on bus 643 to obtain the correct data address. This data address is selected by selector 819 and sent to lower-layer memory 115 to fetch the data block pointed to by DBN3 into primary data memory 113.
  • When the same instruction is reached again, controller 803 determines from the track table address match that the data read instruction has entered the third stage.
  • Controller 803 controls selector 617 to select DBN3 on bus 821; the corresponding data is read from primary data memory 113 via bus 855 into buffer 849 and used by processor core 101.
  • Controller 803 also compares the data address corresponding to DBN3 with the data address sent by processor core 101 via 641, adds the data step size to DBN3 to obtain DBN4, and queries sequence table 603 according to DBN4; if necessary, the corresponding data is fetched from lower-layer memory 115 and stored into primary data memory 113, as in the previous example, ready for the next cycle. Subsequent loop iterations are executed in the same way.
  • A data read instruction may have a negative data step size, that is, starting from a certain data address, each subsequent access reads a data address smaller than the previous one.
  • Since the controller cannot determine in the first stage that the step size is negative, the data corresponding to DBN1 is placed in data block No. 0 of some group. In the second stage, subtracting the address of DBN1 from that of DBN2 yields the data step size, which is found to be negative.
  • One approach is to place DBN2 in the highest-numbered data block of another data group and set the data address corresponding to DBN2 accordingly.
  • Another approach is not to allocate a new group, but to store DBN2 directly in the group where DBN1 resides, saving cache space. The method is to invert the intra-group block number. Taking four data blocks per group as an example, block No. 0 where DBN1 was originally stored is mapped to block No. 3, original block No. 3 is mapped to block No. 0, original block No. 1 to block No. 2, and original block No. 2 to block No. 1.
  • The implementation is to add an inverter in the path of the intra-group block number, so that the block number output by the inverter is the bitwise inversion of the block number input to it.
  • To this end, an inversion (R) bit is added to each entry of sequence table 603. When the R bit is not set, the inverter does not act and its output equals its input; when the R bit is set, the inverter acts and its output is the bitwise negation of its input. In this way, data that would originally be stored in the group in descending order is stored in the group in ascending order.
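The inverter described above can be sketched in a few lines. This is an illustrative model under the stated assumption of four blocks per group (a 2-bit block number); `mapped_block` is a hypothetical helper name, not from the patent.

```python
def mapped_block(block, r_bit, blocks_per_group=4):
    """Map a stored intra-group block number to the physical block.

    When r_bit is set, the block number is bitwise-inverted within
    the width of the group (0<->3, 1<->2 for 4 blocks per group).
    """
    mask = blocks_per_group - 1          # e.g. 0b11 for 4 blocks
    return (~block & mask) if r_bit else block
```

With the R bit set, descending-address data lands in ascending physical blocks, matching the 0-to-3, 3-to-0, 1-to-2, 2-to-1 mapping given above.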
  • DBN1 (whose index should be 0) is now actually stored in block 0, but the cache address stored in the track table is marked as block 3;
  • DBN2 (whose index should be -1) is now actually stored in block 1, but the cache address stored in the track table is marked as block 2;
  • DBN3 (whose index should be -2) is now actually stored in block 2, but the cache address stored in the track table is marked as block 1;
  • DBN4 (whose index should be -3) is now actually stored in block 3, but the cache address stored in the track table is marked as block 0.
  • The tag and index bits of the group were set when DBN1 was placed in block 0. Therefore, when the step size is determined to be negative in the second stage, the R bit of the group is set to '1', and the tag and index number fields written for the group in the first stage are read out via bus 643, reduced by a constant, and written back.
  • This constant can be obtained by table lookup or calculation. Let a data group have n data blocks and let the shift field to be adjusted in sequence table 603 be s (read out together with the tag and index number and sent onto bus 829); then the constant equals (n-1)*(s+1). For example, in the above case of 4 data blocks with a shift value of '0', the constant equals '3'.
  • Subtracting 3 from the tag and index value of DBN1 (which in this case corresponds to the address mapped to block 3) yields exactly the tag and index value of DBN4 (which corresponds to the address mapped to block 0).
  • With a shift value of '1', the constant is '6'. Other cases follow by analogy and are not repeated.
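The correction constant above reduces to one expression. This is a direct transcription of the formula (n-1)*(s+1) stated in the text; the function name is a hypothetical label.

```python
def correction_constant(n_blocks, shift):
    """Constant subtracted from the group's tag/index field when the
    R bit is set: (n - 1) * (s + 1), for n data blocks per group and
    shift field value s."""
    return (n_blocks - 1) * (shift + 1)
```

For 4 blocks with shift '0' this gives 3, and with shift '1' it gives 6, matching the two worked cases in the text.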
  • Both the data address and the DBN stored in the track table use the correct pre-mapping address; only the cache address sent to the primary data memory needs the mapped form. Therefore, the above inverter can be placed after selector 617 in Figure 8A, inverting only the intra-group block number.
  • When a DBN sent from track table 619 is selected by selector 617 to fetch data from primary data memory 113, the R bit is also read out of sequence table 603, addressed by group number 623, to control the inverter. Adding an R bit to the data entries in track table 619 would eliminate this query of sequence table 603; however, to compare with the data address sent on bus 641, sequence table 603 must still be queried by group number 623 to obtain from bus 643 the tag and index number fields corresponding to the DBN.
  • The apparatus and method proposed by the present invention can be used in various data-cache-related applications to improve the efficiency of the processor system.


Abstract

A data caching system and method. When the system and method are applied in the field of processors, the data required by a data read instruction can be filled into a data memory before the processor executes the instruction; the probable data address of the next execution of the instruction is predicted and prefetched, the corresponding data is stored according to a rule, and the number of tag comparisons is reduced as much as possible.

Description

Data cache system and method

Technical field
The invention relates to the fields of computers, communications and integrated circuits.
Background technique
Generally speaking, the role of a cache is to copy a portion of the contents of memory into it, so that those contents can be quickly accessed by the processor core in a short time, ensuring continuous operation of the pipeline.
Current cache addressing is based on the following procedure. First, the index segment of the address is used to read a tag from the tag memory. At the same time, the index segment and the intra-block offset segment of the address jointly address the cache to read its content. The tag read from the tag memory is then matched against the tag segment of the address. If they are the same, the content read from the cache is valid; this is called a cache hit. Otherwise, it is a cache miss, and the content read from the cache is invalid. For a multi-way set-associative cache, the above operations are performed on all way groups in parallel to detect which way group hits; the content read from the hitting way group is the valid content. If all way groups miss, all read contents are invalid, and after the miss the cache control logic fills the content from the lower-level storage medium into the cache.
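The conventional lookup just described can be sketched as follows. This is a generic illustration of set-associative addressing, not the invention's mechanism; the parameter names and the list-of-lists cache layout are assumptions.

```python
def lookup(cache, addr, ways, sets, block_size):
    """Conventional set-associative lookup: the index selects a set,
    every way's tag is compared with the address tag in parallel,
    and a match is a hit.

    cache[way][index] is either None or a (tag, block_data) pair.
    """
    offset = addr % block_size
    index = (addr // block_size) % sets
    tag = addr // (block_size * sets)
    for way in range(ways):                 # all way tags read and compared
        entry = cache[way][index]
        if entry is not None and entry[0] == tag:
            return entry[1][offset]         # hit: data from the matching way
    return None                             # miss: fill from lower-level storage
```

Note that every way's tag must be read and compared on every access; this per-access comparison cost is exactly what the background identifies as a power problem.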
Technical problem
In existing cache structures, a variety of cache prefetch techniques are used to reduce the occurrence of cache misses. For instruction caches, prefetching brings a certain performance improvement. For data caches, however, the uncertainty of data addresses makes it difficult to predict them effectively. Therefore, with the ever-widening processor/memory speed gap, data cache misses remain the most serious bottleneck limiting the performance improvement of modern processors.
In addition, in the set-associative cache structure most commonly used in modern processors, a larger number of way groups usually yields better cache performance, but also requires more tags to be read and compared simultaneously, leading to higher power consumption. How to reduce the number of tag comparisons while increasing the number of way groups is one of the difficulties in data cache improvement.
Technical solution
The method and system apparatus proposed by the present invention can directly address one or more of the above or other difficulties.
The invention provides a data caching method, characterized by configuring the data memory in the cache so that one part of the storage blocks implements a traditional set-associative structure and another part implements a group-allocated structure. The group-allocated cache consists of multiple groups; each group stores several data blocks corresponding to the same starting data block address, and the difference between the data addresses of adjacent storage blocks within a group is the same value.
Optionally, the data addresses corresponding to the data blocks in each group share a common part; this common part consists of the tag in the data address, or of part of the tag and part of the index number. Data blocks with adjacent or nearby addresses are stored in the same group.
Optionally, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in the group are consecutive; when the difference equals an integer multiple of the data block length, the data block addresses of all storage blocks in the group are evenly spaced. Based on the current data's position in the group and the data step size, it can be determined directly whether the next data also lies in the group, and if so, at which position.
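The position arithmetic in the claim above can be illustrated directly. This is a hypothetical sketch: it assumes a group of `n_blocks` blocks whose adjacent block addresses differ by `interval * block_len` bytes, and checks whether an access `stride` bytes ahead still lands inside a block stored by the same group.

```python
def next_in_group(block, offset, stride, block_len, interval, n_blocks):
    """Given the current (block, offset) position inside a group and a
    byte stride, return the next (block, offset) if the next access
    stays inside the same group, else None."""
    span = block_len * interval              # address distance between adjacent blocks
    pos = block * span + offset + stride     # address offset relative to the group base
    blk, off = divmod(pos, span)
    if 0 <= blk < n_blocks and off < block_len:   # must land inside a stored block
        return blk, off
    return None
```

Because only this arithmetic is needed, no tag comparison is required once the data is known to reside in the group, which is the point of the group-allocated structure.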
Optionally, a sequence table is provided; its rows correspond one-to-one with the groups in the data memory, and each row contains a compression ratio indicating the interval between the data block addresses of adjacent storage blocks in the corresponding group.
Optionally, each row of the sequence table contains the location of the group holding the data blocks adjacent to those of the corresponding group; based on the current data's position in its group and the data step size, the group and position of the next data can be determined directly.
Optionally, each row of the sequence table contains the locations of the groups holding the several consecutive data blocks adjacent to the first data block of the corresponding group.
Optionally, each row of the sequence table contains the locations of the groups holding the several consecutive data blocks adjacent to the last data block of the corresponding group.
Optionally, the data address is converted into a cache address consisting of a group number, an intra-group block number and an intra-block offset, where the intra-block offset is identical to the intra-block offset of the data address; the cache address can be used directly to address the data memory in the data cache.
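A cache address (DBN) of this form is just three concatenated fields. The sketch below assumes illustrative field widths (64-byte blocks, 4 blocks per group); the patent does not fix these numbers.

```python
BLOCK_BITS, GROUP_BLK_BITS = 6, 2      # assumed: 64-byte blocks, 4 blocks per group

def pack_dbn(group, block, offset):
    """Concatenate group number | intra-group block number | intra-block offset."""
    return (group << (GROUP_BLK_BITS + BLOCK_BITS)) | (block << BLOCK_BITS) | offset

def unpack_dbn(dbn):
    offset = dbn & ((1 << BLOCK_BITS) - 1)
    block = (dbn >> BLOCK_BITS) & ((1 << GROUP_BLK_BITS) - 1)
    group = dbn >> (GROUP_BLK_BITS + BLOCK_BITS)
    return group, block, offset
```

The offset field is carried over unchanged from the data address, so only the group number and intra-group block number need to be produced by the address mapping.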
Optionally, the data accessed by data access instructions in loop code is stored in the group-allocated structure, while the data accessed by other data access instructions is stored in the set-associative structure.

Optionally, for a data access instruction being executed for the first time, the data address is converted into a cache address once it has been generated.

Optionally, for a data access instruction being executed for the second time, the data address is converted into a cache address once it has been generated, and the data stride is computed, the data stride being the difference between the two data addresses. From the current cache address and the data stride, a possible next cache address is computed for addressing the data memory the next time this data access instruction is executed. If the data in the data memory at that next cache address is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.

Optionally, for the third and subsequent executions of a data access instruction, the next cache address is computed from the current cache address and the data stride, for addressing the data memory the next time the instruction is executed. If the data in the data memory at that next cache address is invalid, the next cache address is converted into the corresponding data address and the corresponding data is filled into the data memory.
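The three-phase behavior across executions of one data access instruction (first execution: translate only; second: translate and learn the stride; third and later: predict from the stride alone) can be modeled as a tiny per-instruction state record. This is an illustrative software model under assumed names and interfaces, not the patent's circuit; addresses here are plain integers rather than group/block/offset triples.

```python
class DataPointState:
    """Hypothetical model of one data access instruction's stride state."""
    def __init__(self):
        self.last_addr = None   # address seen at the previous execution
        self.stride = None      # difference between the last two addresses
        self.next_addr = None   # predicted address for the next execution

    def observe(self, addr):
        """Record this execution's address; from the second execution
        onward, return the predicted next address (last stride applied)."""
        if self.last_addr is not None:
            self.stride = addr - self.last_addr   # data stride
            self.next_addr = addr + self.stride   # possible next address
        self.last_addr = addr
        return self.next_addr
```

For an instruction walking an array of 8-byte elements: the first execution at address 100 yields no prediction, the second at 108 learns stride 8 and predicts 116, and the third confirms the stride and predicts 124. In the scheme described above, if the data at the predicted location were invalid, the predicted cache address would be converted back into a data address to trigger a fill.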
The present invention further provides a data caching system, characterized in that the data memory in the data caching system can, depending on its configuration, operate one portion of its storage blocks as a traditional set-associative structure and another portion as a group-allocated structure. The group-allocated structure comprises a plurality of groups, each group containing several storage blocks and one data block address storage unit, with all storage blocks in a group corresponding to the data block address in that unit. Within each group, the difference between the data addresses of adjacent storage blocks is the same value.
Optionally, the data caching system further comprises a masked comparator, which matches part of the block address in a data address against the corresponding bits of the data block address in the data block address storage unit, to determine whether the data corresponding to that data address is stored in the group.
Optionally, when the difference between the data addresses of adjacent storage blocks in a group equals the data block length, the data block addresses of all storage blocks in that group are consecutive; and when the data corresponding to a data address is stored in the group, the masked bits address the storage blocks within the group, locating the data corresponding to that data address.

Optionally, the data caching system further comprises a shifter. When the difference between the data addresses of adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses of all storage blocks in that group are evenly spaced; and when the data corresponding to a data address is stored in the group, the value obtained by shifting the masked bits with the shifter addresses the storage blocks within the group, locating the data corresponding to that data address.
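One way to picture the masked comparator plus shifter is as follows: the high bits of the block address are compared against the stored data block address, the masked-out low bits select the block within the group, and for a spacing of 2^s block lengths the shifter drops s low bits (which must be zero for a hit). The sketch below is a functional illustration, not the patent's circuit, and it assumes power-of-two group sizes and spacings and a group base block address aligned to the group's span.

```python
def group_lookup(data_addr, base_block_addr, offset_bits, idx_bits, shift):
    """Return the intra-group block index holding data_addr, or None.

    base_block_addr : block address (data address >> offset_bits) stored in
                      the group's data block address storage unit
    offset_bits     : log2(block length in bytes)
    idx_bits        : log2(number of storage blocks in the group)
    shift           : log2(spacing between adjacent blocks' addresses, in
                      units of the block length); 0 means consecutive
    """
    blk = data_addr >> offset_bits            # block address of the data
    span = idx_bits + shift                   # masked low bits
    if (blk >> span) != (base_block_addr >> span):
        return None                           # masked compare failed
    low = blk & ((1 << span) - 1)
    if low & ((1 << shift) - 1):
        return None                           # falls between spaced blocks
    return low >> shift                       # shifter output: block index
```

For example, with 64-byte blocks (`offset_bits=6`), four blocks per group (`idx_bits=2`), and a spacing of two block lengths (`shift=1`), a group based at block address 16 holds blocks 16, 18, 20, and 22; an address in block 18 hits at intra-group index 1, while an address in block 17 misses.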
Optionally, the data caching system further comprises a sequence table memory. The rows of the sequence table memory correspond one-to-one with the groups in the data memory, and each row contains a storage unit for storing a compression ratio; the value stored in that unit indicates the spacing between the data block addresses of adjacent storage blocks in the corresponding group.

Optionally, each row of the sequence table memory contains a pointer to the location of the group holding the data block adjacent to the data blocks of the corresponding group. Based on the position of the current data within its group and the data stride, the group in which the next data resides, and its position within that group, can be determined directly.

Optionally, the pointer points to the location of the group holding the several consecutive data blocks adjacent to the first data block of the corresponding group.

Optionally, the pointer points to the location of the group holding the several consecutive data blocks adjacent to the last data block of the corresponding group.
Optionally, by having the comparator match the data address against the data block address in the data block address storage unit, and having the shifter shift the index in the data address according to the value in the compression ratio storage unit, a data address can be converted into a cache address. The cache address consists of a group number, an intra-group block number, and an intra-block offset, where the intra-block offset is identical to the intra-block offset in the data address. The cache address can be used directly to address the data memory in the data cache.

Optionally, using the data block address value in the data block address storage unit corresponding to a cache address, and having the shifter shift the intra-group block number in the cache address according to the value in the compression ratio storage unit, a cache address can be converted back into a data address.
Other aspects of the present invention will be understood and appreciated by those skilled in the art in light of the description, claims, and drawings of the present invention.
Beneficial Effects
The system and method of the present invention can provide a basic solution for the data cache structures used by digital systems. Unlike a traditional data caching system, which fills the cache only after a cache miss, the system and method of the present invention fill the data cache before the processor accesses a piece of data, thereby avoiding or largely hiding compulsory misses. That is, the cache system of the present invention integrates the prefetching process.
The system and method of the present invention further divide the data memory in the data cache into a set-associative portion and a group-allocated portion. Each group in the group-allocated portion contains data blocks whose data addresses are adjacent or close to one another. Thus, the data accessed by data access instructions whose data addresses are adjacent or close (such as data access instructions in loop code) is stored in the group-allocated portion, and other data is stored in the set-associative portion. In addition, while filling data into the data cache, the technical solution of the present invention converts the traditional data address, consisting of a tag, an index, and an intra-block offset, into a group number, an intra-group block number, and an intra-block offset. This address space conversion allows the data caching system to address the data memory directly in the new addressing form and find the corresponding data without tag matching. In particular, when accessing data whose addresses are adjacent or close, a simple computation on the cache address and the data stride yields the cache address of the next data, eliminating tag matching and address conversion and greatly reducing power consumption.
In addition, the system and method of the present invention can read data out of the data memory and send it to the processor core shortly before the processor core executes the corresponding data read instruction, so that the data is immediately available when the processor core needs to read it, hiding the data memory access time.

Other advantages and applications of the present invention will be apparent to those skilled in the art.
Brief Description of the Drawings
FIG. 1 is an embodiment of the cache system of the present invention;
FIG. 2 is a schematic diagram of the track point format of the present invention;
FIG. 3A is another embodiment of the cache system of the present invention;
FIG. 3B is another schematic diagram of the track point format of the present invention;
FIG. 3C is another embodiment of the cache system of the present invention;
FIG. 4A is an embodiment of the improved set-associative cache of the present invention;
FIG. 4B is another embodiment of the improved set-associative cache of the present invention;
FIG. 5 is an embodiment of the grouped data cache of the present invention;
FIG. 6 is an embodiment of the data access engine of the present invention;
FIG. 7A is an embodiment of the sequence table and data cache of the present invention;
FIG. 7B is another embodiment of the sequence table and data cache of the present invention;
FIG. 7C is another embodiment of the sequence table and data cache of the present invention;
FIG. 7D is an embodiment of the data storage arrangement of the present invention in which group boundaries are not aligned;
FIG. 8A is an embodiment of the data access engine of the present invention;
FIG. 8B is a schematic diagram of the various address forms of the present invention;
FIG. 8C is an embodiment of the sequence table operations of the present invention;
FIG. 8D is an embodiment of the controller of the present invention.
Best Mode for Carrying Out the Invention

FIG. 6 shows the best mode for carrying out the invention.

Embodiments of the Invention
The high-performance cache system and method proposed by the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become clearer from the following description and the claims. It should be noted that the drawings are all in a highly simplified form and use imprecise proportions, serving only to illustrate the embodiments of the present invention conveniently and clearly.

It should be noted that, in order to clearly illustrate the content of the present invention, multiple embodiments are presented to further explain its different implementations, where these embodiments are illustrative rather than exhaustive. In addition, for brevity, content already mentioned in an earlier embodiment is often omitted in later embodiments; accordingly, for content not mentioned in a later embodiment, reference may be made to the earlier embodiments.

Although the invention may be extended through various modifications and substitutions, the specification also presents and elaborates several specific illustrative embodiments. It should be understood that the inventor's intent is not to limit the invention to the particular embodiments set forth; on the contrary, the intent is to cover all improvements, equivalent transformations, and modifications made within the spirit or scope defined by the claims. The same component numbers may be used in all drawings to denote the same or similar parts.
Please refer to FIG. 1, which is an embodiment of the cache system of the present invention. As shown in FIG. 1, the data caching system comprises a processor 101, an active table 109, a tag memory 127, a scanner 111, a track table 107, a tracker 119, an instruction memory 103, and a data memory 113. It should be understood that the various components are listed here for ease of description; other components may be included, and certain components may be omitted. The various components may be distributed across multiple systems, may be physical or virtual, and may be implemented in hardware (e.g., integrated circuits), in software, or in a combination of hardware and software.
In the present invention, the processor may be any processing unit that includes an instruction cache and a data cache and is capable of executing instructions and processing data, including but not limited to: a general-purpose processor, a central processing unit (CPU), a microcontroller (MCU), a digital signal processor (DSP), a graphics processing unit (GPU), a system on chip (SoC), an application-specific integrated circuit (ASIC), and so on.
In the present invention, the level of a memory refers to its proximity to the processor 101; the closer to the processor 101, the higher the level. In addition, a higher-level memory (such as the instruction memory 103 and the data memory 113) is generally faster but smaller than a lower-level memory. "The memory closest to the processor" refers to the memory in the storage hierarchy that is nearest to the processor and usually also the fastest, such as the instruction memory 103 and the data memory 113 in this embodiment. Furthermore, the memories at the various levels in the present invention satisfy an inclusion property: a lower-level memory contains all the contents stored in a higher-level memory.
In the present invention, a branch instruction refers to any appropriate form of instruction that can cause the processor 101 to change its execution flow (e.g., to execute an instruction out of sequence). A branch source refers to an instruction that performs a branch operation (i.e., a branch instruction); the branch source address may be the instruction address of the branch instruction itself. A branch target refers to the target instruction to which a taken branch transfers control; the branch target address may be the address to which control is transferred when the branch is taken successfully, namely the instruction address of the branch target instruction. A data read instruction refers to any appropriate form of instruction that can cause the processor 101 to read data from memory; the instruction format of a data read instruction generally includes a base address register number and an address offset. The data required by a data read instruction refers to the data read when the processor 101 executes that instruction; the data address of a data read instruction refers to the address used by the processor 101 for reading or writing data when executing it. When the processor core 101 executes a data read instruction, the data address may be computed as the base address plus the offset. A base address register update instruction refers to an instruction that changes the value of any of the base address registers that may be used by data read instructions. The current instruction may refer to the instruction currently being executed or fetched by the processor core; the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
In the present invention, the term "fill" refers to fetching the corresponding instruction or required data from external memory in advance and storing it into the instruction cache or the data cache before the processor executes the instruction.
In the present invention, the rows of the track table 107 correspond one-to-one with the storage blocks in the instruction memory 103. The track table 107 contains a plurality of track points. Here, a track point is an entry in the track table 107 and may contain information about at least one instruction, such as the type of that instruction. When the information contained in a track point indicates that it corresponds to at least one branch instruction, the track point is a branch point, and the information may include a branch target address and the like. The tracking address of a track point is the track table address of the track point itself and consists of a row address and a column address. The tracking address of a track point corresponds to the instruction address of the instruction it represents. For branch points, each branch point contains the tracking address, in the track table 107, of the branch target instruction of the branch instruction it represents, and that tracking address corresponds to the instruction address of the branch target instruction.
In this embodiment, in addition to storing instructions that may be executed by the processor 101, the instruction memory 103 also stores instruction type information for each instruction, such as whether the instruction is a data read instruction. The instruction type information may further indicate which kind of data read instruction the corresponding instruction is, thereby containing information on how to compute the data address, such as the base address register number and the position of the address offset within the instruction code.
For ease of representation, BNX may denote the row address in a branch point's tracking address; that is, BNX corresponds to the location of the storage block containing the instruction (the row number of the storage block), while the column address in the tracking address corresponds to the position (offset) of the branch instruction within its storage block. Accordingly, each pair of BNX and column address corresponds to one branch point in the track table 107; that is, the corresponding branch point can be found in the track table 107 from a pair of BNX and column address.

Further, a branch point in the track table 107 also stores, in the form of a tracking address, the location in the instruction memory 103 of the branch target instruction of the corresponding branch instruction. From that tracking address, the position of the track point corresponding to the branch target instruction can be found in the track table 107. That is, for a branch point in the track table 107, its track table address is the tracking address corresponding to its branch source address, and its track table content includes the tracking address corresponding to its branch target address.
In this embodiment, the entries in the active table 109 correspond one-to-one with the storage blocks in the instruction memory 103, and thus one-to-one with the rows of the track table 107. Each entry in the active table 109 indicates where the instruction cache storage block corresponding to that active table row is stored in the instruction memory 103, forming a correspondence between BNX and instruction cache storage blocks. Each entry in the active table 109 stores the block address of one instruction cache storage block. Thus, when an instruction address is matched in the active table 109, either the BNX stored in the matching entry is obtained, or the result is an unsuccessful match.

Each storage block in the data memory 113 is denoted by a storage block number DBNX. The entries in the tag memory 127 correspond one-to-one with the storage blocks in the data memory 113; each entry stores the block address of the corresponding storage block in the data memory 113, forming a correspondence between data block addresses and data cache storage block numbers. Thus, when a data address is matched in the tag memory 127, either the storage block number stored in the matching entry is obtained, or the result is an unsuccessful match.
The scanner 111 examines the instructions sent from external memory to the instruction memory 103. Upon finding that an instruction is a branch instruction, it computes the branch target address of that branch instruction. For a direct branch instruction, the branch target address can be obtained by adding the block address of the instruction block containing the instruction, the offset of the instruction within the instruction block, and the branch offset. For an indirect branch instruction, the branch target address can be obtained by adding the corresponding base address register value and the branch offset. The instruction block address may be read out of the active table 109 and sent directly to the adder in the scanner 111. Alternatively, a register storing the current instruction block address may be added to the scanner 111, so that the active table 109 does not need to supply the instruction block address in real time.
In addition, when the scanner 111 finds that an instruction is a data read instruction, it can also compute the data address corresponding to that instruction, for example by adding the data address offset to the value of the base address register used by the instruction. In the present invention, data read instructions are divided into two categories: data read instructions whose data addresses are determinate and data read instructions whose data addresses are indeterminate. For example, for a data read instruction whose data address is obtained by summing the instruction address of the data read instruction itself and a data address offset (an immediate), the computed data address is correct whenever it is computed, so the instruction can be classified as having a determinate data address. As another example, for a data read instruction whose data address is obtained by summing a base address register value and a data address offset (an immediate), if the base address register value has already been fully updated when the data address is computed, the instruction can likewise be classified as having a determinate data address; otherwise it is classified as having an indeterminate data address. According to the technical solution of the present invention, these two kinds of data read instructions can be given different instruction types to be stored in the corresponding track points of the track table 107.
The branch target instruction address computed by the scanner 111 can be matched against the storage block row addresses stored in the active table 109. If the match succeeds, indicating that the branch target instruction is already stored in the instruction memory 103, the active table 109 outputs the corresponding BNX to the track table 107 to be filled into the entry for the branch instruction. If the match fails, the branch target instruction has not yet been stored in the instruction memory 103; in that case, the branch target instruction address is sent to external memory, an entry is allocated in the active table 109 to store the corresponding block address, the BNX is output to the track table 107 to be filled into the entry for the branch instruction, the corresponding instruction block sent from external memory is filled into the storage block of the instruction memory 103 corresponding to that BNX, and the corresponding track is established in the corresponding row of the track table 107.

For a branch instruction in that instruction block, its branch target instruction address is matched in the active table 109 to output a BNX, and the position of the branch target instruction within its instruction block (i.e., the intra-block offset portion of the branch target instruction address) is the corresponding track point column number, yielding the tracking address corresponding to the branch target instruction; this tracking address is stored as branch point content in the branch track point corresponding to the branch instruction. Furthermore, while the scanner 111 examines an instruction block, data read instructions can be found, and the corresponding instruction type information is stored in the corresponding track points (i.e., data points) of the track table 107; the data address of each data read instruction is computed and sent to external memory to fetch the data block containing the corresponding data. Meanwhile, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX and the offset address of the data within the data block (i.e., DBNY) are output and stored as track point content in the data point. In this way, while an instruction block is being filled into the instruction memory 103, a track corresponding to the entire instruction block is established. For ease of description, in this specification an address that can directly address the data memory is called a cache address; that is, a cache address (DBN) consists of DBNX and DBNY.
In the present invention, the read pointer 121 of the tracker 119 can move forward from the track point corresponding to the current instruction in the track table 107 until it points to the first branch point. At that time, the value of the read pointer 121 is the tracking address of the branch source instruction, containing the BNX and the corresponding branch point column number. From that tracking address, the tracking address of the branch source instruction's branch target instruction can be read out of the track table 107. Thus, the read pointer 121 of the tracker 119 starts at the track point in the track table 107 corresponding to the instruction currently being executed by the processor 101, moves ahead to the first branch point after that track point, and the target instruction can be found in the instruction memory 103 from the target instruction's tracking address. During this movement, when the read pointer 121 passes a data point, the cache address DBN stored there is read out and sent to the data memory 113, and the corresponding data is read out and pushed to the processor core 101. In this way, the data corresponding to all data read instructions between the current instruction and the first subsequent branch point is pushed in turn to the processor core for reading.
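The read pointer's scan described above can be sketched as a toy software model. The track-point records, field names, and return convention here are assumptions for illustration, not the patent's actual track table encoding.

```python
def advance_to_branch(track, start):
    """Scan a track (a list of track-point records) from index `start`,
    collecting the cache addresses of any data points passed, and stop at
    the first branch point. Returns (resting index, list of DBNs pushed)."""
    pushed = []
    for i in range(start, len(track)):
        point = track[i]
        if point['type'] == 'data':
            pushed.append(point['dbn'])   # DBN sent to the data memory
        elif point['type'] == 'branch':
            return i, pushed              # read pointer rests on the branch point
    return len(track), pushed
```

For a track containing an ordinary instruction, two data points, and a branch point, the model stops at the branch point with both data points' DBNs collected for pushing to the processor core.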
Please refer to FIG. 2, which is a schematic diagram of the track point format according to the present invention. For a branch point, the format contains the instruction type 151 and the BNX 153 and BNY 155 corresponding to the branch target instruction. For a data point, the format contains the instruction type 161 and the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113.
Returning to FIG. 1, based on the positions of the branch points stored in the track table 107, the read pointer 121 of the tracker 119 moves to and points at the first branch point after the instruction being executed by the processor core 101, and reads out the track point content, i.e., the position information BNX and BNY of the branch target track point. If the branch point corresponds to an indirect branch instruction, the corresponding branch target instruction block address must also be read from the active table 109.
The processor core 101 outputs an instruction offset address (i.e., the offset portion of the instruction address), which selects the required instruction from the storage block in the instruction memory 103 pointed to by the read pointer 121 of the tracker 119. When the processor core executes the branch instruction, if the branch is not taken (TAKEN signal 123 is '0'), it continues to output new instruction offset addresses, reading and executing the instructions following the branch instruction, while the read pointer 121 of the tracker 119 continues to move to the next branch point and the above operations are repeated. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is a direct branch instruction, the processor core 101 can directly execute the branch target instruction that has already been prepared. At the same time the value of the read pointer 121 of the tracker 119 is updated to the stored BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, then moves from that track point to the first branch point after it. If the branch is taken (TAKEN signal 123 is '1') and the branch instruction is an indirect branch instruction, the processor core 101 outputs the block address portion of the actual target instruction address, which is matched against the instruction block address previously read from the active table 109. If the match succeeds, the prepared target instruction is correct and can be read and executed directly by the processor core 101; otherwise, the actual target instruction address is sent to the external memory to fetch the instruction block containing the corresponding target instruction, and that target instruction is sent to the processor core 101 for execution. At the same time, an available entry is allocated in the active table 109, the instruction block is filled into the corresponding storage block of the instruction memory 103, and the resulting BNX together with the offset of the target instruction within the instruction block (i.e., BNY) is stored in the branch point as track point content. The value of the read pointer 121 of the tracker 119 is then updated to this BNX and BNY, i.e., the read pointer 121 points to the track point corresponding to the branch target instruction, moves from that track point to the first branch point, and the above operations repeat. In this way, both the next instruction and the branch target instruction can be prepared for the processor core 101 to choose from before it executes the branch instruction, avoiding the performance loss caused by cache misses.
Similarly, when the read pointer 121 of the tracker 119 passes a data point, the corresponding data is read from the data memory 113 according to the DBN stored in that data point. If the data read instruction is one whose data address is not yet determined, the corresponding data block address must also be read from the tag memory 127. When the processor core 101 executes the data read instruction, if it is a data read instruction whose data address is determined, the processor core 101 can use the data directly. Otherwise, the processor core 101 outputs the block address portion of the actual data address, which is matched against the data block address previously read from the tag memory 127. If the match succeeds, the data is correct and can be used directly by the processor core 101; otherwise, the pipeline in the processor core 101 is stalled, the actual data address is sent to the external memory to fetch the data block containing the corresponding data, and the pipeline is resumed after that data has been sent to the processor core 101. At the same time, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX together with the offset of the data within the data block (i.e., DBNY) is stored in the data point as track point content.
In this way, before the processor core 101 executes a data read instruction for the first time, the likely data for that instruction is already prepared. If the data is correct, the performance loss caused by a data memory 113 miss is avoided entirely, and the time required to read the data memory 113 can be partially or completely hidden. Even if the data is wrong, the processor core 101 can re-fetch the correct data without additional waiting time.
Please refer to FIG. 3A, which shows another embodiment of the cache system according to the present invention. This embodiment is similar to the embodiment of FIG. 1, except that a data address prediction module 301 is added and a stride field is added to the data point format in the track table.
Please refer to FIG. 3B, which is another schematic diagram of the track point format according to the present invention. The format of a branch point still contains the instruction type 151 and the BNX 153 and BNY 155 corresponding to the branch target instruction. The format of a data point contains the instruction type 161, the DBNX 163 and DBNY 165 of the corresponding data in the data memory 113, and the data stride 331. The data stride 331 is the difference between the data addresses of two consecutive executions of the data read instruction corresponding to that data point, i.e., the value obtained by subtracting the previous data address from the current data address. Using the data stride, the likely value of the next data address can be computed speculatively: the current data address plus the data stride gives the likely next data address.
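The stride computation and speculative next-address prediction just described amount to two subtractions and an addition; the following is a minimal sketch, with variable names (prev_addr, curr_addr) that are illustrative rather than taken from the specification.

```python
def update_stride_and_predict(prev_addr, curr_addr):
    """Compute the data stride from two consecutive executions of a data
    read instruction, then speculatively form the next data address.
    Illustrative sketch; names are not from the specification."""
    stride = curr_addr - prev_addr          # data stride 331: current minus previous address
    predicted_next = curr_addr + stride     # speculative next data address
    return stride, predicted_next

# Example: a loop that reads one word every 8 bytes
stride, nxt = update_stride_and_predict(0x1000, 0x1008)
```

With these inputs the stride is 8 and the predicted next address is 0x1010; the prediction is only a guess and must later be verified against the actual data address, as the specification describes.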
Returning to FIG. 3A, in this embodiment the process of building tracks and prefetching instructions and data is similar to that of the FIG. 1 embodiment. The difference is that the track table in this embodiment is a compressed track table. Since only some of the instructions in an instruction block are branch instructions or data read instructions, the track table 107 can be compressed to reduce its storage requirement. For example, the compressed track table may have the same rows as the original track table but fewer columns, with a mapping table storing the correspondence between the entries of the compressed track table and those of the original track table. Every entry in the compressed track table is a branch point or a data point, and the entries appear in the same order as the corresponding branch instructions and data read instructions in the instruction block. The entries in the mapping table correspond one-to-one with the branch points and data points in the compressed track table, and store the intra-block offsets of the corresponding branch instructions and data read instructions within the instruction block. In this way, once the intra-block offset of a branch instruction or data read instruction within its instruction block has been converted into a column address through the mapping table, the corresponding branch point or data point can be found, according to that column address, in the row of the compressed track table pointed to by the BNX of that instruction. Conversely, for any branch point or data point in the compressed track table, the intra-block offset of its branch instruction or data read instruction can be found in the corresponding entry of the mapping table; together with the BNX of the branch point or data point itself, this offset locates the corresponding branch instruction or data read instruction in the instruction memory 103.
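The two-way translation between intra-block offsets and compressed-table columns can be sketched as below. Modeling a mapping-table row as an ordered list of offsets is an assumption made for illustration only; the patent does not prescribe a concrete encoding.

```python
def offset_to_column(mapping_row, intra_block_offset):
    """Convert an instruction's intra-block offset into a column address
    of the compressed track table.  mapping_row lists, in program order,
    the intra-block offsets of this row's branch/data points.
    Illustrative sketch of the mapping-table lookup."""
    return mapping_row.index(intra_block_offset)

def column_to_offset(mapping_row, column):
    """Reverse lookup: recover the intra-block offset of the branch or
    data read instruction stored at a given compressed-table column."""
    return mapping_row[column]

# A row whose branch/data points sit at instruction offsets 1, 4 and 6
mapping_row = [1, 4, 6]
col = offset_to_column(mapping_row, 4)     # data point at offset 4
off = column_to_offset(mapping_row, col)   # back to intra-block offset
```

The forward lookup serves track building (instruction offset to track point), and the reverse lookup, combined with the point's own BNX, re-addresses the instruction memory 103, as the paragraph above describes.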
After the track table compression technique of this embodiment is adopted, every entry in the compressed track table is a branch point or a data point. Therefore, when the branch at the branch point pointed to by the read pointer 121 of the tracker 119 is not taken, the read pointer 121 is incremented by one by the incrementer 134 and points to the next track point. If that track point is a branch point, the branch target instruction is read out as described above and the TAKEN signal from the processor core 101 is awaited. If that track point is a data point, the corresponding data is read out as described above and made ready for use by the processor core 101. Specifically, the data can be stored in a first-in-first-out buffer (FIFO) so that the processor core 101 can fetch the data corresponding to each data read instruction in the correct order. The read pointer 121 then continues to move and the above operations repeat until it points to a branch point, at which point the branch target instruction is read out as described above and the TAKEN signal from the processor core 101 is awaited.
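The ordering property of the FIFO described above can be modeled as follows; `collections.deque` stands in for the hardware FIFO and a dict keyed by DBN stands in for the data memory 113, both of which are modeling assumptions rather than structures named in the specification.

```python
from collections import deque

def prefetch_data_points(dbns, data_memory):
    """As the read pointer passes data points, read each datum from the
    data memory by its DBN and enqueue it, so the processor core can
    later dequeue the data in program order.  Illustrative model only."""
    fifo = deque()
    for dbn in dbns:                       # data points before the next branch point
        fifo.append(data_memory[dbn])
    return fifo

# Two data points, both in storage block "dbnx0", at offsets 0 and 1
data_memory = {("dbnx0", 0): 111, ("dbnx0", 1): 222}
fifo = prefetch_data_points([("dbnx0", 0), ("dbnx0", 1)], data_memory)
first = fifo.popleft()                     # the core consumes in program order
```

Because the tracker enqueues in track order and the core dequeues from the head, each data read instruction receives exactly the datum prefetched for it, which is the correctness requirement stated above.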
In addition, in this embodiment, when a data point in the track table is pointed to by the read pointer 121 of the tracker 119 for the second time, the DBNX read from it is sent to the tag memory 127 to read out the corresponding data block address. This data block address together with the DBNY read by the read pointer 121 forms the data address of the previous execution of the data point, and is sent to the prediction module 301 for temporary storage. When the processor core 101 executes the data point, the current data address is sent to the prediction module 301 and the previous data address is subtracted from it to obtain the data stride. The prediction module 301 stores this data stride back into the corresponding data point, and adds the stride to the current data address to obtain the predicted next data address. The prediction module 301 then sends the next data address to the tag memory 127 for matching. If the match succeeds, the likely data for the next execution of the data point is already stored in the data memory 113; the matched DBNX and the offset portion of the next data address (i.e., DBNY) are stored back into the corresponding data point, completing the update of the data point. If the match fails, the likely data for the next execution of the data point is not yet stored in the data memory 113, and the next data address is sent to the external memory to fetch the data block containing the corresponding data. At the same time, an available entry is allocated in the tag memory 127, the data block is filled into the corresponding storage block of the data memory 113, and the DBNX together with the offset of the data within the data block (i.e., DBNY) is stored in the data point as track point content, completing the update of the data point. Thus, when the read pointer 121 of the tracker 119 points to that data point again, the corresponding data can be read out of the data memory 113 ahead of time according to the DBN stored in it, ready for the processor core 101 to read. The subsequent operations are the same as described in the previous embodiment.
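The data-point update performed by the prediction module can be sketched as follows. The tag memory is modeled as a dict from block address to DBNX, and the block size as a power of two selected by `block_bits`; both are assumptions made so the sketch is runnable, not details fixed by the specification.

```python
def update_data_point(last_addr, curr_addr, tag_memory, block_bits):
    """On execution of a data point: derive the stride, predict the next
    data address, and try to match its block address in the tag memory.
    Returns (stride, DBNX, DBNY) to store back into the data point; DBNX
    is None on a miss, meaning the block must be fetched from external
    memory and a tag entry allocated.  Illustrative sketch."""
    stride = curr_addr - last_addr
    next_addr = curr_addr + stride
    block_addr = next_addr >> block_bits           # block address portion
    dbny = next_addr & ((1 << block_bits) - 1)     # intra-block offset (DBNY)
    dbnx = tag_memory.get(block_addr)              # match: DBNX; miss: None
    return stride, dbnx, dbny

# 16-byte blocks (block_bits=4); block address 0x2001 resides in storage block 5
tag_memory = {0x2001: 5}
stride, dbnx, dbny = update_data_point(0x20010, 0x20014, tag_memory, 4)
```

Here the stride is 4, the predicted next address 0x20018 falls in block 0x2001, so the match succeeds and (DBNX, DBNY) = (5, 8) would be stored back into the data point.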
In this way, as long as the data read instruction has not been evicted from the instruction memory 103, the likely data is already prepared from the third execution of that data read instruction by the processor core 101 onward. If the data is correct, the performance loss caused by a data cache miss is avoided entirely, and the time required to read the data cache can be partially or completely hidden. Even if the data is wrong, the processor core 101 can re-fetch the correct data without additional waiting time.
It should be noted that in this embodiment, as the read pointer 121 of the tracker 119 moves to the first branch point after the instruction currently being executed by the processor core 101, it may pass several data points and read data in advance from the data memory 113 according to the DBNs in those data points. A FIFO is therefore used to buffer, in order, the data corresponding to each data read instruction for the processor core 101 to use in turn; that is, the FIFO stores the data the processor core 101 is going to use. Alternatively, a FIFO may store the DBNs read from those data points, with data being read from the data memory 113 only for the oldest DBN; after the processor core 101 has taken that data, the then-oldest DBN is read out of the FIFO and the corresponding data read from the data memory 113 for the processor core 101 to use. In that case the FIFO stores the addresses of the data the processor core 101 is going to use. The other operations of the cache system of the present invention are the same as in the previous embodiments and are not repeated here.
Please refer to FIG. 3C, which shows another embodiment of the cache system according to the present invention. This embodiment is similar to the FIG. 3A embodiment, except that a sequence table 361 is added. The entries of the sequence table 361 correspond one-to-one with the entries of the tag memory 127, and store the position information PREV of the data block preceding, and NEXT of the data block following, the data block address held in the corresponding tag memory 127 entry. For example, when two data blocks with consecutive addresses are filled into the data memory 113 in address order, the DBNX of the later data block is stored in the NEXT field of the sequence table 361 entry corresponding to the earlier data block, and the DBNX of the earlier data block is stored in the PREV field of the entry corresponding to the later one. In this way, based on the information recorded in the sequence table 361, the DBNX corresponding to the predicted next data address can be found directly, reducing the number of matches performed in the tag memory 127.
Specifically, if the length of a data block is N, then the block address of the next data block is the current block address plus N, and the block address of the previous data block is the current block address minus N. Since the next data address equals the sum of the current data address and the data stride, dividing the absolute value of the sum of the data stride and the offset within the current data address by N gives the number of data blocks between the next data address and the current data address. At the same time, the sign of the data stride determines whether the next data address points to a data block before or after the current data address.
Specifically, when the sum of the data stride and the offset within the current data address is less than N and greater than or equal to '0', the next data address lies in the same data block as the current data address, i.e., the DBNX of the next data address is the same as the DBNX of the current data address.
When the sum of the data stride and the offset within the current data address is less than '0', the next data address lies in a data block before the current data address; when that sum is greater than or equal to N, the next data address lies in a data block after the current data address. In both cases, the number of data blocks between the next data address and the current data address equals the quotient of the absolute value of that sum divided by N. Thus, as long as sufficient information is recorded in the sequence table 361, one can start from the entry corresponding to the current data address and follow the DBNX given by PREV (or NEXT) through each adjacent data block backward (or forward), one block at a time, until the DBNX corresponding to the next data address is found.
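The walk along the PREV/NEXT links can be sketched as follows. The sequence table is modeled as a dict from DBNX to a (PREV, NEXT) pair, an assumption for illustration; Python's floor division is used so that a negative sum yields a signed backward block distance in one expression.

```python
def find_next_dbnx(seq_table, curr_dbnx, stride, offset, block_len):
    """Walk the sequence table's PREV/NEXT links to locate the DBNX of
    the next data address.  seq_table maps DBNX -> (PREV, NEXT).  Floor
    division of (stride + offset) by the block length gives the signed
    number of blocks to move.  Illustrative sketch."""
    steps = (stride + offset) // block_len   # signed block distance
    dbnx = curr_dbnx
    while steps > 0:                          # forward along NEXT links
        dbnx = seq_table[dbnx][1]
        steps -= 1
    while steps < 0:                          # backward along PREV links
        dbnx = seq_table[dbnx][0]
        steps += 1
    return dbnx

# Three address-consecutive blocks held in storage blocks 7, 2, 9
seq_table = {7: (None, 2), 2: (7, 9), 9: (2, None)}
nxt = find_next_dbnx(seq_table, 2, stride=16, offset=4, block_len=16)   # one block ahead
prv = find_next_dbnx(seq_table, 2, stride=-16, offset=4, block_len=16)  # one block back
```

With 16-byte blocks, a stride of +16 from offset 4 lands one block ahead (storage block 9), while a stride of -16 lands one block back (storage block 7), without any tag memory 127 match.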
In particular, in many kinds of loop code the absolute value of the data stride is small, and the next data address often points to the data block immediately before (or after) the current data address. In that case, the DBNX stored in the PREV (or NEXT) field of the sequence table 361 entry corresponding to the current data address (i.e., the entry of the sequence table 361 pointed to by the DBNX that the read pointer 121 of the tracker 119 read from the data point) is exactly the DBNX corresponding to the next data address. That DBNX can then be read directly from the sequence table 361 and stored back into the track table 107, avoiding any matching of the next data address in the tag memory 127.
In addition, an improved data cache structure can be used to obtain further performance gains. This specification describes the improvement based on a way-set associative cache. A direct-mapped cache can be treated as a single way-set of a set-associative cache and implemented in the same manner, so it is not described separately here. In a fully associative cache, the addresses of the individual storage blocks can be entirely unrelated, so the sequence table of the FIG. 3C embodiment can be used directly to link storage block to storage block, allowing the storage block position (i.e., DBN) corresponding to the next data address to be found directly from the current data address and the data stride.
In a traditional set-associative cache structure, a data address is divided into three parts: the tag (TAG), the index (index), and the intra-block offset (offset), and the index numbers of the storage blocks in each way-set are consecutive, i.e., every index number exists exactly once in any way-set. Here, the method of the present invention can be applied by giving all storage blocks in a way-set the same tag. Since the index numbers of all storage blocks in the way-set are consecutive, the way-set then stores data blocks with consecutive addresses. A positional relationship among storage blocks holding consecutive addresses thus arises naturally: within a way-set, data blocks with consecutive data addresses also occupy consecutive physical positions (or index numbers), so the DBNX corresponding to the predicted next data address can be found directly, reducing the number of matches in the tag memory 127 or the latency of walking the sequence table entry by entry.
However, in some programs (such as loops over arrays) the data addresses used are not consecutive but form an arithmetic progression, so the data corresponding to many index numbers in each way-set may never be accessed. Once the frequently accessed data concentrates on a few index numbers, replacements occur because there are not enough way-sets, degrading the performance of the cache system. According to the technical solution of the present invention, a compression ratio can be set for each way-set, so that the index numbers in the way-set no longer increase by one but by a constant step. In this way most of the data in the whole way-set is data that will actually be accessed, improving the utilization of the way-set as much as possible while still preserving the continuity of the data.
Please refer to FIG. 4A, which is an embodiment of the improved set-associative cache of the present invention. In this embodiment every way-set of the cache corresponds to a feature entry, which stores a compression ratio and several pointers. Here, the value of the compression ratio is defined as the difference between the data block addresses of two consecutive storage blocks in the way-set divided by the data block length. The pointers point to the way-sets in which the next several data blocks, at addresses consecutive with the first data block of this way-set (i.e., the data block with the smallest data address), are located. For a way-set in which all storage blocks correspond to the same tag, as described in the present invention, the difference between the data block addresses of two consecutive storage blocks equals the data block length, so the compression ratio is '1', and the pointers all point to this way-set itself, i.e., the next several data blocks at addresses consecutive with the first data block of the way-set are all in this way-set. Here, the DBNX corresponding to a data address consists of the way-set number and the storage block number within the way-set. Take a way-set containing 4 storage blocks as an example, and assume its way-set number is '3'; the intra-set block numbers of the 4 storage blocks are '0' to '3', so their corresponding DBNX values are '30' to '33'. As shown for way-set 401 in FIG. 4A, all storage blocks correspond to the tag '2001', i.e., the data block addresses of the 4 storage blocks are '20010', '20011', '20012' and '20013'. In this case the index portion of each data address equals the intra-set block number of the corresponding storage block in the way-set. For example, data block address '20010' has index '0' and its storage block has intra-set number '0'; data block address '20011' has index '1' and its storage block has intra-set number '1'; and so on. Now, if the data stride of each access is less than or equal to the length of one data block, it can be computed directly, from the storage block position of the current data address (i.e., DBNX) and the data stride, that the storage block of the next data address is either the same storage block or its adjacent successor. The DBNX of the next data address equals the DBNX of the current data address plus a DBNX increment, where the DBNX increment is the quotient of the data stride divided by the data block length. For example, if the DBNX of the current data address is '32' (corresponding data block address '20012') and the data stride equals the length of one data block, then the DBNX increment equals '1' and the DBNX of the next data address equals '32' plus '1', i.e., '33' (corresponding data block address '20013'), pointing to the correct storage block. The DBNX value of the next data address is thereby obtained without computing the next data address or performing any address matching.
However, if the data step of each access is equal to the length of two data blocks, storing data this way would leave half of the memory blocks in the way group unaccessed, wasting storage space. For this case the compression ratio can be set to '2', meaning that the difference between the data addresses of two adjacent memory blocks in the way group, divided by the data block length, equals '2'. Please refer to FIG. 4B, which is another embodiment of the improved set-associative cache of the present invention. As shown for way group 403 in FIG. 4B, all memory blocks correspond to the tag '2001', but the corresponding data block addresses are '20010', '20012', '20014' and '20016'. Thus the index-number portion of each data address equals the in-group block number of the corresponding memory block multiplied by the compression ratio. For example, the index number of data block address '20010' is '0' and the in-group block number of the corresponding memory block is '0'; the index number of data block address '20012' is '2' and the in-group block number of the corresponding memory block is '1'; and so on, so that the index numbers are compressed by the compression ratio. In this case the DBNX increment equals the data step divided by the data block length, with the quotient then divided by the compression ratio. For example, suppose the DBNX corresponding to the current data address is '31' (the corresponding data block address being '20012') and the data step equals the length of two data blocks. The DBNX increment is then '2' (the step in block lengths) divided by the compression ratio '2', i.e. '1', and the DBNX of the next data address equals '31' plus '1', giving '32' (the corresponding data block address being '20014'), which points to the correct memory block while avoiding the computation and matching of data addresses.
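The increment rule just described can be sketched in a few lines. This is a minimal illustration, not from the patent itself; the function name and the representation of the step in block lengths are our own assumptions.

```python
def next_dbnx(dbnx: int, step_in_blocks: int, compression_ratio: int) -> int:
    """Return the DBNX of the next data address.

    step_in_blocks    -- data step expressed in data-block lengths
    compression_ratio -- spacing (in blocks) between adjacent memory
                         blocks of the way group ('1' = uncompressed)
    """
    increment = step_in_blocks // compression_ratio  # DBNX increment
    return dbnx + increment

# Uncompressed example above: DBNX '32', step of one block -> '33'
assert next_dbnx(32, 1, 1) == 33
# Compressed example above: DBNX '31', step of two blocks, ratio '2' -> '32'
assert next_dbnx(31, 2, 2) == 32
```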
In this embodiment, the feature entry corresponding to each way group stores, in addition to the compression ratio 419, a number of pointers, the number of pointers being equal to the compression-ratio value multiplied by '2'. Taking way group 403 as an example, its feature entry stores four pointers in addition to the compression ratio '2'. Three of the pointers point to the way groups containing the three data blocks whose addresses are adjacent to that of the first data block (data block address '20010') in way group 403, i.e. the data blocks with addresses '2000E', '2000F' and '20011'; the fourth pointer points to the way group adjacent to way group 403 in the following addresses (starting data block address '20018'). In this way, when the data step is small, the memory block corresponding to the next data address can be found in the current way group, or in a way group pointed to by one of the pointers, simply by adding the data step to the DBN of the current data address and shifting according to the compression ratio.
For ease of description, the following assumes the data step is an integer multiple of the data block length, in which case the DBNY corresponding to each data address is unchanged. When the data step is not an integer multiple of the data block length, the remaining portion must be added to DBNY: the sum portion of the result becomes the new DBNY, while the carry portion is added to DBNX. Suppose the DBNX corresponding to the current data address is '31' and the data step is three data block lengths (i.e. the DBNX increment is '3' and the next data address is '20015'). The index number of the data address is first restored from the compression ratio and the in-group block number of the memory block. For this DBNX, the in-group block number of the memory block is '1'; multiplying by the compression ratio gives '2' (the index number of the data block). This '2' is then added to the DBNX increment '3', giving the index number of the next data address, '5'. That index number '5' is then compressed by the compression ratio: '5' divided by '2' gives a quotient of '2' and a remainder of '1'. The data corresponding to the next data address is therefore located in the way group pointed to by the pointer 417 corresponding to the remainder, in the memory block whose in-group block number is the quotient; that is, the data corresponding to the next data address '20015' is in memory block 421, whose in-group block number is '2', within way group 405.
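The restore-add-recompress sequence above can be checked with a short sketch. This is illustrative only; the remainder-to-pointer mapping is shown as a returned value rather than as actual pointer hardware.

```python
def locate_next(in_group_block_no: int, dbnx_increment: int,
                compression_ratio: int) -> tuple:
    """Return (block number in the target group, pointer index).

    The pointer index is the remainder that selects which of the
    feature-entry pointers names the target way group.
    """
    index = in_group_block_no * compression_ratio  # restore the index number
    next_index = index + dbnx_increment            # index of the next address
    return divmod(next_index, compression_ratio)   # (quotient, remainder)

# Example from the text: block '1', increment '3', ratio '2'
# -> block '2' in the way group named by the pointer for remainder '1'
assert locate_next(1, 3, 2) == (2, 1)
```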
Similarly, when the data step (or DBNX increment) is negative, the corresponding memory block can be found by the same method in the way group pointed to by pointer 411 or 413; when the data step (or DBNX increment) is a positive even number that just exceeds the range of way group 403, the corresponding memory block can be found in the way group pointed to by pointer 415. When the next data address falls outside the range of the way groups pointed to by the four pointers, the way group and memory block corresponding to the next data address can be found by following, in turn, the pointer information stored in the feature entries of the intervening way groups. Larger compression ratios can be implemented in the same manner and are not described further here.
According to the technical solution of the present invention, the way groups of a set-associative cache can also be improved so that each way group can be configured as a plurality of groups, each group providing the same function as a way group. This conveniently increases the number of way groups and allows multiple sets of consecutive data blocks corresponding to different tags to be stored.
For example, the data memory in each way group can be divided into a corresponding number of groups, each group corresponding to the same number of rows with consecutive index numbers and to the same tag. That is, each group stores several data blocks with consecutive addresses corresponding to the same tag.
Please refer to FIG. 5, which is an embodiment of the grouped data cache of the present invention. Taking one way group as an example, memory 501 is divided into two groups, each containing one row of content-addressable memory (CAM), i.e. storing one tag (tag 503 and tag 505). Correspondingly, data memory 511 is also divided into two groups, each containing four memory blocks; the data block addresses in these four memory blocks are consecutive and correspond to the same tag. Specifically, group 513 contains memory blocks 521, 523, 525 and 527, whose data block addresses are consecutive and all correspond to tag 503; group 515 contains memory blocks 531, 533, 535 and 537, whose data block addresses are consecutive and all correspond to tag 505. In this embodiment, each tag and its corresponding group of memory blocks also correspond to a register-comparator and a decoder: tag 503 corresponds to register-comparator 517 and decoder 529, and tag 505 corresponds to register-comparator 519 and decoder 539. A register-comparator contains a register and a comparator, the register storing the high-order part of the index number of the starting data block address of that group.
When addressing by data address, the high-order part of the index number in the data address is sent over bus 543 to all register-comparators, where it is compared with the stored high-order index values. Based on the comparison results, only the match line of the CAM row corresponding to a successful comparison is charged and matched against the tag sent over bus 541, and the successfully matched CAM row outputs an enable signal to its decoder. Under the control of the enable signal output by the register-comparator, the decoder decodes the low-order part of the index number of the data address on bus 545 and, according to the decoding result, selects one of the corresponding group's data blocks for output. Thus, through the matching, decoding and addressing performed by the register-comparators and decoders, the data block whose index number equals the index number of the addressing data address can be read from data memory 511. If no comparator matches, or none of the participating CAM rows matches, the data corresponding to the data address has not yet been stored in that way group of the cache. Performing the same operation on all way groups in parallel either finds the required data in the cache or yields a cache-miss result. In this way, each group provides the equivalent of a way group.
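The two-stage lookup just described (high index bits at the register-comparators, the tag at the CAM row, the low index bits at the decoder) can be modeled roughly as follows. The group records and field names are invented for illustration and do not come from the patent.

```python
def way_group_lookup(groups, tag, index, low_bits):
    """groups: list of dicts with 'index_hi', 'tag' and 'blocks' fields."""
    hi = index >> low_bits                 # compared by register-comparators
    lo = index & ((1 << low_bits) - 1)     # decoded to pick one block
    for g in groups:                       # all groups compare in parallel
        if g['index_hi'] == hi and g['tag'] == tag:
            return g['blocks'][lo]
    return None                            # miss in this way group

# Hypothetical contents: two groups as in FIG. 5, four blocks each
groups = [
    {'index_hi': 0, 'tag': 0x503, 'blocks': ['b0', 'b1', 'b2', 'b3']},
    {'index_hi': 1, 'tag': 0x505, 'blocks': ['b4', 'b5', 'b6', 'b7']},
]
assert way_group_lookup(groups, 0x505, 0b110, 2) == 'b6'
assert way_group_lookup(groups, 0x999, 0b110, 2) is None  # miss
```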
In this embodiment, the cache can be regrouped into a different number of groups simply by storing the appropriate high-order index values in the registers of the register-comparators, with each group providing the equivalent of a way group. For example, consecutive high-order index values can be stored in two adjacent register-comparators so that the index ranges corresponding to the two register-comparators are also consecutive. The two adjacent groups are thereby merged into one larger group to accommodate data blocks at consecutive addresses.
Furthermore, in the present invention, the groups can be configured with different sizes to form a cache with a hybrid structure. For example, one way group of the cache can be configured as four groups and another way group as a single group, these two way groups forming the contiguous-location portion of the cache, while the remaining way groups are configured as a conventional set-associative structure forming the random-location portion of the cache. In this case the first way group holds at most four runs of consecutive data blocks, while the second way group holds only one run of consecutive data blocks. The remaining way groups, like an existing set-associative cache, can each hold a number of tags up to the number of their memory blocks (i.e. the number of rows in the way group), and adjacent memory blocks may correspond to different tags. With a cache configured in this way, data with consecutive data addresses (i.e. identical tags) can be stored in the contiguous-location portion according to the characteristics of the program, while data with non-consecutive addresses is stored in the random-location portion. The hybrid cache can thus be configured according to program characteristics, retaining both the flexibility of data placement in the cache and its ease of replacement, while eliminating a large number of tag-comparison operations during accesses to consecutive addresses.
It should be noted that when a cache with the above hybrid structure is actually running, it is sometimes found that data currently being accessed, or about to be accessed, should belong to the contiguous-location portion of the cache, but the data block containing it is already stored in the random-location portion. In that case the data block containing the data should be filled into the contiguous-location portion and the corresponding memory block in the random-location portion invalidated. It is also sometimes found that data about to be accessed should belong to the random-location portion, but the data block containing it is already stored in the contiguous-location portion. In that case the location of the data in the cache is left unchanged, and the data is read directly from the contiguous-location portion by tag comparison.
In the present invention, a data access engine is used to implement the following function: before the processor core computes a data address, the data access engine fills the corresponding data into the data cache and has it ready for use by the processor core. In this specification data reads are used as the example; data stores can be implemented by similar methods, and the description is not repeated here.
The data access engine is described in detail below through several specific examples. Please refer to FIG. 6, which is an embodiment of the data access engine of the present invention. For ease of description, only some modules or components are shown in FIG. 6. In FIG. 6, data memory 113 and processor core 101 are the same as described in the previous embodiments. The data-point format in track table 107 contains an instruction type 621, DBNX, DBNY 627 and data step 629, where DBNX consists of a group number (GN) 623 and an in-group block number 625, and DBNY 627 is the intra-block offset of the data address. Data engine 601 contains sequence table 603, shifters 605, 607 and 609, adder 611, subtractor 613 and selectors 615, 616 and 617.
In this embodiment, the in-group block number 625 in the data-point content read from the track table is sent to shifter 605, left-shifted according to the compression ratio, and then sent to adder 611. Since left-shifting the in-group block number 625 by n bits is equivalent to multiplying it by 2^n, after shifting by shifter 605 the in-group block number 625 is restored to the value of the index number in the corresponding data address. In addition, the DBNY 627 in the data-point content is sent directly to adder 611 and, together with the index number output by shifter 605, forms one input of adder 611; the data step 629 in the data-point content is the other input of adder 611. Their sum is the index number and intra-block offset of the next data address. The intra-block offset serves directly as the DBNY corresponding to the next data address, while the index number is right-shifted by shifter 607 according to the compression ratio to become the in-group block number corresponding to the next data address. Here shifter 607 shifts right by the same number of bits that shifter 605 shifts left; right-shifting the index number by n bits is equivalent to dividing it by 2^n, so after shifting by shifter 607 the index number is again compressed into the corresponding in-group block number and written back to the track table, with the lowest n bits shifted out to the right as portion 631, which does not form part of the in-group block number.
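A functional sketch of this shifter/adder datapath (shifters 605/607, adder 611) might look as follows. The code is an illustration rather than the hardware itself; the field widths, the `fill` parameter (used later when a tag bit supplies the right-hand fill during restoration), and all names are assumptions.

```python
def next_dbn(block_no, dbny, step, n, dbny_bits, index_bits, fill=0):
    """Compute the next (in-group block number, DBNY, shifted-out bits, carry).

    n    -- compression ratio expressed as a shift amount
    fill -- bit(s) appended on the right when restoring the index number
    """
    index = (block_no << n) | fill                  # shifter 605 (restore)
    total = ((index << dbny_bits) | dbny) + step    # adder 611
    new_dbny = total & ((1 << dbny_bits) - 1)       # next DBNY
    idx = total >> dbny_bits
    carry = idx >> index_bits                       # overflow -> selector 615
    idx &= (1 << index_bits) - 1
    shifted_out = idx & ((1 << n) - 1)              # portion 631 -> selector 616
    return idx >> n, new_dbny, shifted_out, carry   # shifter 607 (compress)

# No compression (n = 0): block '11', DBNY '10', step '1' -> block '11',
# DBNY '11', no carry
assert next_dbn(0b11, 0b10, 1, 0, 2, 2) == (0b11, 0b11, 0, 0)
```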
During this process, the portion of the index number shifted out by shifter 607 is sent to selector 616 as a control signal, and the overflow signal (carry or borrow) of adder 611 is sent to selector 615 as a control signal. Each input of the selectors is a group number GN from the row of sequence table 603 pointed to by the group number 623 of the current data address.
Please refer to FIG. 7A, which is an embodiment of the sequence table and data cache of the present invention. The number of rows in sequence table 603 equals the number of groups in data memory 701, with a one-to-one correspondence between them. In this embodiment, data memory 701 is divided into two way groups (way group 0 and way group 1), and each way group can in turn be divided into two groups. Data memory 701 therefore contains four groups in total, whose group numbers are marked on the corresponding groups in FIG. 7A: way group 0 contains groups 00 and 01, and way group 1 contains groups 10 and 11. Furthermore, for ease of explanation, it is assumed that each group contains four memory blocks and each memory block contains four data items (or data words).
Correspondingly, sequence table 603 also has four rows, corresponding from top to bottom to groups 00, 01, 10 and 11. Each row contains a feature entry, a tag entry 715 and an index-number entry 717. The feature entry in turn contains a compression ratio 703 and five pointers (pointers 705, 707, 709, 711 and 713). Like the pointers in the feature entry of the FIG. 4B embodiment, these five pointers point to the groups containing the data blocks whose addresses are adjacent to that of the first data block in the group. In this embodiment the index numbers of the data blocks within each group are not compressed; therefore, apart from one pointer pointing to the group preceding this group in consecutive addresses and another pointer pointing to the group following it, the other three pointers all point to the group itself. As shown in the first row of sequence table 603 in FIG. 7A (corresponding to group '00'), pointers 705, 707 and 709 all point to the group itself (group '00'), pointer 711 points to the group following it in consecutive addresses (group '10'), and pointer 713 points to the group preceding it in consecutive addresses (group '11'). The pointers in the other rows are also as shown; a pointer whose content is empty indicates that the group it points to is not shown in the figure, or has not yet been determined, and is irrelevant to the case described in this embodiment.
In this embodiment the compression ratio of all four groups is '0' (no compression), i.e. the index number of a data address corresponds directly to the in-group block number, and each group corresponds to one complete tag. In this case, when looking up the group corresponding to a data address, the two index-number bits are masked directly (as shown by the underlining in index-number entry 717 in FIG. 7A) and only the tag of the data address is matched, which is sufficient to find the group corresponding to that data address; the two masked bits are then the in-group block number corresponding to the data address within that group.
As the data address increments, four consecutive data items A, B, C and D are accessed in turn as shown in FIG. 7A. That is, data A and B are the last two data items of the last memory block of group '11', while data C and D are the first two data items of the first memory block of group '00'; the difference between the data addresses of successive items is the data step '1'. As described in the previous embodiments, in the process of fetching data A from data memory 701 according to the data-point content in track table 619, the DBNX, DBNY and data step of the data point are all read out. The value of DBNX is '1111' (the fourth memory block in group '11'), where the group number is '11' and the in-group block number is '11'; the value of DBNY is '10' (the third data item in the memory block); and the value of the data step is '1' (i.e. data B, accessed next, immediately follows data A).
According to the technical solution of the present invention, the in-group block number ('11') in this DBNX is sent to shifter 605, and the group number in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (the fourth row of sequence table 603). The compression ratio ('0') is sent to shifters 605 and 607 as the number of bits to shift (i.e. no shift). The output '11' of shifter 605, together with DBNY ('10'), forms '1110', which is added to the data step '1' to give '1111'; the in-group block number '11' remains '11' after passing through shifter 607. This yields the in-group block number ('11') and DBNY ('11') corresponding to the next data address.
Meanwhile, the pointer values of the fourth row of sequence table 603 are output on ports '1', '2', '3', '4' and '-1' respectively and sent to selectors 616 and 615, while port '0' outputs the group number of the row itself, '11', to selector 615 (this group number is simply the row number, so it need not occupy writable memory in the row and can instead be hard-wired as read-only to save storage space). Since adder 611 did not overflow (there was no carry in the addition), the group number '11' output on port '0' is selected as the group number corresponding to the next data address. At this point the DBNX (group number '11' and in-group block number '11') and DBNY ('11') corresponding to the next data address have both been produced and point to data B in data memory 701. This DBN is written back over bus 649 into the data point in track table 619 for use the next time data B is read.
As a further example, in the process of fetching data B according to the technical solution of the present invention, the group number '11', in-group block number '11', DBNY '11' and data step '1' of the data point are read out again. The in-group block number '11', after passing through shifter 605, forms '1111' together with DBNY; adding the data step '1' gives '0000' (the in-group block number '00' and DBNY '00' corresponding to the next data address) with an overflow carry of '1'. As before, the pointer values of the fourth row of sequence table 603 and the row's own group number are sent to selectors 616 and 615. This time, since adder 611 produced a carry '1', the group number '00' output on port '4' is selected as the group number corresponding to the next data address. At this point the DBNX (group number '00' and in-group block number '00') and DBNY ('00') corresponding to the next data address have both been produced and point to data C in data memory 701. This DBN is written back over bus 649 into the data point in track table 619 for use the next time data C is read. By proceeding in this manner, the DBN corresponding to the next data address can be computed from the data step whenever the compression ratio is '0'.
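The carry-driven selection of the next group number in the two walkthroughs above can be replayed with plain integers. This is a sketch: the sequence-table row is shown as a simple mapping, with port '0' being the row's own group and port '4' the pointer to the next sequential group; negative steps and the other ports would follow the same pattern.

```python
row_group_11 = {'own': 0b11, 'ptr4': 0b00}  # fourth row of sequence table 603

def next_group(row, carry):
    # selector 615: no overflow -> keep the current group (port '0');
    # carry '1' -> follow the next-sequential-group pointer (port '4')
    return row['ptr4'] if carry else row['own']

assert next_group(row_group_11, 0) == 0b11  # A -> B stays in group '11'
assert next_group(row_group_11, 1) == 0b00  # B -> C crosses into group '00'
```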
According to the technical solution of the present invention, when the data step is at least twice the data block length, the index number in the data address can be compressed. Table 1 shows some common compression ratios with the corresponding shift amounts and masks. In Table 1, the first column shows the range of the data step; the second column shows which bits of the tag and index number stored in the sequence table are masked during matching, where T denotes a tag bit, I denotes an in-group block-number bit and X denotes a masked bit; the third column shows the corresponding number of shift bits; and the fourth column shows the corresponding compression ratio.
Specifically, in the first row the data step is less than twice the data block length, so no compression is applied: only the index number is masked, the shift amount is '0' and the compression ratio is '1' (no compression). In the second row the data step is at least twice and less than four times the data block length, so compression is possible: the lowest tag bit and the high index bit are masked, the shift amount is '1' and the compression ratio is '2'. In the third row the data step is at least four times and less than eight times the data block length: the lowest two tag bits are masked, the shift amount is '2' and the compression ratio is '4'. In the fourth row the data step is at least eight times and less than sixteen times the data block length: the second- and third-lowest tag bits are masked, the shift amount is '3' and the compression ratio is '8'. Other cases follow by analogy.
Table 1

  Data step size   Masked bits   Shift bits   Compression ratio
  < 2X             TTTTTXX       0            1
  >= 2X, < 4X      TTTTXXI       1            2
  >= 4X, < 8X      TTTXXII       2            4
  >= 8X, < 16X     TTXXTII       3            8

(In the first column X denotes the data block length; in the second column T denotes a tag bit, I an in-group block-number bit, and X a masked bit.)
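The shift amount in Table 1 is just the base-2 logarithm of the compression ratio, growing by one each time the step doubles. A sketch of that mapping (the function name is our own):

```python
def shift_bits_for_step(step_in_blocks: int) -> int:
    """Number of shift bits for a given data step (in block lengths)."""
    n = 0
    while (2 << n) <= abs(step_in_blocks):  # step >= 2**(n+1) block lengths
        n += 1
    return n

# One step value from each row of Table 1
assert [shift_bits_for_step(s) for s in (1, 2, 5, 8)] == [0, 1, 2, 3]
# The compression ratio is 2**shift
assert all(1 << shift_bits_for_step(s) == r
           for s, r in ((1, 1), (3, 2), (7, 4), (15, 8)))
```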
Please refer to FIG. 7B, which is another embodiment of the sequence table and data cache of the present invention. The structure of the groups and the sequence table in the cache of FIG. 7B is the same as in FIG. 7A. In this embodiment, however, the compression ratio is '01' and the data step is an integer multiple of the data block length (the data step is '11000' in two's-complement form, i.e. decimal '-8'). For example, the lowest bit of the data-address index number corresponding to every memory block of groups '00' and '01' is '0', while the lowest bit of the data-address index number corresponding to every memory block of groups '10' and '11' is '1'. In this case, when looking up the group corresponding to a data address, the mask bits are shifted left by one position according to the compression ratio ('1'), masking the high bit of the index number and the lowest bit of the tag (as shown by the underlining in tag entry 715 and index-number entry 717 in FIG. 7B). That is, the tag except its lowest bit, together with the lowest bit of the index number, is matched to find the group corresponding to the data address; the two masked bits are then the in-group block number corresponding to the data address within that group. In this embodiment, the tag value in the row of sequence table 603 corresponding to group '00' is '1000' and the tag value in the row corresponding to group '01' is '1010'; the masked bit is '0' in both, indicating that the group boundaries of the data blocks stored in the two groups are aligned. Further, the tags of groups '00' and '01' are consecutive in all but the lowest bit and share the same lowest index bit; that is, the tag-and-index values of the data blocks stored in group '00' are '100000', '100010', '100100' and '100110', and those of the data blocks stored in group '01' are '101000', '101010', '101100' and '101110'.
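The masked match of FIG. 7B can be sketched as bit arithmetic: concatenate tag and index, mask the in-group-block-number bits, and compare. The function below is illustrative; the field widths are those of the figure (a 4-bit tag and a 2-bit index), and the names are our own.

```python
def match_row(stored, shift, tag, index, index_bits):
    """Match a data address against one sequence-table row.

    stored -- the row's concatenated tag+index value
    shift  -- the compression ratio expressed as a shift amount
    Returns (hit, in-group block number).
    """
    full = (tag << index_bits) | index
    mask = ((1 << index_bits) - 1) << shift  # the in-group block-number bits
    return (full & ~mask) == (stored & ~mask), (full & mask) >> shift

# Group '00' of FIG. 7B stores '100000'; address tag '1001', index '00'
# hits it with in-group block number '10' (the third block).
assert match_row(0b100000, 1, 0b1001, 0b00, 2) == (True, 2)
# An address whose lowest index bit is '1' belongs to a different group.
assert match_row(0b100000, 1, 0b1000, 0b01, 2)[0] is False
```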
Similarly, take as an example the four data items E, F, G and H shown in FIG. 7B, which are accessed in sequence with the same data stride. Data E and F are the second data item in the second and first memory blocks of group '01' respectively, and data G and H are the second data item in the fourth and third memory blocks of group '00' respectively; that is, the difference between the data addresses of these four data items is the data stride '11000'. As described in the previous embodiment, in the process of fetching data E from data memory 701 according to the content of the data point in track table 619, the DBNX, DBNY and data stride in that data point are all read out. The value of DBNX is '0101', in which the group number is '01' and the intra-group block number is '01'; the value of DBNY is '01' (i.e. the second data item in the memory block); and the value of the data stride is '11000'.
According to the technical solution of the present invention, the intra-group block number ('01') in the DBNX is sent to shifter 605. The group number '01' in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (i.e. the second row of sequence table 603). The unmasked bit of index number 717 is used as the rightmost fill-in bit when shifter 605 shifts left, and the compression ratio ('01') is sent to shifters 605 and 607 as the shift amount (i.e. shift by one bit). Thus shifter 605 shifts the input '01' left by one bit and appends the fill-in bit '0', yielding '010', which together with DBNY ('01') forms '01001'. Adding the data stride '11000' gives '00001', whose intra-group block number '000' is shifted right by one bit by shifter 607 to output '00', yielding the intra-group block number ('00') and DBNY ('01') corresponding to the next data address.
At this time, since adder 611 did not overflow (i.e. there was no borrow in the subtraction), the group number '01' output at port '0' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '01' and intra-group block number '00') and DBNY ('01') corresponding to the next data address have both been produced, and they point to data F in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data F is read.
As another example, in the process of fetching data F according to the technical solution of the present invention, the group number '01', intra-group block number '00', DBNY '01' and data stride '11000' in that data point are read out again. The intra-group block number '00' is shifted left by one bit by shifter 605 with the fill-in bit '0' appended, and together with DBNY forms '00001'; adding the data stride '11000' gives '11001' (i.e. the intra-group block number corresponding to the next data address is '11', obtained by shifting '110' right by one bit, and the DBNY is '01'), and a borrow overflow occurs. Therefore the group number '00' output at port '-1' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '00' and intra-group block number '11') and DBNY ('01') corresponding to the next data address have both been produced, and they point to data G in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data G is read. Operating successively in this manner, the DBN corresponding to the next data address can be computed from the data stride when the compression ratio is not '0' but the data stride is an integer multiple of the data block length.
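The FIG. 7B walkthrough above can be condensed into a small numeric sketch. This is purely illustrative: the function name and the neighbor-group table are invented, the 5-bit intra-group position (3-bit block/index field plus 2-bit DBNY) and the port '0'/port '-1' links follow the embodiment, and only the ports exercised by the example are modeled.

```python
SHIFT = 1            # compression ratio '01' -> shift by one bit
FILL = 0             # unmasked index bit read from the sequence-table row
# neighbor-group links of the rows used in the example (port '0' / port '-1')
NEIGHBORS = {'01': {0: '01', -1: '00'},
             '00': {0: '00', -1: '11'}}

def next_dbn(group, block, dbny, stride):
    """Return (group, block, dbny) of the next data address."""
    pos = (((block << SHIFT) | FILL) << 2) | dbny   # shifter 605 output plus DBNY: 5 bits
    total = pos + stride                            # adder 611, two's complement
    carry = 0 if 0 <= total < 32 else (1 if total >= 32 else -1)
    total &= 0x1F                                   # keep 5 bits
    new_dbny = total & 0x3
    new_block = (total >> 2) >> SHIFT               # shifter 607 drops the fill bit
    new_group = NEIGHBORS[group][carry] if carry else group
    return new_group, new_block, new_dbny

# data E -> data F, then data F -> data G, with stride '11000' = -8
print(next_dbn('01', 0b01, 0b01, -8))   # F: group '01', block '00', DBNY '01'
print(next_dbn('01', 0b00, 0b01, -8))   # G: borrow -> port '-1', group '00'
```

The borrow in the second call reproduces the port '-1' selection described above for the step from data F to data G.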
Please refer to FIG. 7C, which is another embodiment of the sequence table and data cache of the present invention. The structure of each group and of the sequence table in the cache of FIG. 7C is the same as in FIG. 7B. However, in this embodiment the data stride is not an integer multiple of the data block length (the data stride is '1001', i.e. decimal '9'). In this embodiment, the tags of group '00' and group '01' are consecutive except for their lowest bit and the lowest bits of their index numbers are the same, while the tags of group '01' and group '11' are identical except for their lowest bit and the lowest bits of their index numbers are consecutive. That is, the tags and index numbers of the data blocks stored in group '00' are '100000', '100010', '100100' and '100110'; those stored in group '01' are '101000', '101010', '101100' and '101110'; and those stored in group '11' are '101001', '101011', '101101' and '101111'.
Take as an example the four data items J, K, L and M shown in FIG. 7C, which are accessed in sequence with the same data stride. Data J is the second data item in the third data block of group '00', data K is the third data item in the fourth data block of group '00', data L is the fourth data item in the first data block of group '01', and data M is the first data item in the second memory block of group '11'; that is, the difference between the data addresses of these four data items is the data stride '1001'. As described in the previous embodiment, in the process of fetching data J from data memory 701 according to the content of the data point in track table 619, the DBNX, DBNY and data stride in that data point are all read out. The value of DBNX is '0010', in which the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; and the value of the data stride is '1001'.
According to the technical solution of the present invention, the intra-group block number ('10') in the DBNX is sent to shifter 605. The group number '00' in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (i.e. the first row of sequence table 603). The unmasked bit of index number 717 is used as the rightmost fill-in bit when shifter 605 shifts left, and the compression ratio ('01') is sent to shifters 605 and 607 as the shift amount (i.e. shift by one bit). Thus shifter 605 shifts the input '10' left by one bit and appends the fill-in bit '0', yielding '100', which together with DBNY ('01') forms '10001'. Adding the data stride '1001' gives '11010', whose intra-group block number '110' is shifted right by one bit by shifter 607 to output '11', yielding the intra-group block number ('11') and DBNY ('10') corresponding to the next data address.
At this time, since adder 611 did not overflow (i.e. there was no carry in the addition), the group number '00' output at port '0' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '00' and intra-group block number '11') and DBNY ('10') corresponding to the next data address have both been produced, and they point to data K in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data K is read.
As another example, in the process of fetching data K according to the technical solution of the present invention, the group number '00', intra-group block number '11', DBNY '10' and data stride '1001' in that data point are read out again. The intra-group block number '11' is shifted left by one bit by shifter 605 with the fill-in bit '0' appended, and together with DBNY forms '11010'; adding the data stride '1001' gives '00011' (i.e. the intra-group block number corresponding to the next data address is '00', obtained by shifting '000' right by one bit, and the DBNY is '11'), and a carry overflow occurs. Therefore the group number '01' output at port '+1' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '01' and intra-group block number '00') and DBNY ('11') corresponding to the next data address have both been produced, and they point to data L in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data L is read.
As a further example, in the process of fetching data L according to the technical solution of the present invention, the group number '01', intra-group block number '00', DBNY '11' and data stride '1001' in that data point are read out again. The intra-group block number '00' is shifted left by one bit by shifter 605 with the fill-in bit '0' appended, and together with DBNY forms '00011'; adding the data stride '1001' gives '01100' (i.e. the intra-group block number corresponding to the next data address is '01', obtained by shifting '011' right by one bit, and the DBNY is '00'). Here, although no carry overflow occurs, the part 631 shifted out on the right of shifter 607 has the value '1', which is inconsistent with the fill-in bit '0' of the index number; selector 616 is therefore steered by this shifted-out part 631. That is, the group number '11' output at port '1' is selected and, after passing through selector 615, serves as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '11' and intra-group block number '01') and DBNY ('00') corresponding to the next data address have both been produced, and they point to data M in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data M is read. Operating successively in this manner, the DBN corresponding to the next data address can be computed from the data stride when the compression ratio is not '0' and the data stride is not an integer multiple of the data block length.
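The FIG. 7C flow can likewise be sketched numerically. Because the stride is not a multiple of the block length, the part shifted out by shifter 607 must additionally be compared with the fill-in bit to steer selector 616. The function name, the key strings and the neighbor-link table are illustrative assumptions, not the patent's notation, and only the links the example exercises are filled in.

```python
SHIFT, FILL = 1, 0

def next_dbn(group, block, dbny, stride, neighbors):
    # shifter 605: shift the block number left with the fill bit, then append DBNY
    pos = (((block << SHIFT) | FILL) << 2) | dbny
    total = pos + stride                      # adder 611
    carry = total >> 5                        # carry (+1) or borrow (-1)
    total &= 0x1F
    new_dbny = total & 0x3
    idx = total >> 2                          # 3-bit block/index field
    new_block = idx >> SHIFT                  # shifter 607
    moved_out = idx & ((1 << SHIFT) - 1)      # shifted-out part 631
    if carry:
        new_group = neighbors[group]['+1' if carry > 0 else '-1']
    elif moved_out != FILL:
        new_group = neighbors[group]['odd']   # selector 616, port '1'
    else:
        new_group = group
    return new_group, new_block, new_dbny

# neighbor links as drawn in FIG. 7C (only those the example needs)
links = {'00': {'+1': '01'}, '01': {'odd': '11'}}
print(next_dbn('00', 0b10, 0b01, 9, links))   # J -> K
print(next_dbn('00', 0b11, 0b10, 9, links))   # K -> L (carry overflow)
print(next_dbn('01', 0b00, 0b11, 9, links))   # L -> M (shifted-out part '1')
```

The three calls reproduce the J-to-K, K-to-L and L-to-M transitions walked through above, including the selector-616 case in the last step.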
Further, in the present invention, the index number corresponding to the first data block of each group need not be '0', thereby implementing a data storage mode in which group boundaries are not aligned; data can be stored flexibly, better saving storage space. Please refer to FIG. 7D, which is an embodiment of the group-boundary-unaligned data storage mode of the present invention. The structure of each group and of the sequence table in the cache of FIG. 7D is the same as in FIG. 7A. However, in this embodiment the compression ratio is '10', and the data stride is not an integer multiple of the data block length (the data stride is '10001', i.e. decimal '17'). For example, the lowest two bits of the data-address index number corresponding to every memory block of group '00' and group '01' are '00', while the lowest two bits of the data-address index number corresponding to every memory block of group '10' and group '11' are '01'. In this case, when looking up the group corresponding to a data address, the mask bits are shifted left by two bits according to the compression ratio ('10'), masking the lowest two bits of the tag in the data address and leaving the index number unmasked (as shown by the underlining in tag 715 in FIG. 7D). That is, the portion of the tag other than its lowest two bits and the index number are matched to find the group corresponding to the data address, while the two masked bits form the intra-group block number of that data address within the group. In this embodiment, the tag value in the row of sequence table 603 corresponding to group '00' is '1000' and the tag value in the row corresponding to group '01' is '1100'; in both rows the two masked bits are '00', indicating that the group boundaries of the data blocks stored in these two groups are aligned. The tag value in the row corresponding to group '11' is '1101', whose two masked bits are '01', indicating that the group boundary of the data blocks stored in that group is not aligned and that the group boundary offset is '01'. Further, the tags of group '00' and group '01' are consecutive except for their lowest two bits and their index numbers are the same, while the tags of group '01' and group '11' are identical except for their lowest two bits and their index numbers are consecutive. That is, the tags and index numbers of the data blocks stored in group '00' are '0100000', '0100100', '0101000' and '0101100'; those stored in group '01' are '0110000', '0110100', '0111000' and '0111100'; and for group '11', since the group boundary is not aligned with an offset of '01', the tags and index numbers of the data blocks stored therein are '0110101', '0111001', '0111101' and '1000001'.
In addition, since the group boundary of group '11' is not aligned, when the portion of the tag other than its lowest two bits together with the index number of a data address arriving over bus 641 matches this group, the lowest two bits of the tag in that data address must further have subtracted from them, in subtractor 613, the lowest two bits of the tag stored in the row of sequence table 603 corresponding to group '11' and delivered over bus 643, in order to determine the intra-group block number corresponding to that data address. For example, if the data address is '011100111' (i.e. the tag is '01110', the index number is '01' and the intra-block offset is '11'), then the portion of the tag other than its lowest two bits ('011') together with the index number '01' matches group '11'. The lowest two bits of the tag ('10') and the lowest two bits of the tag stored in the row of sequence table 603 corresponding to group '11' ('01') are subtracted in subtractor 613, giving '01' (the second data block); that is, this data address corresponds to the last data item of the second data block of group '11'.
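The subtractor-613 correction for an unaligned group reduces to a small modular subtraction; the helper below is a hypothetical illustration using the embodiment's numbers, with an assumed two-bit field width.

```python
def intra_group_block(addr_tag_low, row_tag_low, bits=2):
    """Subtractor 613: low tag bits of the address minus those stored in the row."""
    return (addr_tag_low - row_tag_low) & ((1 << bits) - 1)

# FIG. 7D numbers: the address tag ends in '10', the row of group '11' stores '01'
print(intra_group_block(0b10, 0b01))   # 1, i.e. '01' -> the second data block
```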
Take as an example the four data items P, Q, R and S shown in FIG. 7D, which are accessed in sequence with the same data stride. Data P is the second data item in the third data block of group '00', data Q is the third data item in the fourth data block of group '00', data R is the fourth data item in the first data block of group '01', and data S is the first data item in the first memory block of group '11'; that is, the difference between the data addresses of these four data items is the data stride '10001'. As described in the previous embodiment, in the process of fetching data P from data memory 701 according to the content of the data point in track table 619, the DBNX, DBNY and data stride in that data point are all read out. The value of DBNX is '0010', in which the group number is '00' and the intra-group block number is '10'; the value of DBNY is '01'; and the value of the data stride is '10001'.
According to the technical solution of the present invention, the intra-group block number ('10') in the DBNX is sent to shifter 605. The group number '00' in the DBNX is sent to sequence table 603 to read out the content of the corresponding row (i.e. the first row of sequence table 603). The two unmasked bits of index number 717 are used as the rightmost fill-in bits when shifter 605 shifts left, and the compression ratio ('10') is sent to shifters 605 and 607 as the shift amount (i.e. shift by two bits). Thus shifter 605 shifts the input '10' left by two bits and appends the fill-in bits '00', yielding '1000', which together with DBNY ('01') forms '100001'. Adding the data stride '10001' gives '110010', whose intra-group block number '1100' is shifted right by two bits by shifter 607 to output '11', yielding the intra-group block number ('11') and DBNY ('10') corresponding to the next data address.
At this time, since adder 611 did not overflow (i.e. there was no carry in the addition), the group number '00' output at port '0' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '00' and intra-group block number '11') and DBNY ('10') corresponding to the next data address have both been produced, and they point to data Q in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data Q is read.
As another example, in the process of fetching data Q according to the technical solution of the present invention, the group number '00', intra-group block number '11', DBNY '10' and data stride '10001' in that data point are read out again. The intra-group block number '11' is shifted left by two bits by shifter 605 with the fill-in bits '00' appended, and together with DBNY forms '110010'; adding the data stride '10001' gives '000011' (i.e. the intra-group block number corresponding to the next data address is '00', obtained by shifting '0000' right by two bits, and the DBNY is '11'), and a carry overflow occurs. Therefore the group number '01' output at port '+1' is selected as the group number corresponding to the next data address. At this point, the DBNX (i.e. group number '01' and intra-group block number '00') and DBNY ('11') corresponding to the next data address have both been produced, and they point to data R in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data R is read.
As a further example, in the process of fetching data R according to the technical solution of the present invention, the group number '01', intra-group block number '00', DBNY '11' and data stride '10001' in that data point are read out again. The intra-group block number '00' is shifted left by two bits by shifter 605 with the fill-in bits '00' appended, and together with DBNY forms '000011'; adding the data stride '10001' gives '010100', and shifter 607 shifts the index number '0101' output by adder 611 right by two bits to obtain '01'. Here, although no carry overflow occurs, when shifter 607 shifts the index number '0101' output by adder 611 to the right, the part 631 shifted out on the right is '01', which is inconsistent with the fill-in bits '00' of the index number; this shifted-out part 631 therefore steers selector 616. That is, the group number '11' output at port '1' is selected and, after passing through selector 615, serves as the group number corresponding to the next data address. The group boundary offset '01' is then read out from the row of sequence table 603 corresponding to group '11', and this group boundary offset '01' is subtracted from the '01' obtained by shifter 607, giving the true intra-group block number '00'.
At this point, the DBNX (i.e. group number '11' and intra-group block number '00') and DBNY ('00') corresponding to the next data address have both been produced, and they point to data S in data memory 701. This DBN is written back over bus 649 into that data point in track table 619 for use the next time data S is read. Operating successively in this manner, the DBN corresponding to the next data address can be computed from the data stride when the compression ratio is not '0', the data stride is not an integer multiple of the data block length, and group boundaries are not aligned.
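For the unaligned case of FIG. 7D, the final correction described above (subtracting the group boundary offset stored in the sequence-table row from the block number produced by shifter 607) can be sketched as follows; the helper name and the two-bit width are illustrative assumptions.

```python
def true_block(shifted_block, boundary_offset, bits=2):
    """Subtract the group boundary offset from the shifter-607 result."""
    return (shifted_block - boundary_offset) & ((1 << bits) - 1)

# R -> S step: shifter 607 yields '01', the row of group '11' stores offset '01'
print(true_block(0b01, 0b01))   # 0, i.e. the true intra-group block number '00'
```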
According to the technical solutions and concepts of the present invention, and with reference to the embodiments of FIGS. 7A, 7B, 7C and 7D, similar methods can be applied in the same way to various other grouping, compression or data stride situations, which are not described further here.
In the present invention, data that the processor core may load is filled into the cache in advance, and fetched in advance for the processor core to use, by the following method. This embodiment reads, ahead of time, the instructions (or abstractions of the instructions) that the processor core is executing or about to execute, and processes data load or data store instructions in advance (a data load instruction is taken as the example below). When a data load instruction in an instruction loop is processed for the first time, the starting data address of that instruction is determined and recorded according to the data address produced by the processor core. When the same data load instruction in the instruction loop is processed for the second time, the starting data address recorded for that data load instruction is subtracted from the second data address produced by the processor core to obtain the difference between the data addresses of two adjacent executions of the instruction, which is recorded as the data stride. The data stride is then added to the second data address to obtain the next data address, which is recorded, and the next data address is used to query whether the corresponding data is present in the higher-level memory. If the data is not in the higher-level memory, the corresponding data is fetched from the lower-level memory at the next data address and filled into the higher-level memory.
Thereafter, each time the same data load instruction is encountered, the next data address corresponding to that instruction is extracted from the record and supplied to the processor core for use. At the same time, as required, this next data address is compared with the exact data address produced by the processor core. If there is no error, the next data address is added to the data stride to obtain a new next data address, which is recorded; the new next data address is used to query whether the corresponding data is present in the higher-level memory, and if it is not, the corresponding data is fetched from the lower-level memory at the new next data address and filled into the higher-level memory. If the comparison reveals an error, the correct address at the time of the error is taken as the starting data address and the above procedure is executed anew.
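The two paragraphs above describe a stride-based prefetch loop: learn the stride from the first two executions of a load, then predict, verify and pre-fill on every later execution. A minimal software model is sketched below; the record structure, the dictionary-based memories and all names are invented for illustration and merely stand in for the track table and the two memory levels.

```python
records = {}   # per-instruction record: pc -> {'addr': ..., 'stride': ...}

def on_load(pc, actual_addr, hi_mem, lo_mem):
    """Process one execution of the data load instruction at pc."""
    rec = records.get(pc)
    if rec is None:                          # first execution: record start address
        records[pc] = {'addr': actual_addr, 'stride': None}
        return
    if rec['stride'] is None:                # second execution: learn the stride
        rec['stride'] = actual_addr - rec['addr']
    elif rec['addr'] != actual_addr:         # prediction was wrong: restart
        records[pc] = {'addr': actual_addr, 'stride': None}
        return
    nxt = actual_addr + rec['stride']        # next data address
    rec['addr'] = nxt                        # record it for the next check
    if nxt not in hi_mem:                    # pre-fill the higher-level memory
        hi_mem[nxt] = lo_mem[nxt]

lo = {a: a for a in range(64)}               # stand-in for lower-level memory 115
hi = {}                                      # stand-in for data memory 113
for a in (0, 8, 16, 24):                     # one load striding by 8
    on_load(0x40, a, hi, lo)
print(sorted(hi))                            # addresses pre-filled ahead of use
```

From the second execution on, each access pre-fills the data one stride ahead, so the data is already in the higher-level memory before the processor core asks for it.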
Please refer to FIG. 8A, which is an embodiment of the data access engine of the present invention. FIG. 8A shows a more complete embodiment built on the embodiment of FIG. 6. The processor core 101 and the data memory (or level-one data memory) 113 are the same as described in the previous embodiments; the data in data memory 113 is a subset of the data in lower-level memory 115. A first-in-first-out buffer (FIFO) 849 serves as a data buffer between data memory 113 and processor core 101. Tag memory 841 together with data memory 113 forms a conventional way-set cache. The sequence table 603, shifters 605, 607 and 609, adder 611, subtractor 613 and selector 617 in data access engine 801 are the same as the corresponding functional blocks in data access engine 601 of FIG. 6. For ease of description, selector 618 in this embodiment incorporates selectors 615 and 616 of the FIG. 6 embodiment. In addition, sequence table 603 and selector 618 add storage and selection of the group numbers of more neighboring groups, and group valid bits and index-bit valid bits are also added to sequence table 603. Controller 803 controls the operation of the data access engine. Under the control of controller 803, selectors 811, 813 and 815 select the intra-group index number, the intra-block offset and the stride originating from track table 619 or from subtractors 613 and 805, for adder 611 and shifters 605 and 607 to compute the next DBN. Subtractor 613 computes the index number and the intra-block offset from data address 641 and the match result 643 of sequence table 603. Subtractor 805 computes the difference between the memory addresses of two adjacent accesses of the same memory access instruction, i.e. the data stride. Converter 807 converts the stride into a compression shift signal stored in sequence table 603; this compression shift signal 829 is used as the shift amount to control the shifters. Current cache address bus 821 sends the entry content originating from track table 619 to the functional blocks. Intermediate result bus 823 sends the intra-group block number and intra-block offset from subtractor 613 to adder 611 and shifters 605 and 607 to compute the next DBN. Bus 825 sends the data stride computed by subtractor 805 to selector 815. Control signal 827 produced by controller 803 controls selectors 811, 813, 815, 817, 617 and 819. Shift signal 829 output from sequence table 603 controls shifters 605, 607 and 609. Next data address bus 881 sends the next data address to sequence table 603 to produce the corresponding data address for pre-filling data from lower-level memory 115 into data memory 113, and also sends the next DBN to track table 619 for storage.
A conventional cache uses match-based indirect addressing. Taking a set-associative cache as an example, the index bits in the middle of the data address read out a plurality of tags from the cache's tag array, and each is compared against the high bits of the data address. If the tag of some way group matches, it is called a hit, and the content of that way group is what the data address points to. Level-one data memory 113 consists of a plurality of identical memories, each forming a way group; every way group has the same number of rows, i.e. a multi-way organization. Each storage row of each memory is called a level-one data block, and each level-one data block has an index number (INDEX) 802 determined by its row number within level-one data memory 113. The intra-block offset 627 points to one data item within the block. Please refer to FIG. 8B, which is a schematic diagram of the various address formats of the present invention. According to the number of level-one data blocks per way group in level-one data memory 113 and the number of data items per block, the data address 804 can be divided into a high-order tag 801, middle index bits 802, and a low-order intra-block offset 627.
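The tag/index/offset decomposition and the match-based lookup described above can be sketched in Python. The field widths here are assumptions for illustration only (the patent does not fix them): 64-byte blocks and 128 rows per way group.

```python
# Assumed widths (not specified in the text): 64-byte blocks -> 6 offset
# bits; 128 rows per way group -> 7 index bits.
OFFSET_BITS = 6
INDEX_BITS = 7

def split_data_address(addr: int) -> tuple[int, int, int]:
    """Split a data address 804 into (tag 801, index 802, offset 627)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(ways: list, addr: int):
    """Match-based indirect addressing: read the tag stored at `index` in
    every way group and compare it with the tag bits of the address; a
    match is a hit and selects that way group's block."""
    tag, index, offset = split_data_address(addr)
    for way in ways:
        entry = way.get(index)            # entry = (stored_tag, block_data)
        if entry is not None and entry[0] == tag:
            return entry[1][offset]       # hit: data item at the offset
    return None                           # miss in every way group
```

A hit requires both the row selected by the index and an exact tag comparison, which is precisely the per-access matching work that the direct cache addressing of this embodiment later avoids.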
In this embodiment, the cache also begins with match-based indirect addressing; once the relationship between a data address and a cache address has been established, addressing proceeds directly with the cache address. Direct addressing with cache addresses eliminates the tag-matching operation, saving power and increasing memory access speed.
According to the technical solution of the present invention, for the cache allocated by group, the group storage address 808 is divided into a high-order memory group number (GN) 623, a middle intra-group block number (index) 625, and a low-order intra-block offset (offset) 627. For the set-associative cache, the cache address 806 is divided into a high-order way group number 814, a middle intra-way index number 802, and a low-order intra-block offset 627. The intra-block offsets of these two cache address formats are exactly the same as in the data address, but the bit widths of the index number and the intra-group block number are not necessarily the same. Because the groups of the group-allocated cache may be partitioned more finely than the way groups, the intra-group block number of the group storage address may have fewer bits than the index number of the set-associative cache address, while the corresponding tag has correspondingly more bits. Selector 843 selects either the set-associative cache address generated by tag memory 841 or the group storage address generated by the data access engine for storage in track table 619. The two address formats are identical in form; in essence both are addresses of data memory 113.
In the present invention, a data address and a cache address can be converted into each other based on the contents of sequence table 603. When a data address is to be converted into a cache address, the high bits of the data address are sent over bus 641 to sequence table 603 and matched against the tags and index numbers stored there. The group number corresponding to the matching entry can be read out on bus 835, and the tag and index number are also read out on bus 643. Subtractor 613 subtracts the tag and index number on bus 643 from the data address on bus 641, yielding the low tag bits, the index number and the intra-block offset. After the low tag bits and index number are shifted by shifter 609, the intra-group block number of the corresponding cache address is obtained. Combining that group number, intra-group block number and intra-block offset on bus 837 yields the cache address corresponding to the data address.
When a cache address is to be converted into the corresponding data address, the group number 623 in the cache address addresses sequence table 603, from which the tag and index number are read out and sent over bus 643. Adding that tag and index number to the intra-group block number 625 and intra-block offset 627 of the cache address yields the data address as the sum.
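The two conversion directions can be sketched as follows. This is a simplified model assuming aligned storage (shift amount '0') and made-up field widths; the sequence table is modeled as a dictionary from group number to the stored tag/index value.

```python
OFFSET_BITS = 6      # assumed block size: 64 bytes
BLOCK_BITS = 2       # assumed 4 blocks per group (intra-group block number 625)

# Toy sequence table 603: key = group number (GN) 623; value = the stored
# tag/index covering that group's data-address range.
seq_table = {5: 0x1A40 >> (BLOCK_BITS + OFFSET_BITS)}   # group 5 covers 0x1A00..0x1AFF

def data_to_cache_addr(data_addr: int):
    """Match tag/index (bus 641 vs 643), subtract (subtractor 613), and
    combine with the matching group number (bus 835) into a cache address
    (bus 837) as (GN, intra-group block number, offset)."""
    for gn, tag_index in seq_table.items():
        if data_addr >> (BLOCK_BITS + OFFSET_BITS) == tag_index:
            low = data_addr - (tag_index << (BLOCK_BITS + OFFSET_BITS))
            block = low >> OFFSET_BITS
            offset = low & ((1 << OFFSET_BITS) - 1)
            return gn, block, offset
    return None                            # no matching entry: must allocate

def cache_to_data_addr(gn: int, block: int, offset: int) -> int:
    """Address the table with GN 623, read tag/index (bus 643), and add
    back the intra-group block number 625 and offset 627."""
    return (seq_table[gn] << (BLOCK_BITS + OFFSET_BITS)) | (block << OFFSET_BITS) | offset
```

The round trip is lossless: converting a data address to a cache address and back recovers the original address, which is what lets the engine keep only cache addresses in the track table.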
According to the technical solution of the present invention, the data access engine can supply data addresses to access lower-level memory 115, and can also supply cache addresses to access data memory 113. The data access engine also stores the correspondence between data addresses and cache addresses, and can convert one kind of address into the other.
In this embodiment, tracker 845 determines the next track table read address 851 based on the content output by the data point in track table 619. The track table read address 851, after being delayed by delay unit 847, serves as the track table write address 853.
Please refer to FIG. 8C, which shows an embodiment of the operation of the sequence table of the present invention. Sequence table 603 consists of registers, comparators and mask registers; the registers may also be implemented by memory. Each entry contains a shift field 891, an adjacent-group field 892, a group valid signal 893 and block valid signals 894, a tag and index number 895, a mask 896, and a comparator 897. The tag and index number together with the comparator may be implemented by a content-addressable memory (CAM); the tag, index number, comparator and mask may be implemented by a tri-state CAM. The mask acts on the low bits of the tag and on the index number, and can selectively exclude the low tag bits or certain index bits from the comparison (i.e. those bits do not affect the comparison result), implementing the compressed storage of data. The mask is controlled by the shift field: when the shift is '0', the mask masks the lowest bit (i.e. the index number); when the shift is '1', the mask moves one bit to the left, masking the lowest tag bit and the high bit of the index number while leaving the lowest index bit to participate in the comparison. Comparator 897 compares the data address on bus 641 against the tag and index number 895 as masked by mask 896; the comparison result is sent out over bus 888 for controller 803 to use as a basis for decisions. The adjacent-group field 892 stores the group numbers and valid bits of the groups adjacent to this group, to be followed when stepping by the data stride across this group's boundary. The group valid signal 893 is set when data is first written into the group, indicating that at least one data block in the group is valid and that the group corresponds to the data pointed to by the address contained in the tag and index fields. Each bit of the block valid signals 894 represents the validity of one data block in the group. The low tag bits and index number input on bus 641, after being shifted under the control of shift field 891, are decoded (e.g. a 2-bit binary address is decoded into 4 bits of which exactly one is set, i.e. one-hot, each bit representing one data block) to select one block valid signal among the block valid signals 894. If that block valid signal is valid, the corresponding data is already in the corresponding data block of that group in data memory 113; if invalid, the corresponding data must be filled into that data block.
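The masked comparison and the one-hot block selection can be sketched as below. The exact mask geometry is an assumption inferred from the two cases given in the text (shift '0' and shift '1'): a one-bit-wide mask window that slides left with the shift amount.

```python
def masked_match(stored: int, addr_high: int, shift: int) -> bool:
    """Tri-state-CAM-style compare (comparator 897): mask 896 hides a
    shift-dependent bit of the stored tag/index so it does not affect the
    result. Assumption: a 1-bit window at position `shift` (shift=0 masks
    the lowest index bit; shift=1 masks one position further left)."""
    mask = ~(1 << shift)
    return (stored & mask) == (addr_high & mask)

def block_select(addr_low: int, shift: int, n_blocks: int = 4) -> int:
    """Decode the shifted low address bits into a one-hot selector for the
    block valid signals 894 (e.g. a 2-bit value -> one of 4 bits)."""
    idx = (addr_low >> shift) & (n_blocks - 1)
    return 1 << idx          # one-hot: exactly one bit set
```

The one-hot output corresponds directly to picking one bit of signal 894: if the selected bit is set, the block is already resident in data memory 113; otherwise a fill from lower-level memory 115 is required.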
Sequence table 603 can be accessed in two ways. One way is matching the data address sent in over bus 641 of FIG. 8A against the tags in sequence table 603; the other is direct addressing via group number 831 or group number 833 of FIG. 8A. Every data field of the entry in sequence table 603 that is matched by the data address or addressed by the group number can be read or written. For example, the corresponding group number 835 can be read out via a data address match, and the corresponding tag 643 can be read out by addressing with group number 829. The other fields of the matched or addressed entry in sequence table 603, such as the adjacent group numbers, the block valid signals and the group valid signal, can likewise be read or written. Before an entry is written, all of its fields are reset to all '0'.
When the data access engine encounters a new data read instruction, it processes it in several stages. The first stage is the first time the data read instruction is processed. In this stage it is determined whether the data read instruction lies within a loop. If it does not, a block is allocated in one way group of level-one data memory 113 in the set-associative cache region according to the index number in the data address of the data read instruction, the data is written in, and the tag portion of the data address is written into the entry of tag memory 841 corresponding to that index number in that way group. If the instruction does lie within a loop, a group is allocated in the group-allocated cache region for the data that the data read instruction may read. In both cases, the data address is mapped to a cache address and stored in the information area associated with the data read instruction, while the memory is accessed by data address to provide the corresponding data to processor core 101.
In this embodiment, whether a data read instruction is located within a loop may be determined by whether it lies between a backward branch instruction and that branch's target instruction. For example, the tracker may provide a pointer to the first backward branch instruction after the current instruction being executed by the processor core, i.e. a branch instruction whose branch target address is smaller than the address of the branch instruction itself. Then all data read instructions located between the larger of the current instruction's address and the branch target address, and the branch instruction itself, lie within the loop formed by that branch instruction. Of course, the tracker pointer may also point to several further backward branch instructions after the current instruction, and determine which data read instructions are contained in each loop according to the branch target address of each backward branch instruction passed.
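The loop-membership test described above reduces to simple address comparisons. A minimal sketch, using instruction addresses directly (the hardware works on track table coordinates, but the arithmetic is the same):

```python
def in_loop(load_pc: int, current_pc: int, branch_pc: int, target_pc: int) -> bool:
    """A branch is backward when its target address is smaller than its
    own address; data read instructions between max(current_pc, target_pc)
    and the branch lie inside the loop that branch forms."""
    if target_pc >= branch_pc:
        return False                      # forward branch: forms no loop
    lower = max(current_pc, target_pc)
    return lower <= load_pc <= branch_pc
```

A load before the loop body (below both the current instruction and the branch target) is excluded, so it is handled by the conventional set-associative path rather than the group-allocated path.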
The second stage is the second time the same data read instruction is processed. In this stage, besides accessing the memory by data address to provide the corresponding data to processor core 101, the data stride is computed as the difference between the second data address and the first data address (stored when the instruction was first processed). The second data address is added to the data stride to obtain the probable data address of the memory access the third time the instruction is processed, and data is read from lower-level memory 115 at that probable address. The cache address corresponding to that probable data address is also derived, and the data from lower-level memory 115 is filled into data memory 113 accordingly. At the same time, that cache address, together with the stride, is stored in the information area associated with the data read instruction.
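The second-stage arithmetic is a one-step linear prediction. A minimal sketch:

```python
def second_stage(first_addr: int, second_addr: int) -> tuple[int, int]:
    """Compute the data stride (subtractor 805) and the probable address
    of the third access, which is used to prefill data memory 113 from
    lower-level memory 115 before the third execution."""
    stride = second_addr - first_addr
    predicted_third = second_addr + stride
    return stride, predicted_third
```

For a load walking an array of 8-byte elements, the stride is 8 and the prefill lands exactly on the next element's block.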
The third stage covers the third and all subsequent executions of the same data read instruction. In this stage, data is provided to processor core 101 from data memory 113 directly at the cache address stored the previous time. The data access engine also has a mechanism to compare the data address generated by processor core 101 with the previously stored cache address; if they do not agree, the data is refetched at the data address generated by processor core 101 and the cache address is corrected. In addition, the cache address is added to the data stride to obtain the probable cache address of the next load, and data memory 113 is filled at that address. The new cache address is then stored in the information area associated with the data read instruction for the next use. From then on, the data read instruction is handled in the same way as the third time.
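The steady-state check-and-step behavior of the third stage can be sketched as follows (addresses are modeled as plain integers; the hardware compares the equivalent cache address):

```python
def third_stage(predicted_addr: int, actual_addr: int, stride: int) -> tuple[bool, int]:
    """Serve the access at the stored (predicted) address; if the address
    the core actually produced differs, refetch at the actual address and
    correct, then step the prediction by the stride for the next load."""
    mispredicted = actual_addr != predicted_addr
    if mispredicted:
        predicted_addr = actual_addr      # correction on misprediction
    next_addr = predicted_addr + stride   # prefill target for next iteration
    return mispredicted, next_addr
```

On a correct prediction the load is served with no tag match at all; on a misprediction the engine falls back to the core-supplied address and the prediction chain resumes from there.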
The different stages of processing a data read instruction are controlled by controller 803. When a track is established in track table 619, the cache address and data stride of a data read instruction are both initialized to 0. Controller 803 reads the cache address and data stride of the data read instruction as well as that instruction's track table address. Please refer to FIG. 8D, which shows an embodiment of the controller of the present invention. Controller 803 contains a plurality of match counter groups, each consisting of a memory 861, a comparator 862 and a counter 863, where the bit widths of memory 861 and comparator 862 equal the track table address width and the bit width of counter 863 is two bits. Allocator 864 is responsible for assigning a free match counter group to a data read instruction being processed for the first time. Initial value detector 865 detects the instruction type as well as an all-'0' cache address and data stride. Bus 821 delivers the instruction type, cache address and data stride to initial value detector 865, while bus 851 connects the track table address to the memory input and to one comparator port of each match counter group. Bus 866 transfers to control logic 867 the count value of the counter in the group whose stored value (e.g. in memory 861) matches the track table address on bus 851 (the address of the current data read instruction), so that control logic 867 controls the operation of the data access engine according to the stage that instruction is in.
Allocator 864 contains a loop counter whose output is converted by a decoder 872 into a pointer to the match counter groups. The counter value of the group pointed to is read out and returned to allocator 864 over bus 869. If comparator 870 finds that the value is not '0' (the group is in use by a data read instruction), the loop counter in allocator 864 is incremented by '1', moving the pointer to the next match counter group. When the match counter group returns a counter value of '0', the loop counter stops counting and the pointer stays at that group; the track table address of the next not-yet-processed data read instruction will be stored into the memory of that match counter group. In the following it is assumed that the pointer stays at the group containing memory 861.
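The allocator's scan is a round-robin search for a group whose counter reads '0'. A minimal sketch (with an added guard for the all-busy case, which the text does not discuss):

```python
class Allocator:
    """Model of allocator 864: advance the pointer past match counter
    groups whose counter is non-zero (in use) and stop at the first free
    one, which will receive the new instruction's track table address."""
    def __init__(self, counters: list):
        self.counters = counters          # one counter value per group
        self.ptr = 0                      # loop counter / decoded pointer

    def allocate(self):
        for _ in range(len(self.counters)):
            if self.counters[self.ptr] == 0:
                return self.ptr           # free group found: pointer stays here
            self.ptr = (self.ptr + 1) % len(self.counters)
        return None                       # assumption: all groups busy -> no allocation
```

Because a counter returns to '0' only after its instruction has passed through all stages, a group is never reassigned while still tracking a live data read instruction.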
When initial value detector 865 detects a non-data-read instruction, controller 803 operates in mode 0 and does not react to the instruction. When it detects a data read instruction whose cache address and data stride are both '0', it judges that the instruction has not yet been processed and enters first-stage operation. First, initial value detector 865 generates a write enable signal 868 that stores the track table address of the data read instruction on bus 851 into memory 861 of the match counter group pointed to by allocator 864. At this point the value in memory 861 equals the value on bus 851, so the output of comparator 862 is '1', i.e. this group is the current instruction group. The counter 863 of the current instruction group is incremented by '1' to '1', and this count value is placed on bus 866 and transferred to control logic 867, which sets the selectors and functional blocks of the data access engine according to first-stage mode.
When this data read instruction is encountered a second time, initial value detector 865 detects a data read instruction and has the comparators in each group compare against the track table address on bus 851. The value in memory 861 matches it, and comparator 862 causes counter 863 to increment by '1', making its count '2'. The matching group is the current instruction group; the count value of the current instruction group is placed on bus 866 and transferred to control logic 867, which sets the selectors and functional blocks of the data access engine according to second-stage mode.
When this data read instruction is encountered a third time, initial value detector 865 detects a data read instruction and has the comparators in each group compare against the track table address on bus 851. The value in memory 861 matches it, that group is the current instruction group, and comparator 862 causes counter 863 to increment by '1', making its count '3'. This value is placed on bus 866 and transferred to control logic 867, which sets the selectors and functional blocks of the data access engine according to third-stage mode.
When this data read instruction is encountered a fourth time, initial value detector 865 detects a data read instruction and has the comparators in each group compare against the track table address on bus 851. The value in memory 861 matches it, that group is the current instruction group, and comparator 862 causes counter 863 to increment by '1', overflowing its count to '0'. This value is placed on bus 866 and transferred to control logic 867, which sets the selectors of the data engine according to third-stage mode: control logic 867 treats count values '0' and '3' alike, operating in the default third-stage state. Once the counter has counted to '0', its value no longer increases, so comparator 862 no longer participates in comparison. The '0' count also makes the group available for allocator 864 to assign to another data read instruction.
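The four encounters above trace a small state machine per match counter group: the 2-bit counter steps 0→1→2→3→0, with counts '1', '2' selecting stage-1 and stage-2 modes, and '3' and the wrapped '0' both acting as the default third-stage mode. A behavioral sketch:

```python
class MatchCounter:
    """One match counter group (memory 861, comparator 862, 2-bit counter
    863). After wrapping to 0 the counter stops and the comparator drops
    out of comparison, freeing the group."""
    def __init__(self, track_addr: int):
        self.addr = track_addr            # stored track table address
        self.count = 0                    # 2-bit counter 863
        self.freed = False

    def encounter(self, track_addr: int):
        if self.freed or track_addr != self.addr:
            return None                   # comparator 862: no match
        self.count = (self.count + 1) & 0b11    # 2-bit increment, wraps to 0
        if self.count == 0:
            self.freed = True             # counted past '3': group free again
            return "stage3"               # count '0' also runs default mode
        return ("stage1", "stage2", "stage3")[self.count - 1]
```

After the group is freed, the same instruction is still handled in third-stage mode, recognized instead by its non-zero cache address and stride, as described next.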
The next time this data read instruction is encountered, initial value detector 865 detects that the corresponding cache address and data stride delivered over bus 821 are both non-'0', while none of the group comparators matches the track table address on bus 851. From this it is judged that this is a data read instruction that has already entered the third stage, directing control logic 867 to control the operation of the data engine in the default mode, i.e. third-stage mode.
The feedback signals 888 and 889 returned from tag memory 841 and sequence table 603, and the difference 825 produced by subtractor 805, are all sent back to control logic 867 in controller 803. Control logic 867 controls the operation of the data access engine according to these feedback signals together with the stage information of the current instruction delivered over bus 866. In some cases control logic 867 may also feed information back to the match counter groups to change the stage of the current instruction in order to handle exceptional situations. For example, if a data read instruction has already entered the third stage but the predicted data address does not agree with the data address sent from the processor core over bus 641, control logic 867 sends a feedback signal to the match counter group corresponding to the current instruction, setting its count to '1'. Thereafter the instruction begins execution again in the first-stage state, passes through the second and third stages, re-establishes the stride, and stores the next cache address into track table 619.
The operation of the data access engine is further illustrated below by the actual handling of one data read instruction in this embodiment. Tracker 845 moves the track table read address 851 to the next data read instruction, and the type 621, DBN (623, 625, 627) and data stride 629 in the entry corresponding to that instruction are placed on the current data address bus 821. Controller 803 reads type 621 and recognizes the data type; the DBN and data stride are all '0', so it judges that the instruction has not yet been processed, but it still controls selector 617 to send the all-'0' DBN on bus 821 over bus 861 to data memory 113, fetching data into buffer 849 for the processor core (alternatively, no data need be fetched at the all-'0' DBN address, to save power). At the same time, controller 803 enters first-stage mode and controls selector 817 to send the group number 623 on bus 821 to sequence table 603, selecting the tag stored in entry 0 (or the first entry); if that tag is invalid, all '0' is output. The tag is sent over bus 643 to shift-adder 812 and added to the shifted intra-group block number on bus 821. The data address 641 generated by processor core 101 and the output of shift-adder 812 are subtracted in subtractor 613, and the difference is placed on bus 825. Controller 803 takes the difference from bus 825 for analysis and judgment; if the difference is not '0', the controller concludes that the data pointed to by the DBN on bus 821 is not the data required by processor core 101, and it notifies the processor core to disregard the corresponding data in buffer 849 and wait for the correct data (alternatively, this check may be omitted to save power).
Controller 803 also matches the data address on bus 641 against the tags in sequence table 603 and in tag memory 841. If there is a hit in tag memory 841, operation proceeds in the conventional cache manner. If no tag matches, the data address on bus 641 is sent through selector 819 to lower-level memory 115, and the corresponding data block is read from lower-level memory 115. At this point, tracker 845 has already looked ahead to the next branch point and determined that the branch is a backward branch (i.e. the program here is a loop), and has calculated that its range contains the data read instruction being processed; it therefore allocates a replaceable data group and designates data block 0 (or the first data block) of that group to be filled with the corresponding data block read from lower-level memory. The tag and index number portion on bus 641 is stored into the tag and index number field of the corresponding entry in sequence table 603. The group valid bit of that group and the valid bit of the corresponding data block (data block 0) are set valid. The shift field of that entry is all '0' at this time, and the adjacent group number field does not yet hold a value.
The address of that entry (i.e. the group number GN) is output from sequence table 603 over bus 835 and placed on bus 837. At the same time, subtractor 613 subtracts the just-stored tag, sent from sequence table 603 over bus 643, from the data address on bus 641 and places the difference on intermediate result bus 823. Because the high address bits on bus 641 and bus 643 are identical, the difference at this point is just the low tag bits, the index and the intra-block offset. This low part, after passing shifter 609 (the shift amount being '0' at this time), is also placed on bus 837 and, together with the group number there, forms a complete, correct cache address. Controller 803 then controls selector 617 to place the cache address on bus 837 onto bus 855 and send it to data memory 113, designating the correct data block to be filled with the corresponding data block read from lower-level memory 115. Controller 803 also controls this data to be read out from data memory 113, or controls the data to be bypassed directly from the output of lower-level memory 115 to data buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
Controller 803 also controls selectors 811, 813 and 815 to select the intra-group block number and intra-block offset on bus 823, which are added in adder 611 to the all-'0' stride from the track table, as in the embodiment of FIG. 6, and the result is placed on bus 881. At this time, the control line 631 generated by the addition result controls selector 618 to place the current group number output by sequence table 603 onto bus 881 as well. The group number, intra-group block number and intra-block offset are concatenated together on bus 881 into one cache address DBN. Controller 803 then controls selector 843 to select bus 881, and delay unit 847 delays the track table read address 851 onto the track table write address 853, so that the DBN is written into the same entry that was previously read out; at this time the controller does not update the stride (or forces a write of '0'), leaving it '0'. After this operation, the track table entry holds the cache address of the read that the data read instruction has already completed (hereinafter called DBN1 for ease of explanation), with a stride of '0'. At this point the data access engine has completed the first-stage operation for this data read instruction.
As mentioned above, the program in this example is executing a loop. When the same data load instruction is executed again, the type 621, DBN1, and the data stride '0' are read out onto bus 821, and the track table address is on bus 851. Controller 803 reads in track table read address 851, which matches an address stored in the match register group in controller 803, signaling that the second-phase operation should be performed for this instruction; control logic 867 then directs the data access engine accordingly via control bus 827. Controller 803 uses the group number (GN) 623 of DBN1 on bus 821 to select, from sequence table 603, the tag and intra-group block number stored in the corresponding entry, which are sent via bus 643 to selector 810 and added in shift adder 812 to the intra-group block number and intra-block offset of DBN1 from bus 821 (the shift amount is controlled by bus 829, output from sequence table 603). To support the unaligned tags and indexes stored in the tag/index field 895 of sequence table 603, the intra-group block number and intra-block offset (low bits) on bus 821 must be shifted in shift adder 812 before being added to the tag/index (high bits) on bus 643. The sum is the data address corresponding to DBN1 and is sent to one input of subtractor 805. The newly arrived data address on bus 641 is sent to the other input of subtractor 805, the data address corresponding to DBN1 is subtracted from it, and the resulting difference is placed on bus 825 as the data stride. Converter 807 converts this stride into the corresponding shift signal and writes it into the shift field of the entry corresponding to DBN1 in sequence table 603. The shift amount 829 is sent from sequence table 603 to shifters 605, 607, 609, and 812 to control their shift operations.
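The second-phase stride computation — rebuilding DBN1's data address from the stored tag/index and the shifted intra-group position, then subtracting it from the new data address — can be modeled as follows. The widths, the shift encoding (block spacing scaled by 2**shift), and the example values are illustrative assumptions, not taken from the patent.

```python
OFFSET_BITS = 6  # assumed intra-block offset width

def dbn1_data_address(tag_index, block, offset, shift):
    """Shift adder 812 (sketch): scale the intra-group block number by
    2**shift data blocks, then add the stored tag/index and the offset."""
    return tag_index + (block << (OFFSET_BITS + shift)) + offset

def compute_stride(new_data_addr, tag_index, block, offset, shift=0):
    """Subtractor 805: stride = new data address - DBN1's data address."""
    return new_data_addr - dbn1_data_address(tag_index, block, offset, shift)

# Group tag 0x1A00, DBN1 in block 0 at offset 0x3C; the loop's second
# access arrives at 0x1A44, eight bytes higher.
stride = compute_stride(0x1A44, 0x1A00, 0, 0x3C)
```

A negative result falls out of the same subtraction, which is what the negative-stride handling later in the text relies on.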
Controller 803 controls selector 819 to select the data address on bus 641 and read the corresponding data from lower-level memory 115. At the same time, the low bits obtained when subtractor 613 subtracts the tag and intra-group block number of the corresponding DBN1 entry on bus 643 from the data address on bus 641 are also placed on bus 823. Controller 803 controls selectors 811 and 813 to send the low bits on bus 823 (i.e., the tag low bits, index, and intra-block offset of DBN2) to adder 611 and related logic, where they are added to '0' and shifted by the shift amount in the shift field of the DBN1 entry in sequence table 603. The sum, which is the intra-group block number and intra-block offset of DBN2, is placed on bus 881; the result 631 shifted out to the right by shifter 607 controls selector 618 to select an adjacent group number from the DBN1 entry in sequence table 603. If that group number is invalid, a new group is allocated, as in the example of FIG. 7 and the first-phase example, to hold the DBN2 data block, and its valid bit, tag/index field, and so on are set to correspond to DBN2, with its shift field set according to the shift field of DBN1. In the process, the originally invalid adjacent group number in the DBN1 entry is filled with the group number of the newly allocated group, set valid, and read out again. The group number of DBN1 is likewise filled into the corresponding adjacent group number field of DBN2. If the adjacent group number is valid, it is simply read out directly. This group number is also placed on bus 881 and, together with the intra-group block number and intra-block offset on bus 881, is sent via bus 816 to selector 617; after selection it is placed, as the cache address of DBN2, on bus 855 and sent to data memory 113, both to fill in the data from lower-level memory 115 and to read the correct data from that address into buffer 849 for use by processor core 101. Controller 803 then notifies processor core 101 that the correct data is available.
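The adjacent-group hand-off described above hinges on the carry shifted out when the stride is added to the intra-group position: a carry of 0 keeps the access in the same group, while +1 or -1 selects the next or previous adjacent group number. A sketch under assumed field widths (Python's floor division conveniently yields -1 on underflow):

```python
BLOCK_BITS, OFFSET_BITS = 2, 6          # illustrative widths
GROUP_SPAN = 1 << (BLOCK_BITS + OFFSET_BITS)  # bytes covered by one group

def next_block_position(block, offset, stride):
    """Add the stride to the intra-group position (adder 611); the carry
    shifted out (result 631 from shifter 607) selects the same group (0)
    or an adjacent group (+1 / -1)."""
    pos = (block << OFFSET_BITS) + offset + stride
    carry = pos // GROUP_SPAN   # floor division: -1 on underflow
    pos %= GROUP_SPAN
    return carry, pos >> OFFSET_BITS, pos & ((1 << OFFSET_BITS) - 1)

# Block 3, offset 0x38, stride 8: spills into the next adjacent group.
result = next_block_position(3, 0x38, 8)
```

This is why the sequence table only needs adjacent group numbers rather than a full tag match: a stride smaller than one group span can move the access by at most one group per step.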
Controller 803 controls selectors 811 and 813 to send the low bits on bus 823 (i.e., the tag low bits, index, and intra-block offset of DBN2) to adder 611 and related logic, where they are added to the data stride on bus 825 and shifted by the shift amount in the shift field of the DBN2 entry in sequence table 603. The sum, which is the new intra-group block number and intra-block offset, is placed on bus 881; the result 631 shifted out to the right by shifter 607 controls selector 618 to select an adjacent group number in sequence table 603. If that adjacent group number is invalid, the data block is not in level-one data memory 113, and a new data group is allocated as in the example above. This group number, together with the intra-group block number and intra-block offset obtained by the addition, forms the cache address for the next execution of this data load instruction, hereinafter called DBN3. Controller 803 controls writing DBN3 and the data stride on bus 825, via bus 881 and selector 843, back into the entry of track table 619 corresponding to the same data load instruction (where DBN1 was previously stored).
The controller fetches data from lower-level memory 115 at the data address corresponding to DBN3 to fill the data block in level-one data memory 113 pointed to by DBN3, ready to be read by the same instruction in the next loop iteration. Specifically, the group number, intra-group block number, and intra-block offset in DBN3 are sent via bus 816 to selector 617 and, after selection, point to the data in level-one data memory 113. At the same time, the corresponding tag/index (high bits) is read out from sequence table 603 according to the group number in DBN3 and output on bus 643, while the intra-group block number and intra-block offset in DBN3 are sent via bus 818 to selector 810; after selection, shift adder 812 shifts the intra-group block number and intra-block offset and adds them to the tag/index on bus 643 to obtain the correct data address. This data address is selected by selector 819 and sent to lower-level memory 115 to fetch the data that fills the data block in level-one data memory 113 pointed to by DBN3.
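The cache-to-data-address conversion used for the DBN3 prefetch — reading the group's tag/index from the sequence table, scaling the intra-group block number by the shift (compression) value, and adding the intra-block offset — can be modeled with a small table. The row layout and names here are illustrative, not the patent's encoding:

```python
OFFSET_BITS = 6  # assumed intra-block offset width

# One sequence-table (603) row per group: stored tag/index, shift value,
# and adjacent group numbers.
seq_table = {7: {"tag_index": 0x2A00, "shift": 0, "prev": None, "next": 9}}

def dbn_to_data_address(group, block, offset):
    """Read the tag/index for the group (bus 643), scale the intra-group
    block number by the shift value (shift adder 812), add the offset."""
    row = seq_table[group]
    return row["tag_index"] + (block << (OFFSET_BITS + row["shift"])) + offset

addr = dbn_to_data_address(7, 2, 0x10)
```

Note this is the exact inverse of the data-to-cache conversion: one lookup plus a shift and add, with no associative tag search.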
In the next iteration, when DBN3 is placed on bus 821, controller 803 determines from the track table address match that the corresponding data load instruction has entered the third phase. Controller 803 controls selector 617 to select DBN3 on bus 821 and, via bus 855, read the corresponding data from level-one data memory 113 into buffer 849 for use by processor core 101. Controller 803 also compares the data address corresponding to DBN3 with the data address generated by processor core 101 and sent via bus 641, adds DBN3 to the data stride to obtain DBN4, and looks up sequence table 603 according to DBN4; if necessary, the corresponding data is fetched from lower-level memory 115 into level-one data memory 113, as in the previous example, ready for the next iteration. All subsequent iterations execute in the same way.
In addition, in some loops the data stride of a data load instruction is negative; that is, reading starts at some data address with the larger value, and each subsequent read is at a data address smaller than the previous one. In this case, the controller cannot tell in the first phase whether the stride is positive or negative, and places the data corresponding to DBN1 in data block number 0 of some group. In the second phase, subtracting DBN1 from DBN2 yields the data stride, which is found to be negative. DBN2 can then be placed in the highest-numbered data block of another data group; the high bits of the data address corresponding to DBN2 (from bus 641) are written into that group's tag/index field, that group's number is written into the 'previous group' slot of the adjacent group numbers of the group holding DBN1, and the group number of DBN1 is written into the 'next group' slot of the adjacent group numbers of the new group. This arrangement conforms to the addressing rules of this embodiment, so the required data can be found correctly whether it is addressed by data address or by cache address.
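The group linking for a negative stride can be pictured as a doubly linked list of groups: the newly allocated group becomes the 'previous' neighbour of DBN1's group, and DBN1's group becomes the new group's 'next' neighbour. A sketch under assumed field names (not the patent's encoding):

```python
def link_groups_for_negative_stride(seq_table, g_dbn1, g_new, new_tag_index):
    """Allocate group g_new for DBN2 and cross-link it with DBN1's group:
    g_new becomes the 'previous' adjacent group of g_dbn1, and g_dbn1 the
    'next' adjacent group of g_new."""
    seq_table[g_new] = {"tag_index": new_tag_index,
                        "prev": None, "next": g_dbn1}
    seq_table[g_dbn1]["prev"] = g_new

# DBN1 lives in group 3 with tag 0x4000; the preceding address region
# (tag 0x3F00, illustrative) is allocated to new group 6.
seq_table = {3: {"tag_index": 0x4000, "prev": None, "next": None}}
link_groups_for_negative_stride(seq_table, g_dbn1=3, g_new=6,
                                new_tag_index=0x3F00)
```

With this linkage in place, the carry-based adjacent-group selection works identically for positive and negative strides.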
Another approach saves cache space by not allocating a new group and instead storing DBN2 directly into the group where DBN1 resides. The method is to invert the intra-group block numbers of that group; take a group with four data blocks as an example. Block 0, where DBN1 was originally stored, is mapped to block 3; the original block 3 is mapped to block 0; the original block 1 is mapped to block 2; and the original block 2 is mapped to block 1. This is implemented by placing an inverter on the path of the intra-group block number: the block number output by the inverter is the bitwise complement of the block number at its input. For this purpose, an inversion (R) bit is added under the feature field of sequence table 603. When the R bit is '0', the inverter has no effect and the output equals the input. When the R bit is '1', the inverter is active and its output is the bitwise complement of its input. In this way, data originally stored in the group in descending order is stored in the group in ascending order. For example, DBN1 (which by index should be number 0) is now actually stored in block 0, but the cache address stored in the track table is labeled block 3; DBN2 (which by index should be number -1) is actually stored in block 1, but the cache address stored in the track table is labeled block 2; DBN3 (which by index should be number -2) is actually stored in block 2, but the cache address stored in the track table is labeled block 1; and DBN4 (which by index should be number -3) is actually stored in block 3, but the cache address stored in the track table is labeled block 0. One problem remains, however: the group's tag/index field was set on the assumption that DBN1 is placed in block 0. Therefore, in the second phase, when the stride is found to be negative and the DBN2 data block is filled, the group's R bit is set to '1', the tag/index field written in the first phase is read out via bus 643, a constant is subtracted from it, and the result is written back to the tag/index field. This constant can be obtained by table lookup or by computation. Suppose a data group has n data blocks, and the shift field in the sequence table 603 entry to be adjusted is s (read out together with the tag/index and sent onto bus 829); then the constant equals (n-1) * (s+1). For example, with the 4 data blocks of the example above and a shift value of '0', the constant equals '3'. The tag/index value of DBN1 (which, at the mapped address, now corresponds to block 3) minus 3 is then exactly the tag/index value of DBN4 (which, at the mapped address, corresponds to block 0). As another example, with a shift value of '1', the constant is '6'. Other cases follow by analogy and are not described further.
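The inversion (R) bit and the tag adjustment constant can be checked with a few lines. The bitwise complement reproduces the 0↔3, 1↔2 mapping of the four-block example, and the constant (n-1)*(s+1) gives 3 for shift '0' and 6 for shift '1', as stated above (the helper names are illustrative):

```python
BLOCK_BITS = 2
N_BLOCKS = 1 << BLOCK_BITS  # four data blocks per group, as in the example

def physical_block(logical_block, r_bit):
    """With R = 1 the inverter returns the bitwise complement of the
    intra-group block number; with R = 0 it passes it through."""
    mask = N_BLOCKS - 1
    return (~logical_block & mask) if r_bit else logical_block

def tag_adjust_constant(n, s):
    """Constant subtracted from the group's tag/index field: (n-1)*(s+1)."""
    return (n - 1) * (s + 1)

mapping = [physical_block(b, r_bit=1) for b in range(N_BLOCKS)]
```

The complement is its own inverse, which is why a single inverter on the block-number path suffices for both writing and reading the reversed group.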
Both the data address and the DBNs stored in the track table use the correct pre-mapping address; the mapped address is needed only when a cache address is sent to level-one data memory 113. The inverter described above can therefore be placed after selector 617 in FIG. 8A and made to invert only the intra-group block number. In the third phase of this embodiment, when the DBN sent from track table 619 passes through selector 617 to fetch data from level-one data memory 113, the R bit must also be read from sequence table 603 using group number 623 in order to control the inverter. If an R bit is added to the data entries of track table 619, this lookup of sequence table 603 can be avoided. Usually, however, sequence table 603 must be queried with group number 623 at this point anyway, in order to obtain from bus 643 the tag/index field corresponding to the DBN for comparison with the data address sent on bus 641.
Any other suitable modifications may be made in accordance with the technical solutions and concepts of the present invention. For those of ordinary skill in the art, all such substitutions, adjustments, and improvements shall fall within the scope of protection of the claims appended to the present invention.
Industrial Applicability
The apparatus and method proposed by the present invention can be used in a variety of applications related to data caching and can improve the efficiency of a processor system.
Sequence Listing Free Text

Claims (22)

  1. A data caching method, characterized in that the data memory in a cache is configured such that one portion of its storage blocks implements a conventional set-associative structure and another portion implements a group-allocated structure; and
    the group-allocated cache is composed of a plurality of groups, each group storing a number of data blocks corresponding to the same starting data block address, and the difference between the data addresses corresponding to adjacent storage blocks within a group is the same value.
  2. The method according to claim 1, characterized in that the data addresses corresponding to the data blocks in each group share an identical portion;
    the identical portion consists of the tag in the data address, or of a part of the tag and a part of the index in the data address; and
    data blocks with adjacent or nearby addresses are stored in the same group.
  3. The method according to claim 2, characterized in that, when the difference between the data addresses corresponding to adjacent storage blocks in a group equals the data block length, the data block addresses in all storage blocks of that group are consecutive; and
    when the difference between the data addresses corresponding to adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses in all storage blocks of that group are equally spaced; and
    whether the next data also resides in that group, and where it resides when it does, can be determined directly from the current data's position in the group and the data stride.
  4. The method according to claim 3, characterized in that a sequence table is provided; the rows of the sequence table correspond one-to-one with the groups in the data memory; and
    each row of the sequence table contains a compression ratio; the compression ratio indicates the spacing between the data block addresses corresponding to adjacent storage blocks in the corresponding group.
  5. The method according to claim 4, characterized in that each row of the sequence table contains the location of the group holding the data blocks adjacent to the data blocks of the corresponding group; and
    the group holding the next data, and its location within that group, can be determined directly from the current data's position in the group and the data stride.
  6. The method according to claim 5, characterized in that each row of the sequence table contains the location of the group holding the consecutive data blocks adjacent to the first data block of the corresponding group.
  7. The method according to claim 5, characterized in that each row of the sequence table contains the location of the group holding the consecutive data blocks adjacent to the last data block of the corresponding group.
  8. The method according to claim 5, characterized in that a data address is converted into a cache address;
    the cache address consists of a group number, an intra-group block number, and an intra-block offset, wherein the intra-block offset is the same as the intra-block offset in the data address; and
    the cache address can be used directly to address the data memory in the data cache.
  9. The method according to claim 8, characterized in that data corresponding to data access instructions in loop code is stored in the group-allocated structure, while data corresponding to other data access instructions is stored in the set-associative structure.
  10. The method according to claim 9, characterized in that, for a data access instruction executed for the first time, its data address is converted into a cache address once the data address has been generated.
  11. The method according to claim 10, characterized in that, for a data access instruction executed for the second time, its data address is converted into a cache address once generated, and a data stride is computed; the data stride is the difference between the two data addresses; and
    a possible next cache address for the next execution of the data access instruction is computed from the current cache address and the data stride, for addressing the data memory the next time the data access instruction is executed; and
    when the data in the data memory corresponding to the next cache address is invalid, the next cache address is converted into the corresponding data address, and the corresponding data is filled into the data memory.
  12. The method according to claim 11, characterized in that, for a data access instruction executed for the third and subsequent times, the next cache address is computed from the current cache address and the data stride, for addressing the data memory the next time the data access instruction is executed; and
    when the data in the data memory corresponding to the next cache address is invalid, the next cache address is converted into the corresponding data address, and the corresponding data is filled into the data memory.
  13. A data caching system, characterized in that, depending on configuration, the data memory in the data caching system can operate one portion of its storage blocks as a conventional set-associative structure and another portion as a group-allocated structure; and
    the group-allocated structure contains a plurality of groups, each group containing a number of storage blocks and one data block address storage unit, all storage blocks in a group corresponding to the data block address in that data block address storage unit; and
    the difference between the data addresses corresponding to adjacent storage blocks within each group is the same value.
  14. The system according to claim 13, characterized by further comprising a masked comparator for matching a portion of the block address in a data address against the corresponding bits of the data block address in the data block address storage unit, to determine whether the data corresponding to that data address is stored in the group.
  15. The system according to claim 14, characterized in that, when the difference between the data addresses corresponding to adjacent storage blocks in a group equals the data block length, the data block addresses in all storage blocks of that group are consecutive; and
    when the data corresponding to the data address is stored in the group, that data can be found by addressing the storage blocks in the group with the masked bits.
  16. The system according to claim 14, characterized by further comprising a shifter; when the difference between the data addresses corresponding to adjacent storage blocks in a group equals an integer multiple of the data block length, the data block addresses in all storage blocks of that group are equally spaced; and
    when the data corresponding to the data address is stored in the group, that data can be found by addressing the storage blocks in the group with the value obtained after the shifter shifts the masked bits.
  17. The system according to claim 14, characterized by further comprising a sequence table memory; the rows of the sequence table memory correspond one-to-one with the groups in the data memory; and
    each row of the sequence table memory contains a storage unit for storing a compression ratio; the value stored in that unit indicates the spacing between the data block addresses corresponding to adjacent storage blocks in the corresponding group.
  18. The system according to claim 14, characterized in that each row of the sequence table memory contains a pointer to the location of the group holding the data blocks adjacent to the data blocks of the corresponding group; and
    the group holding the next data, and its location within that group, can be determined directly from the current data's position in the group and the data stride.
  19. The system according to claim 18, characterized in that the pointer points to the location of the group holding the consecutive data blocks adjacent to the first data block of the corresponding group.
  20. The system according to claim 18, characterized in that the pointer points to the location of the group holding the consecutive data blocks adjacent to the last data block of the corresponding group.
  21. The system according to claim 18, characterized in that a data address can be converted into a cache address by the comparator matching the data address against the data block address in the data block address storage unit, and by the shifter shifting the index in the data address according to the value in the compression ratio storage unit;
    the cache address consists of a group number, an intra-group block number, and an intra-block offset, wherein the intra-block offset is the same as the intra-block offset in the data address; and
    the cache address can be used directly to address the data memory in the data cache.
  22. The system according to claim 18, characterized in that a cache address can be converted into a data address from the data block address value in the data block address storage unit corresponding to the cache address, with the shifter shifting the intra-group block number in the cache address according to the value in the compression ratio storage unit.
PCT/CN2014/090972 2013-11-16 2014-11-13 Data caching system and method WO2015070771A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310576787.1A CN104657285B (en) 2013-11-16 2013-11-16 Data caching system and method
CN201310576787.1 2013-11-16

Publications (1)

Publication Number Publication Date
WO2015070771A1 true WO2015070771A1 (en) 2015-05-21

Family

ID=53056780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/090972 WO2015070771A1 (en) 2013-11-16 2014-11-13 Data caching system and method

Country Status (2)

Country Link
CN (1) CN104657285B (en)
WO (1) WO2015070771A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016188392A1 (en) * 2015-05-23 2016-12-01 上海芯豪微电子有限公司 Generation system and method of data address
CN117478626A (en) * 2023-12-27 2024-01-30 天津光电聚能通信股份有限公司 Quick matching searching system, method, equipment and medium based on group connection cache

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN106933749B (en) * 2015-12-31 2020-10-13 北京国睿中数科技股份有限公司 Address random method and device applied to cache verification system
CN112380148B (en) * 2020-11-30 2022-10-25 海光信息技术股份有限公司 Data transmission method and data transmission device
CN112948173A (en) * 2021-02-02 2021-06-11 湖南国科微电子股份有限公司 Data recovery method, device, equipment and medium
CN113741976A (en) * 2021-08-25 2021-12-03 武汉大学 Cache bump elimination method, device, equipment and storage medium
CN113656330B (en) * 2021-10-20 2022-02-15 北京微核芯科技有限公司 Method and device for determining access address

Citations (4)

Publication number Priority date Publication date Assignee Title
US6157980A (en) * 1998-03-23 2000-12-05 International Business Machines Corporation Cache directory addressing scheme for variable cache sizes
CN101178690A (en) * 2007-12-03 2008-05-14 浙江大学 Design method of low-power consumption high performance high speed scratch memory
CN101876945A (en) * 2009-11-24 2010-11-03 西安奇维测控科技有限公司 Method for automatically configuring virtual block aiming at different data of logical addresses
CN102110058A (en) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Low-deficiency rate and low-deficiency punishment caching method and device

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20070266228A1 (en) * 2006-05-10 2007-11-15 Smith Rodney W Block-based branch target address cache
GB2458295B (en) * 2008-03-12 2012-01-11 Advanced Risc Mach Ltd Cache accessing using a micro tag
JP2010097557A (en) * 2008-10-20 2010-04-30 Toshiba Corp Set associative cache apparatus and cache method
CN102662868B (en) * 2012-05-02 2015-08-19 中国科学院计算技术研究所 For the treatment of dynamic group associative cache device and the access method thereof of device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US6157980A (en) * 1998-03-23 2000-12-05 International Business Machines Corporation Cache directory addressing scheme for variable cache sizes
CN101178690A (en) * 2007-12-03 2008-05-14 浙江大学 Design method of low-power consumption high performance high speed scratch memory
CN101876945A (en) * 2009-11-24 2010-11-03 西安奇维测控科技有限公司 Method for automatically configuring virtual block aiming at different data of logical addresses
CN102110058A (en) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Low-deficiency rate and low-deficiency punishment caching method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016188392A1 (en) * 2015-05-23 2016-12-01 上海芯豪微电子有限公司 Generation system and method of data address
CN106293624A (en) * 2015-05-23 2017-01-04 上海芯豪微电子有限公司 Data address generation system and method
CN117478626A (en) * 2023-12-27 2024-01-30 天津光电聚能通信股份有限公司 Fast match lookup system, method, device and medium based on set-associative cache
CN117478626B (en) * 2023-12-27 2024-04-05 天津光电聚能通信股份有限公司 Fast match lookup system, method, device and medium based on set-associative cache

Also Published As

Publication number Publication date
CN104657285B (en) 2020-05-05
CN104657285A (en) 2015-05-27

Similar Documents

Publication Publication Date Title
WO2015070771A1 (en) Data caching system and method
WO2011076120A1 (en) High-performance cache system and method
WO2014121737A1 (en) Instruction processing system and method
WO2014000641A1 (en) High-performance cache system and method
WO2014000624A1 (en) High-performance instruction cache system and method
WO2012175058A1 (en) High-performance cache system and method
WO2015024492A1 (en) High-performance processor system and method based on a common unit
WO2015024493A1 (en) Buffering system and method based on instruction cache
JP6796468B2 (en) Branch predictor
WO2016131428A1 (en) Multi-issue processor system and method
US5606682A (en) Data processor with branch target address cache and subroutine return address cache and method of operation
KR100333470B1 (en) Method and apparatus for reducing latency in set-associative caches using set prediction
WO2015024482A1 (en) Processor system and method using variable length instruction word
WO2015096688A1 (en) Caching system and method
KR20100032441A (en) A method and system for expanding a conditional instruction into a unconditional instruction and a select instruction
US6175897B1 (en) Synchronization of branch cache searches and allocation/modification/deletion of branch cache
WO2015103864A1 (en) Method for memory management and Linux terminal
WO2015016640A1 (en) Neural network computing device, system and method
WO2015005636A1 (en) Memory system and data processing method for memory
WO2019056733A1 (en) Concurrent volume control method, application server, system and storage medium
WO2020246836A1 (en) Data management device for supporting high speed artificial neural network operation by using data caching based on data locality of artificial neural network
EP4320472A1 (en) Device and method for predicted autofocus on an object
WO2015024532A1 (en) High-performance instruction caching system and method
WO2001042927A1 (en) Memory access device and method using address translation history table
WO2014000626A1 (en) High-performance data cache system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 14862072
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 14862072
Country of ref document: EP
Kind code of ref document: A1