WO2015067195A1 - A reconfigurable cache organization structure - Google Patents

A reconfigurable cache organization structure (一种可重构缓存组织结构)

Info

Publication number
WO2015067195A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
address
data
cache
memory
Prior art date
Application number
PCT/CN2014/090481
Other languages
English (en)
French (fr)
Inventor
林正浩
Original Assignee
上海芯豪微电子有限公司
Priority date
Filing date
Publication date
Application filed by 上海芯豪微电子有限公司
Publication of WO2015067195A1 publication Critical patent/WO2015067195A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0895Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The invention relates to the fields of computers, communications, and integrated circuits.
  • Existing processor architectures generally use a cache to copy portions of lower-level memory into it, so that the content can be accessed quickly by a higher level of the memory hierarchy or by the processor core, keeping the pipeline running.
  • The basic cache structure fills content from the lower-level storage medium into the cache only after a cache miss, so the pipeline has to stall while the missing content is filled into the cache.
  • Some newer cache structures, such as the victim cache, trace cache, and prefetching, are built on the basic cache structure described above and improve upon it.
  • In current architectures, cache misses in particular have become the most serious bottleneck restricting modern processor performance.
  • The address of an instruction or datum stored in the cache is generally divided into three parts: a tag (TAG), an index number (index), and an intra-block offset (offset).
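The three-way split above can be illustrated with a short sketch. The block size and set count here are hypothetical parameters chosen for illustration; the patent does not fix them.

```python
# Hypothetical geometry: 64-byte blocks (6 offset bits), 128 sets (7 index bits).
OFFSET_BITS = 6
INDEX_BITS = 7

def split_address(addr: int):
    """Split a memory address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

With this geometry, consecutive blocks differ by one in the index field while sharing a tag until a page boundary is crossed, which is the property the group-reconfigurable structure below exploits.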
  • The cache is generally composed of a tag memory and a data memory that correspond to each other one-to-one.
  • Each data memory block stores a main memory block (i.e., an instruction block or a data block), and the tag memory stores the tag of the corresponding memory block's address.
  • Caches usually come in three forms: direct-mapped (direct map), fully associative, and way-set associative.
  • In a direct-mapped cache, each main memory block can appear only at a unique location in the cache.
  • Since each main memory block has only one possible location in a direct-mapped cache, only one address comparison is needed.
  • Direct mapping divides main memory into pages, each page having the same size as the cache and corresponding to one tag. The tag is read according to the index number in the main memory block address and compared with the tag portion of that address to determine whether the storage block at that index number corresponds to the main memory block.
  • The direct-mapped cache structure is simple and easy to implement, but each index number corresponds to only one storage block, so two main memory blocks with the same index number cannot be stored in a direct-mapped cache at the same time.
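A minimal model of the direct-mapped behavior just described (the class and method names are illustrative, not from the patent) makes the conflict limitation concrete:

```python
class DirectMappedCache:
    def __init__(self, num_sets):
        self.tags = [None] * num_sets   # tag memory: one tag per index
        self.data = [None] * num_sets   # data memory: one block per index

    def lookup(self, tag, index):
        """Return the cached block on a hit, or None on a miss."""
        if self.tags[index] == tag:     # a single comparison suffices
            return self.data[index]
        return None

    def fill(self, tag, index, block):
        """A new block evicts whatever previously occupied its index."""
        self.tags[index] = tag
        self.data[index] = block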
  • In a fully associative cache, a main memory block can be placed anywhere in the cache; there is no fixed relationship between memory blocks, or between a block's position and its memory address. Unrelated data blocks can be stored in the cache, and each main memory block must be stored together with its own address. When data is requested, the cache controller must compare the main memory block address with all block addresses stored in the tag memory for confirmation.
  • Group association is a structure between fully associative and direct mapping. It divides the cache into several way groups (way-sets): direct mapping is used within each way group, and a fully associative scheme is used between way groups. This provides several possible block locations for a given index number, increasing the hit rate and system efficiency.
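The group-associative lookup can be sketched in the same style (again with illustrative names; the trivial replacement policy stands in for whatever policy a real design uses):

```python
class SetAssociativeCache:
    def __init__(self, num_sets, num_ways):
        self.num_ways = num_ways
        # tags[index][way] / data[index][way]: one entry per way group
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.data = [[None] * num_ways for _ in range(num_sets)]

    def lookup(self, tag, index):
        # In hardware, all ways of one set are compared in parallel.
        for way in range(self.num_ways):
            if self.tags[index][way] == tag:
                return self.data[index][way]
        return None

    def fill(self, tag, index, block):
        ways = self.tags[index]
        way = ways.index(None) if None in ways else 0  # placeholder policy
        self.tags[index][way] = tag
        self.data[index][way] = block
```

Unlike the direct-mapped sketch, two blocks sharing an index number can now coexist, one per way group, at the cost of comparing one tag per way on every lookup.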
  • The fully associative cache requires a large number of comparators for tag comparison; its circuit structure is complicated and not fast, and since all comparators operate simultaneously, its power consumption is very high, so it cannot be used for large caches in modern processors. Direct-mapped caches are often inefficient because each index number corresponds to only one memory block.
  • Group-associative caching is the most commonly used cache structure in modern processors. Generally, the more way groups, the better the cache performs; but more tags must then be read and compared at the same time, raising power consumption, and the selector that chooses among the data outputs becomes more complicated, causing greater delay. The sense amplifiers and Y decoders occupy a large area, relying on split bit lines.
  • Some programs, due to their own characteristics, can achieve good performance with a small number of way groups. Therefore, in a group-associative cache structure, it is difficult to strike a balance between cache performance and hardware cost, because it is not known in advance which program will be executed.
  • In each way group of the direct-mapped and group-associative structures, the index numbers of the storage blocks are consecutive, and each index number can correspond to only one storage block, which is not flexible enough.
  • the reconfigurable cache organization structure proposed by the present invention can directly solve one or more of the above or other difficulties.
  • The invention proposes a reconfigurable cache organization structure, characterized in that a plurality of storage blocks in the cache's instruction or data memory can form a group corresponding to the same tag, forming a group-reconfigurable structure. When the current addressing address and the last addressing address correspond to the same group, the tag comparison for the current address can be omitted, and the corresponding instruction or data can be found directly within the group.
  • the storage blocks in the cache may be configured into groups of equal or unequal size.
  • The current addressing address and the last addressing address are: the addresses of two instructions at consecutive addresses; or the data addresses corresponding to two consecutive data access instructions; or the data addresses corresponding to two successive executions of the same data access instruction.
  • The location in the cache of the instruction or data corresponding to the current addressing address may be determined from the difference between the index number portions of the current and last addressing addresses, together with the location in the cache of the instruction or data corresponding to the last addressing address.
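The delta computation described above can be sketched as follows. The function and parameter names are hypothetical, and `group_size` stands for the (configurable) number of storage blocks per group:

```python
def locate_without_tag_match(last_index, last_slot, cur_index, group_size):
    """
    If the previous access hit slot `last_slot` within a group, and the
    current index number differs by `delta`, the current block sits at
    last_slot + delta -- provided that position is still inside the group.
    Returns the slot within the group, or None if a full lookup is needed.
    """
    delta = cur_index - last_index
    slot = last_slot + delta
    if 0 <= slot < group_size:
        return slot      # same group: tag comparison can be skipped
    return None          # fell outside the group: do a normal tag match
```

The benefit claimed in the text follows directly: for consecutive-address accesses the common case is a small `delta`, so most lookups reduce to one addition and one bounds check.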
  • the corresponding instruction or data may be found in the cache according to the label in the addressed address, the matching result of the upper part of the index number, and the decoding result of the lower part of the index number.
  • The matching result of the tag and the upper part of the index number in the addressing address may be sent via a select line to the decoder that decodes the lower part of the index number; only the decoder corresponding to the successful match decodes the lower part of the index number.
  • the matching result on the selection line may be encoded and transmitted through the bus, and then decoded and sent to the corresponding decoder.
  • a plurality of storage blocks in the cache may be configured as a group associative structure to form a hybrid structure in which a group reconfigurable structure and a group associative structure coexist.
  • The cache organization structure may constitute a group-associative cache structure, in which the memory blocks of each group can be assigned to any set to form one of its ways, and the product of the maximum number of sets and the maximum number of ways is larger than the number of storage-block groups.
  • all memory blocks share the same group of bit lines.
  • The cache organization structure includes a two-dimensional table. The rows of the table correspond to sets and include at least one row; the columns correspond to ways and include at least one column. The contents of the table include a tag, a valid bit, and a group number.
  • When addressing in the cache according to a memory address, all valid tag values in the corresponding row are read from the two-dimensional table according to the index number in the memory address and matched against the tag in the memory address; the group number in the successfully matched entry locates the group corresponding to the memory address in the cache; and the corresponding instruction or data in the group is accessed according to the offset in the memory address.
  • At least one independent tag module is further included; when an independent tag module is assigned to a set, the index number value and tag value corresponding to that set are stored in the independent tag module.
  • When a new way needs to be assigned to a set and all entries in the corresponding row of the two-dimensional table are already valid, an unoccupied independent tag module, if available, is assigned to the set; if all independent tag modules are occupied, a replacement algorithm selects one victim from among some or all of the independent tag modules and the ways of that set in the two-dimensional table.
  • When addressing in the cache according to a memory address, all valid tags are read from the corresponding row of the two-dimensional table according to the index number in the memory address, and tag values are read from all independent tag modules storing that index number, to be matched against the tag in the memory address; the group number in the successfully matched entry locates the group corresponding to the memory address in the cache, and the corresponding instruction or data in the group is accessed according to the offset in the memory address.
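The two lookup paths described above — the per-set row of the two-dimensional table plus any overflow entries in independent tag modules — can be sketched as one function. All names and the tuple layouts are illustrative assumptions, not the patent's own notation:

```python
def lookup_group(table, indep_modules, index, tag):
    """
    table[index] is the row for one set: a list of (valid, tag, group_no)
    entries, one per way.  indep_modules is a list of (index, tag, group_no)
    triples for sets that overflowed their table row.
    Returns the matched group number, or None on a miss.
    """
    # First check the ways recorded in the two-dimensional table row.
    for valid, t, group_no in table[index]:
        if valid and t == tag:
            return group_no
    # Then check every independent tag module holding this index number.
    for idx, t, group_no in indep_modules:
        if idx == index and t == tag:
            return group_no
    return None
```

This is what lets the maximum sets-times-ways product exceed the physical number of groups: a hot set can borrow independent tag modules instead of every set paying for the worst-case way count.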
  • The reconfigurable cache organization structure of the present invention can provide a variable-size cache organization through group allocation.
  • The main memory addresses corresponding to the storage blocks in each group are contiguous, so when the processor core fetches instructions or data at consecutive addresses, the cache needs only a simple calculation to determine their position in the cache directly, avoiding tag matching and reducing power consumption.
  • The reconfigurable cache organization structure of the present invention can also be configured, as needed, into a structure in which group-allocated and group-associative parts coexist: instructions or data at consecutive addresses are stored in the group-allocated portion of the cache, and instructions or data at discontinuous addresses are stored in the group-associative portion, so that the cache system better supports reads of consecutive addresses while remaining compatible with the existing cache structure.
  • The reconfigurable cache organization structure of the present invention also provides a cache form between group association and full association, which can assign different numbers of way groups to different index numbers according to the actual requirements of the running program, achieving close to fully associative performance at a hardware cost equivalent to group association.
  • FIG. 8B is an embodiment of an address format and an entry format in a reconfigurable cache according to the present invention.
  • FIG. 8C is an embodiment of an operational state of the reconfigurable cache of the present invention.
  • FIG. 8D is another embodiment of an operational state of the reconfigurable cache of the present invention.
  • Preferred embodiments of the invention are described below with reference to the drawings.
  • The invention provides a reconfigurable cache organization structure, which can provide caches with different numbers of way groups according to configuration, and which stores instructions or data at consecutive addresses in the same way group so as to reduce the number of tag matches when the processor core fetches instructions or data.
  • the data cache is taken as an example, but the structure and method of the present invention are also applicable to the instruction cache.
  • FIG. 1 is an embodiment of tag comparison in an existing group associative cache structure.
  • The figure shows the tag comparison process for one way group. Each line in tag memory 101 corresponds to an index number and stores the tag portion of a main memory block address.
  • the index number 107 in the data addressed address on bus 105 is sent to decoder 111, and the tag 109 is sent to comparator 113.
  • Decoder 111 decodes the received index number 107 to obtain the corresponding word line.
  • The word line selects the row in tag memory 101 corresponding to index number 107, and the tag stored there is amplified by sense amplifier 103 and sent to comparator 113.
  • Comparator 113 compares tag 115, sent from sense amplifier 103, with tag 109 and outputs the comparison result via bus 117. If they are equal, the data block containing the requested data is stored in this way group, in the memory block corresponding to the row of tag 115. If they are not equal, the data block containing the data does not exist in this way group.
  • Meanwhile, the row selected by the decoded word line is read from data memory 121 as the data corresponding to the data addressing address, amplified by sense amplifier 123, and sent to the processor core via bus 119.
  • The reconfigurable cache of the present invention replaces the tag comparison process of the FIG. 1 embodiment with a new structure and method.
  • FIG. 2 is an embodiment of tag comparison in a reconfigurable cache of the present invention.
  • The tag memory and the data memory in each way group are divided into corresponding groups; each group corresponds to the same number of consecutive index numbers and to the same tag. That is, each group stores several data blocks at consecutive addresses sharing the same tag.
  • Tag memory 201 is divided into two groups, each containing one line of content-addressable memory (CAM), i.e., storing one tag (such as tag 203 and tag 205).
  • Data memory 211 is likewise divided into two groups, each containing four storage blocks; the data block addresses in the four storage blocks are consecutive and correspond to the same tag.
  • Group 213 includes storage blocks 221, 223, 225, and 227, whose data block addresses are consecutive and all correspond to tag 203; group 215 includes storage blocks 231, 233, 235, and 237, whose data block addresses are consecutive and all correspond to tag 205.
  • each set of tags and corresponding sets of memory blocks also correspond to a register comparator and a decoder.
  • the tag 203 corresponds to the register comparator 217 and the decoder 229.
  • Tag 205 corresponds to register comparator 219 and decoder 239.
  • The register comparator includes a register and a comparator. The register stores the high-order portion of the index number in the start address of the group of memory blocks.
  • The tag portion of the data addressing address is sent to all content-addressable memories in tag memory 201 for matching, and every successfully matched CAM outputs an enable signal to the register in its corresponding register comparator.
  • When the enable signal is valid, the comparator compares the upper portion of the index number in the externally supplied data addressing address, arriving via bus 243, with the index-number upper-portion value stored in the register, thereby performing a partial match (i.e., on the upper part of the index number) of the corresponding group's data block address.
  • When the register comparator reports a successful match, the decoder decodes the lower part of the index number in the data addressing address on bus 245, and one storage block is selected for output from the corresponding group of data blocks according to the decoding result.
  • In this way, the data block whose index number equals the index number in the data addressing address can be read from the data memory. If all content-addressable memories fail to match, or all participating comparators fail to match, the data corresponding to the data addressing address is not yet stored in this way group of the cache. All way groups operate in parallel in this manner, so the desired data can be found in the cache, amplified by sense amplifier 123, and output via bus 247, or else a cache miss results.
  • Because the match lines of a content-addressable memory must be precharged before tag matching can be performed, and the precharge and match operations consume considerable power, matching the tag against all CAM lines simultaneously is power-hungry. Therefore, the order of address matching can be changed to further reduce power consumption.
  • In that case, the upper portion of the index number in the data addressing address is first sent via bus 243 to all register comparators to be compared with the stored index-number upper values; according to the comparison results, only the match lines of the CAM lines corresponding to successful comparisons are precharged. The tag sent over bus 241 is then matched, and the successfully matched CAM line outputs an enable signal to the decoder. Subsequent operations are the same as described above. This reduces the number of CAM-line matches and lowers power consumption.
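The reordered, power-saving match sequence can be sketched as two phases. The function name and the tuple layout are illustrative; in hardware the second phase precharges and matches only the surviving CAM lines:

```python
def two_phase_match(groups, tag, index_high):
    """
    Phase 1: compare the index-number high bits against every register
    comparator (cheap).  Phase 2: match the tag ONLY in the CAM rows whose
    register comparison succeeded, instead of charging all rows at once.
    Each entry of `groups` is (reg_index_high, cam_tag).
    Returns the positions of the groups that match on both fields.
    """
    # Phase 1: register comparators filter the candidate groups.
    candidates = [i for i, (hi, _) in enumerate(groups) if hi == index_high]
    # Phase 2: tag match restricted to the candidates.
    return [i for i in candidates if groups[i][1] == tag]
```

The power saving comes from phase 1 shrinking the candidate set, so only a fraction of the match lines ever needs precharging.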
  • The cache can be reconfigured simply by storing the corresponding index-number upper value in the register of each register comparator.
  • For example, consecutive index-number upper values may be stored in two adjacent register comparators, so that the index numbers corresponding to the two register comparators are also continuous; the two adjacent groups are thereby merged into one larger group to accommodate data blocks at consecutive addresses.
  • FIG. 3 is another embodiment of tag comparison in a reconfigurable cache according to the present invention.
  • In this embodiment, each group of tag memory 301 has not only one row of content-addressable memory but also several rows of random access memory (RAM).
  • In tag memory 301, the CAM row of each group corresponds to the first memory block of the corresponding group in data memory 211, and the RAM rows correspond respectively to the other memory blocks in that group, so that every memory block in data memory 211 has a corresponding tag.
  • The decoders (e.g., decoders 329 and 339) can decode the complete index number as needed, or, like decoders 229 and 239 in the FIG. 2 embodiment, decode only the lower bits of the index number.
  • In this way, the reconfigurable cache can be configured as needed either as a traditional group-associative cache or as a group-allocated cache according to the present invention.
  • When configured as a group-associative cache, the storage blocks in each way group correspond to consecutive index numbers but do not necessarily correspond to the same tag.
  • In that case, the CAM in the tag memory is treated as random access memory and, together with the other RAM rows, stores the tags of the data blocks in the corresponding memory blocks.
  • The output of each register comparator (such as register comparators 217 and 219) is configured as a fixed '1'.
  • After the index number in the data addressing address is decoded and addressed by the decoder, the tag and the storage block corresponding to that index number can be found in tag memory 301 and data memory 211, respectively.
  • The tag is amplified by sense amplifier 307 and compared in comparator 113 with the tag in the data addressing address; the subsequent process is the same as in the FIG. 1 embodiment, determining whether the data block containing the data has been stored in the cache and reading out the corresponding data block when it has.
  • FIG. 4 is an embodiment of a group in a reconfigurable cache according to the present invention.
  • In this embodiment, the index number has 4 bits, of which the upper and lower parts are 2 bits each.
  • FIG. 4 shows the specific structure of one group in FIG. 3 (such as the group corresponding to tag 205).
  • The tag memory portion includes a content-addressable storage row 401 (in which tag 205 is stored) and three random access storage rows 403, 405, and 407.
  • The data memory portion comprises four memory blocks 411, 413, 415, and 417. Decoders 429 and 439 decode the upper and lower parts of the index number, respectively, and together implement the functions of decoder 339 in the FIG. 3 embodiment.
  • Switches 421, 423, 425, and 427 are all in the position shown, i.e., the output of each AND gate in decoder 409 is connected to the corresponding tag memory row.
  • The output of comparator 219 is fixed at '1' and is sent to all AND gates in this group as one input.
  • The upper portion of the index number, arriving via bus 451, is decoded by decoder 429 to select a group (i.e., a valid signal is output to all AND gates in the group); the lower portion, arriving via bus 453, is decoded by decoder 439 to select one AND gate in the group, i.e., that AND gate receives a valid signal ('1') while all other AND gates receive an invalid signal ('0').
  • Suppose decoder 429 selects the group shown in FIG. 4 and decoder 439 selects AND gate 433.
  • Then the tag in random access storage row 403 is read out, amplified by sense amplifier 307, and compared in comparator 113 with the tag in the data addressing address, while the corresponding data block is read from storage block 413 and sent to select/amplify module 419.
  • If the tags in this way group are equal, the data block read from storage block 413 is the data block corresponding to the data addressing address, and the corresponding data is selected according to the intra-block offset in the data addressing address on bus 441, amplified, and output on bus 445. If the tags in this way group are not equal but the tag output by some other way group matches, then the data block output by that way group is the one corresponding to the data addressing address, and the corresponding data can be selected according to the intra-block offset. If no way group matches, the data block corresponding to the data addressing address has not yet been stored in the cache.
  • Switches 421, 423, 425, and 427 are all grounded, i.e., the output of each AND gate is disconnected from the corresponding tag memory row so that no tag memory row is selected, and all outputs of decoder 429 are forced to '1'.
  • The upper portion of the index number in the data addressing address is sent via bus 243 to register comparator 219 for matching, and the CAM line corresponding to a successful match is precharged, so that the tag stored in that CAM line is matched against the tag of the data addressing address arriving on bit line 455.
  • If the tag match fails, the data corresponding to the data addressing address is not in this group. If the tag matches successfully, a valid signal 443 is output to every AND gate in this group. At this time the output of decoder 429 is fixed at '1', and the lower portion of the index number, decoded by decoder 439 from bus 453, selects one AND gate. Suppose that comparator 219 and decoder 429 both select the group shown in FIG. 4 and decoder 439 selects AND gate 435; then, under the control of the output signal of AND gate 435, the data block in storage block 415 is read out and sent to select/amplify module 419.
  • Select/amplify module 419 selects the corresponding data directly from the data block according to the intra-block offset in the data addressing address on bus 441 and outputs it via bus 445. During this process, the random access storage rows in the tag memory do not participate in the operation.
  • Although this embodiment is described using data reads as an example, data writes can be implemented in a similar manner. The only difference is that when reading, the matching memory block in data memory 211 is read out via bit line 457, amplified, and output, whereas when writing, the data (or data block) to be stored is written directly via bit line 457 into the matching memory block in data memory 211.
  • Furthermore, the decoders in the cache can be modified to address only data memory 211.
  • FIG. 5 is another embodiment of label comparison in a reconfigurable cache according to the present invention.
  • Compared with the FIG. 3 embodiment, decoders 529 and 539 are added.
  • The structure and function of decoder 529 are the same as those of decoder 329, and both receive the comparison result sent by register comparator 217; the structure and function of decoder 539 are the same as those of decoder 339, and both receive the comparison result sent by register comparator 219.
  • When the cache is configured as a group-associative cache, decoder 329 and the corresponding decoder 529 decode the index number in the data addressing address at the same time, and according to the word lines of the decoded outputs, the data block is read from the data memory and the tag is read from tag memory 201 for subsequent operations.
  • When the cache is configured as a group-reconfigurable cache, decoder 529 does not operate; matching is performed only by the corresponding register comparator and CAM row, as described above, to determine which group the data is in, and the matching result is sent over a select line to the corresponding decoder. For example, the matching result of register comparator 217 and CAM row 203 is sent to decoder 329 via select line 517; the word line decoded by decoder 329 from the lower part of the index number in the data addressing address on bus 345 then locates the corresponding data block in the group for subsequent operations.
  • The other decoders (such as decoders 339 and 539) operate in the same manner; for example, the matching result of register comparator 219 and CAM row 205 is sent to decoder 339 via select line 527, and so on.
  • Further, the matching results may be encoded, transmitted to the data memory side through a bus, and then decoded and sent to the corresponding decoders, to reduce the number of select lines (such as select lines 517 and 527). For example, 16 select lines can be replaced by a 4-line bus.
  • This embodiment is similar in structure to the FIG. 3 embodiment, except that the bit width of each content-addressable memory in the tag memory is increased to replace the register comparators; the added portion of each CAM stores the corresponding upper portion of the index number.
  • When the cache is configured as a group-associative cache, the upper portion of the index number in the data addressing address is sent via bit line 243 to the newly added portions of all CAMs in tag memory 481 for matching; the decoder corresponding to the successful match decodes the lower portion of the index number, thereby finding the tag in tag memory 481 and the storage block in data memory 211 corresponding to that index number.
  • Each content-addressable memory is matched, and the lower portion of the index number is decoded by the decoder corresponding to the successful match.
  • FIG. 7 is an embodiment of a reconfigurable cache configuration according to the present invention.
  • The cache of this embodiment has four way groups (way groups 501, 503, 505, and 507), and each way group is divided into four groups (such as groups 511, 513, 515, and 517 in way group 501); each group can contain several storage blocks for storing data blocks. Therefore, the 16 groups together can store data blocks of consecutive addresses corresponding to at most 16 tags.
  • The way-group number and the group's number within the way group can together form a group number that uniquely identifies each group in the tag memory (and the data memory).
  • As shown by the number on each group of the tag memory in FIG. 7, the first two bits are the way-group number and the last two bits are the group's number within the way group. That is, the way-group number of way group 501 is '00', that of way group 503 is '01', that of way group 505 is '10', and that of way group 507 is '11'; the numbers of the groups in each way group, from top to bottom, are '00', '01', '10', and '11'.
  • the group number corresponding to group 511 is '0000'
  • the group number corresponding to group 513 is '0001'
  • the group number corresponding to group 515 is '0010'.
  • the corresponding group number of group 517 is ' 0011 ', and so on.
  • the difference between the address of the two preceding data addresses and the address of the previous data address may be used.
  • the group number directly derives the group number corresponding to the current data addressing address to avoid tag matching.
  • the group address corresponding to the data addressing address of the previous data read instruction is '1110 ', that is, the data corresponding to the data addressing address is located in group 545 Corresponding storage block.
  • the tag and index number in the data addressing address of the current data read instruction are respectively subtracted from the tag and index number in the previous data addressed address. If the result of label subtraction is '0 ', which means that the data addressing address is the same as the label of the last data addressing address, so it is located in the same group 545, that is, the group number corresponding to the current data addressing address is also '1110 '.
  • the positional relationship between the memory block corresponding to the current data addressing address and the memory block corresponding to the previous data addressing address can be determined. Specifically, if the index number is subtracted, the result is '0.
  • the current data addressing address corresponding storage block is the storage block corresponding to the last data addressing address, and the corresponding data can be selected from the storage block according to the intra-block offset in the data addressing address; If the result of subtracting the index number is positive, then the corresponding address of the data addressing address is located after the storage block corresponding to the last data addressing address; if the result of subtracting the index number is negative, the current data addressing address corresponds to The memory block is located before the memory block corresponding to the last data addressed address. For the latter two cases, the absolute value of the difference obtained by subtracting the index numbers is the separation distance of the two storage blocks.
  • the difference between the index numbers is ' 2 ', indicating that the memory block corresponding to the current data addressed address is located in the second memory block after the memory block corresponding to the last data addressed address.
  • the difference between the index numbers is ' -1 ', indicating that the memory block corresponding to the current data addressing address is located in the first memory block before the memory block corresponding to the last data addressed address. In this way, the position of the data block corresponding to the data addressed address in the cache can be determined without any label comparison.
  • these data blocks can also be stored in two groups in which the group numbers are consecutive.
  • the group numbers are consecutive.
  • the group 513 and 515 correspond to the same label, and the upper part of the index number corresponding to the group 515 is equal to the upper part of the index number corresponding to the group 513 plus '1', not only the groups 513 and 515
  • the addresses of the data blocks stored in each are consecutive, and the last data block in group 513 and the address of the first data block in group 515 are also contiguous.
  • connection relationship between the groups is formed, that is, the group 513 Is the leader of group 515, and group 515 is the success of group 513 (next Groups, so that data locations located in another group can be found directly from one group based on the index number difference.
  • the current data addressing address can be matched in the tag memory using the method described in the previous embodiment to find the corresponding group.
  • each group can also be configured to have different sizes, for example, the road group 501 in the embodiment of Fig. 7 can be used. Configured into four groups (ie, groups 511, 513, 515, and 517), and configures way group 503 into one group, and sets road groups 505 and 507 Configured in a traditional form of group associative structure.
  • the road group 501 contains up to four different labels
  • the road group 503 contains only one type of label.
  • Road group 505 and 507 the maximum number of labels that can be included is equal to the number of corresponding storage blocks (and the number of rows of the road group itself), and adjacent storage blocks can correspond to different labels.
  • the cache thus configured, a large amount of data with consecutive data addressing addresses (ie, the same label) can be stored in the road group according to the characteristics of the program.
  • a small amount of data in which a plurality of sets of data addressing addresses are consecutive is stored in each group of the way group 501.
  • the cache has the flexibility of storing data in the cache and is easy to replace, and can save a large number of label comparison operations when performing data access of consecutive addresses.
  • the label part in the cache can also be improved, and the reconfigurable cache with variable path group number can be realized.
  • the reconfigurable cache is composed of a tag portion 801 and a data storage portion 803, wherein all the storage blocks in the data storage portion 803
  • the same group of bit lines are shared, and according to the grouping method of the data memory 211 in the previous embodiment, it is divided into six groups (group): A, B, C, D, E and F , so the group number is 3 digits, for example: group A corresponds to group number ' 000 ', group B corresponds to group number ' 001 ', ..., group F corresponds to group number ' 101 '.
  • Each group ( Group ) contains multiple memory blocks, each of which contains multiple instructions or multiple data.
  • each group is stored in the form of a two-dimensional table 825 (group ) Corresponding label, group number and other information.
  • the entry 807 can include a tag 811, a valid bit 817, and a group number 819.
  • the effective bit 817 is '1 ' indicates that the entry is a valid entry; a valid bit 817 of '0' indicates that the entry is an invalid entry.
  • the number of columns of the two-dimensional table 825 corresponds to the maximum number of ways supported by the cache, and the number of rows represents the same way (way The maximum number of supported groups in the ), which is the maximum supported index number value in the same way.
  • the two-dimensional table 825 has 4 rows and 4 columns, of which 4 columns indicate that the cache supports up to 4 ways (way ); 4 lines indicate that the cache can be divided into 4 groups (set), the index number ranges from '0' to '3', that is, the index number has 2 digits.
  • the index number in the memory address and the group in the cache ( Set ) one-to-one correspondence.
  • the comparison module 821 is composed of a plurality of comparators, and the number of comparators is equal to the number of columns of the two-dimensional table 825. Comparison module 821 All valid tags read in the tag portion 801 according to the index number 813 are compared with the tag 811 in the memory address 805, and the comparison result is sent to the selection module 823. As a control signal.
  • the selection module 823 is composed of a plurality of transmission gates, and the number of transmission gates is equal to the number of columns of the two-dimensional table 825. The input of each transmission gate is the group number in the content of the entry outputted by the corresponding two-dimensional table column. . In this way, according to the comparison result, the group number 819 corresponding to the matching success entry can be selected.
  • memory address 805 is divided into three sections: label 811, index number 813, and offset address. 815.
  • the index number 813 corresponds to the number of rows of the two-dimensional table 825, that is, the number of bits is 2 bits; and the offset address 815 corresponds to the instruction or data in the group (group The position within ) has a fixed number of digits. For example, suppose each group (group) contains 8 memory blocks, each of which contains 16 bytes of instructions or data. For a 32-bit long memory address, the offset address 815 A total of 7 bits (that is, a total of 128 bytes of instructions or data for each group of 8 memory blocks); index number 813 is 2 bits, and the remaining 23 bits are labels.
  • the number of label storage locations corresponding to each index number is the same, that is, equal to the road (way ) number.
  • index number M and index number N correspond to 4 Labels.
  • the number of label storage locations corresponding to each index number may be different.
  • the index number M It can only correspond to 2 labels, that is, 2 ways (2 way); but the index number N corresponds to 4 labels, that is, 4 ways (4 way).
If the row of two-dimensional table 825 corresponding to the index contains no valid entries, no match is possible and compare module 821 outputs a cache-miss signal to bus 829. For example, if the index value is '11', the cache allocates the available group F to index '11': the tag of memory address 805 and group number F are stored in an invalid entry of the corresponding row of two-dimensional table 825, and the entry is made valid. Two-dimensional table 825 is then in the state shown in Figure 8C.
When the index value in memory address 805 is '00', the contents of all entries in the corresponding row (row 0) are read out. Since these entries are valid, their tags are sent to comparator module 821 for comparison with tag 811 of the memory address, and the outputs of all comparators are combined by an 'or' operation in logic gate 827. At the same time the group numbers in the read-out contents (B, C, D or E) are sent to select module 823 as transmission-gate inputs. If a comparison succeeds, compare module 821 outputs a cache-hit signal to bus 829 and select module 823 outputs the group number 819 corresponding to the successful match. Group number 819 and offset address 815 together form cache addressing address 809, with which the corresponding instruction or data can then be accessed directly in data storage portion 803.
If no comparison succeeds, compare module 821 outputs a cache-miss signal to bus 829, and a replacement algorithm (such as the LRU algorithm) selects a suitable group from the four corresponding groups B, C, D and E to be assigned to memory address 805.
When the index value in memory address 805 is '10', the contents of all entries in the corresponding row are read out, the valid bits and tags sent to comparator module 821 and the group numbers to select module 823. If the comparison succeeds, compare module 821 outputs a cache-hit signal to bus 829 and select module 823 outputs group number A corresponding to that entry. The group number together with offset address 815 forms cache addressing address 809, with which the corresponding instruction or data can be accessed directly in data storage portion 803.
If the comparison fails, compare module 821 outputs a cache-miss signal to bus 829. Since only one of the ways corresponding to index '10' is occupied, a suitable group may be selected from all six groups according to a replacement algorithm (such as the LRU algorithm) and assigned to this memory address 805. If the group selected is not group A (for example, group C), then the index that originally corresponded to that group (for example, index '00') loses one way, and index '10' gains one way. In this manner, different numbers of ways can be given automatically to different indexes according to the program's needs as it runs, allocating cache resources flexibly and improving the cache hit rate.
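The on-miss reallocation of ways between indexes can be sketched in software as follows. This is a minimal illustrative model only: the entry tuples, the `pick_victim` callback and the eviction helper are assumptions, not the patent's hardware.

```python
def _evict(table, group):
    """Invalidate the entry of whichever index currently owns `group`."""
    for row in table:
        for i, e in enumerate(row):
            if e is not None and e[2] == group:
                row[i] = None                    # that index loses one way
                return

def allocate_group(table, index, tag, free_groups, pick_victim):
    """Allocate a group to `index` on a miss; entries are (tag, valid, group)."""
    if free_groups:
        group = free_groups.pop(0)               # e.g. a still-unassigned group
    else:
        group = pick_victim()                    # replacement algorithm's choice (e.g. LRU)
        _evict(table, group)                     # the owning index loses the way
    row = table[index]
    row[row.index(None)] = (tag, True, group)    # the missing index gains the way
    return group
```

For example, a miss at index '10' with no free groups can take group C away from index '00', after which index '00' has one fewer way and index '10' one more, mirroring the description above.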
Please refer to Figure 8D, another embodiment of an operating state of the reconfigurable cache of the present invention. In this embodiment it is assumed that the cache has already assigned one group to each index, namely D, A, B and C respectively; groups E and F have not been assigned. On an access, matching proceeds as in the previous embodiment and the corresponding group number is output by select module 823. If the match fails, an available group is assigned to the index according to the replacement algorithm; for example, groups E and F may be assigned to indexes '01' and '10', giving the state of two-dimensional table 825 shown in Figure 8D. The specific operation of this embodiment is similar to that of the Figure 8C embodiment and is not repeated here.
In a traditional cache the address-tag portion is symmetric with, and corresponds to, the data storage portion, and its structure is fixed. For example, a cache with a total of 8 memory-block groups can be organized as 1 way of 8 sets, 2 ways of 4 sets, 4 ways of 2 sets, or 8 ways of 1 set (fully associative); but the organization is fixed at design time and cannot be changed afterwards. The address-tag portion of the cache shown in this embodiment, in contrast, is asymmetric with the data storage portion and has a variable mapping relationship, so its structure is not fixed: the address space of the address-tag portion is larger than the space the cache's memory-block groups can occupy. In traditional terms, data storage portion 803 is fixed as a 1-way, 6-set memory, but its six memory-block groups can be mapped as required onto the 16 address locations of the 4-way, 4-set tag portion, forming caches of many different way-set combinations; in this embodiment a set can thus be converted into a way. Conversely, memory-block groups in two different parts of the data storage can be mapped to different sets of the same way, converting a way into a set. Since sets and ways can be exchanged in this fashion, this can be regarded as a new kind of way-set exchange cache. If two more columns are added to two-dimensional table 825, with corresponding comparators and selectors added to compare module 821 and select module 823, a 6-way, 1-set fully associative cache can be realized as needed.
According to the technical solution of the present invention, the reconfigurable cache can also be implemented with an even more flexible structure.
Please refer to Figure 9, another embodiment of the reconfigurable cache of the present invention. The data storage portion of this embodiment is the same as in the Figure 8A embodiment and likewise contains six groups. The comparator 921 and selector 923 of this embodiment's tag portion are the same as the comparators in compare module 821 and the selectors in select module 823 of the Figure 8A embodiment. The tag portion of this embodiment additionally contains several independent tag modules of identical structure (such as independent tag modules 905, 907 and 909 in Figure 9).
Taking independent tag module 905 as an example, it contains an index register 911, a tag register 913, a comparator 915, a transmission gate 917 and a selector 919. Tag register 913 stores the same content as an entry of two-dimensional table 825, namely a tag, a valid bit and a group number. The inputs of selector 919 come from the output word lines of decoder 901, and index register 911 stores the index corresponding to this independent tag module, which serves as the control signal of selector 919 to select the corresponding word line. The output of selector 919 is '1' only when the index value input to decoder 901 equals the index value stored in index register 911; otherwise it is '0'. This output controls whether tag register 913 sends its stored tag to comparator 915 for comparison with tag 811 of memory address 805. Specifically, if selector 919 outputs '1', indicating that independent tag module 905 corresponds to index 813 of memory address 805, then, provided the valid bit in tag register 913 is '1', the stored tag is sent to comparator 915 for comparison with tag 811; otherwise no comparison takes place. The comparison result of comparator 915 is sent to bus 829 for an 'or' operation with the outputs of comparator 921 and of the other independent tag modules, yielding the cache hit or miss result. Transmission gate 917 is similar to the transmission gates of selector 923: its input is the group number in tag register 913, and when comparator 915 reports a match it outputs group number 819, which together with offset address 815 of memory address 805 forms cache addressing address 809.
According to the technical solution of the present invention, the independent tag modules can also be allocated as needed to the indexes of the corresponding memory addresses for greater flexibility. When the entry of tag column 903 corresponding to an index is already valid and an instruction or data access occurs with a new memory address corresponding to that index, then, if an unoccupied independent tag module exists, the entry in tag column 903 need not be replaced; instead the independent tag module is used to store the tag of the new address, achieving the effect of adding a way. If all independent tag modules are also occupied, one of them can be designated for replacement according to the replacement algorithm. In this way the independent tag modules can be exploited to allocate way-sets in real time as the program runs, so that some indexes correspond to more than one way while others correspond to only one way or to none, using the cache more reasonably and effectively and improving performance.
The apparatus and methods proposed by the present invention can be used in a variety of cache-related applications to improve the efficiency of processor systems.


Abstract

A reconfigurable cache organization structure which, when applied in the processor field, can provide caches with different numbers of way-sets according to configuration, and stores instructions or data with consecutive addresses in the same way-set, making it convenient for the processor core to fetch instructions or data and reducing the number of tag matches.

Description

A Reconfigurable Cache Organization Structure
Technical Field
The present invention relates to the fields of computing, communications and integrated circuits.
Background Art
Existing processor architectures generally use a cache to copy part of the contents of lower-level memory into it, so that those contents can be accessed quickly by a higher-level memory or by the processor core, keeping the pipeline running continuously. A basic cache structure fills content from the lower-level storage medium only after a cache miss, so the pipeline has to stall while the missing content is filled into the cache. Some newer cache structures, such as victim caches, trace caches and prefetching, are all based on, and improve upon, this basic structure. However, with the ever-widening processor/memory speed gap, the current architecture, and cache misses of various kinds in particular, has become the most serious bottleneck limiting the performance improvement of modern processors.
To improve performance, modern processor systems usually use a cache to hold instructions or data from main memory. The address of an instruction or datum stored in the cache is generally divided into three parts: a tag (TAG), an index and an intra-block offset. A cache generally consists of a tag memory and a data memory in one-to-one correspondence. Each memory block of the data memory stores one main-memory block (an instruction block or data block), and the tag memory stores the tag of the corresponding memory-block address. By organization, caches commonly take three forms: direct mapped, fully associative and way-set associative.
In a direct-mapped cache structure, each main-memory block can appear at only one location in the cache. Since each main-memory block has only one possible location, only one address comparison is needed. Direct mapping divides main memory into pages, each page being the same size as the cache and corresponding to one tag. The tag read out according to the index in the main-memory block address is compared with the tag part of that address to determine whether the memory block at that index holds the main-memory block addressed. A direct-mapped cache is simple in structure and easy to implement, but each index corresponds to only one memory block, so two main-memory blocks with the same index cannot be stored at the same time.
In a fully associative cache structure, a main-memory block can be placed anywhere in the cache; there is no direct relationship between memory blocks, or between storage order and the memory addresses stored. The cache may hold mutually unrelated data blocks, and each main-memory block must be stored together with its own address. When data is requested, the cache controller must compare the main-memory block address against all block addresses stored in the tag memory to confirm a hit.
Set association is a structure between full association and direct mapping. The cache is divided into a number of way-sets; mapping is direct within a way-set, while the way-sets are fully associative with each other. Thus several block locations are allowed for a given index, which increases the hit rate and system efficiency.
Technical Problem
Among the three organizations above, a fully associative cache needs a large number of comparators for tag comparison; the complex circuitry limits speed, and since all comparators compare simultaneously, power consumption is very high, making it impractical for the large caches of modern processors. A direct-mapped cache, limited to only one memory block (one index position) per index, suffers frequent replacement and is very inefficient.
The set-associative cache is the most commonly used structure in modern processors. Generally, the more way-sets, the better the cache performance, but more tags must be read out and compared simultaneously, raising power consumption, and the selector used to choose data becomes more complex, increasing delay; the sense amplifiers and Y decoders occupy a large area, so increasing the number of way-sets merely by splitting bit lines is costly. Moreover, some programs, by their nature, achieve good performance with a cache of few way-sets. Since it is not known in advance what programs will be executed, it is difficult to strike a balanced choice between cache performance and hardware cost in a set-associative design.
Finally, in a direct-mapped cache structure and in each way-set of a set-associative structure, the indexes of the memory blocks are consecutive and each index can correspond to only one memory block, which is not flexible enough.
Technical Solution
The reconfigurable cache organization structure proposed by the present invention directly addresses one or more of the above or other difficulties.
The present invention proposes a reconfigurable cache organization structure, characterized in that a plurality of memory blocks in the cache's instruction or data memory may form one group corresponding to a single tag, forming a group-wise reconfigurable structure; when the current addressing address and the previous addressing address correspond to the same group, the tag comparison for the current addressing address may be omitted and the corresponding instruction or datum found directly in that group.
Optionally, the memory blocks in the cache may be configured into groups of equal or unequal size.
Optionally, the current addressing address and the previous addressing address are: the addresses of two instructions whose addresses are consecutive; or the data addresses corresponding to two successively executed data access instructions; or the data addresses corresponding to successive executions of the same data access instruction.
Optionally, when addressing consecutive or nearby addresses whose tag parts are identical, it can be determined that the current addressing address and the previous addressing address correspond to the same group.
Optionally, the location in the cache of the instruction or datum corresponding to the current addressing address may be determined from the difference between the index parts of the current and previous addressing addresses, together with the location in the cache of the instruction or datum corresponding to the previous addressing address.
Optionally, the corresponding instruction or datum may be found in the cache from the match result of the tag and the upper part of the index in the addressing address, together with the decode result of the lower part of the index.
Optionally, the match result of the tag and the upper index part of the addressing address may be sent over select lines to the decoders used for decoding the lower index part; the lower index part is decoded only in the decoder corresponding to the successful match.
Optionally, the match results on the select lines may first be encoded and transmitted over a bus, then decoded and sent to the corresponding decoder.
Optionally, some memory blocks in the cache may also be configured as a set-associative structure, forming a hybrid structure in which the group-wise reconfigurable structure and the set-associative structure coexist.
Optionally, the cache organization structure may form a set-associative cache structure, in which each group of memory blocks may be allocated to any set as one of its ways, and the product of the maximum number of sets and the maximum number of ways is greater than the number of memory-block groups.
Optionally, in the cache organization structure, all memory blocks share the same set of bit lines.
Optionally, the cache organization structure includes a two-dimensional table; the rows of the table correspond to sets, with at least one row; the columns correspond to ways, with at least one column; and each table entry contains a tag, a valid bit and a group number.
Optionally, in the cache organization structure, all valid tag values in the row selected by the index in a memory address are read from the two-dimensional table and matched against the tag in the memory address; the group corresponding to that memory address is found in the cache from the group number of the successful match; and the corresponding instruction or datum is accessed in that group according to the offset address in the memory address.
Optionally, the cache organization structure further includes at least one independent tag module; when an independent tag module is allocated to a set, the index value and tag value corresponding to that set are stored in the module.
Optionally, in the cache organization structure, when a new way must be allocated to a set and all entries in that set's row of the two-dimensional table are valid, if an unoccupied independent tag module exists it is allocated to that set; if all independent tag modules are occupied, one entry is chosen for replacement, according to a replacement algorithm, from some or all of the independent tag modules and the ways corresponding to that set in the two-dimensional table.
Optionally, in the cache organization structure, when the cache is addressed with a memory address, all valid tag values are read from the corresponding row of the two-dimensional table according to the index in the address, tag values are also read from all independent tag modules storing that index, and all are matched against the tag in the memory address; the group corresponding to that memory address is found in the cache from the group number of the successful match; and the corresponding instruction or datum is accessed in that group according to the offset address in the memory address.
Those skilled in the art may, guided by the description, claims and drawings of the present invention, understand and appreciate other aspects of its content.
Beneficial Effects
The reconfigurable cache organization structure of the present invention can provide a variable-size, group-wise allocated cache organization. The main-memory addresses corresponding to the memory blocks within each group are consecutive, so that when the processor core fetches instructions or data at consecutive addresses, the cache can determine their location directly with a simple calculation, avoiding tag matching and reducing power consumption.
The reconfigurable cache organization structure of the present invention can also be configured on demand as a hybrid group-allocated / set-associative cache structure, storing instructions or data with consecutive addresses and those with non-consecutive addresses respectively in the group-allocated cache part and the set-associative cache part, so that the cache system better supports reading of instructions or data at consecutive addresses while remaining compatible with existing cache structures.
The reconfigurable cache organization structure of the present invention provides a cache form between set-associative and fully associative: according to the actual needs of the running program, different indexes can be given different numbers of way-sets, achieving fully associative performance at a hardware cost comparable to that of a set-associative cache.
Other advantages and applications of the present invention will be apparent to those skilled in the art.
Brief Description of the Drawings
Figure 1 is an embodiment of tag comparison in an existing set-associative cache structure;
Figure 2 is an embodiment of tag comparison in the reconfigurable cache of the present invention;
Figure 3 is another embodiment of tag comparison in the reconfigurable cache of the present invention;
Figure 4 is an embodiment of one group in the reconfigurable cache of the present invention;
Figure 5 is another embodiment of tag comparison in the reconfigurable cache of the present invention;
Figure 6 is another embodiment of tag comparison in the reconfigurable cache of the present invention;
Figure 7 is an embodiment of a configuration of the reconfigurable cache of the present invention;
Figure 8A is an embodiment of the reconfigurable cache of the present invention;
Figure 8B is an embodiment of the address format and entry format in the reconfigurable cache of the present invention;
Figure 8C is an embodiment of an operating state of the reconfigurable cache of the present invention;
Figure 8D is another embodiment of an operating state of the reconfigurable cache of the present invention;
Figure 9 is another embodiment of the reconfigurable cache of the present invention.
Best Mode for Carrying Out the Invention
Figure 3 shows the best mode of carrying out the present invention.
Modes for Carrying Out the Invention
The high-performance cache system and method proposed by the present invention are described in further detail below with reference to the drawings and specific embodiments. The advantages and features of the invention will become clearer from the following description and claims. It should be noted that the drawings are all in greatly simplified form and use imprecise proportions, serving only to aid in conveniently and clearly explaining the embodiments of the invention.
It should be noted that, in order to explain the content of the invention clearly, multiple embodiments are presented to further illustrate different implementations of the invention; these embodiments are enumerative, not exhaustive. In addition, for brevity of explanation, matter already mentioned in an earlier embodiment is often omitted in later ones, so matter not mentioned in a later embodiment may be found by reference to the earlier embodiments.
Although the invention may be extended by modifications and substitutions of many forms, some specific embodiments are also illustrated in the drawings and described in detail. It should be understood that the inventor's intention is not to limit the invention to the particular embodiments described; on the contrary, the intention is to cover all improvements, equivalent transformations and modifications made within the spirit and scope defined by the claims. The same component numbers may be used throughout the drawings to denote the same or similar parts.
The present invention provides a reconfigurable cache organization structure which, applied in the processor field, can provide caches with different numbers of way-sets according to configuration and store instructions or data with consecutive addresses in the same way-set, facilitating the fetching of instructions or data by the processor core and reducing the number of tag matches. In this description a data cache is used as the example, but the structures and methods described also apply to an instruction cache.
Please refer to Figure 1, an embodiment of tag comparison in an existing set-associative cache structure. The figure shows the tag-comparison process of one way-set. Each line of tag memory 101 corresponds to one index and stores the tag part of a main-memory block address. When data is to be read from the cache, the index 107 of the data addressing address on bus 105 is sent to decoder 111, and the tag 109 is sent to comparator 113.
Decoder 111 decodes the received index 107 to obtain the corresponding word line. The word line selects the row of tag memory 101 corresponding to index 107, and the tag stored there is output, amplified by sense amplifier 103 and sent to comparator 113. Comparator 113 compares tag 115 from sense amplifier 103 with tag 109 and outputs the comparison result over bus 117. If they are equal, then in this way-set the memory block in the row of tag 115 holds the data block containing the data sought. If they are not equal, the data block containing the data is not present in this way-set. Performing the above operation on all way-sets of the cache determines whether the data block containing the data is already stored in the cache and, if so, its location. Thus, when the comparison result is equal, the data read out of data memory 121 on the word line decoded from index 107 is the data corresponding to the data addressing address; it is amplified by sense amplifier 123 and sent over bus 119 to the processor core.
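The conventional lookup flow of Figure 1 can be sketched in software as follows. This is a minimal illustrative model, not the patent's circuit: the field widths, the `tag_mem`/`data_mem` lists and the function names are all assumptions for the sketch.

```python
# Example field split (matching the 32-bit example used later in this document).
TAG_BITS, INDEX_BITS, OFFSET_BITS = 23, 2, 7

def split_address(addr):
    """Split a memory address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def set_associative_lookup(ways, addr):
    """ways: list of (tag_mem, data_mem) pairs, one per way-set.
    tag_mem[index] holds the stored tag (or None); data_mem[index] the block."""
    tag, index, offset = split_address(addr)
    for tag_mem, data_mem in ways:       # in hardware all way-sets compare in parallel
        if tag_mem[index] == tag:        # comparator 113's role
            block = data_mem[index]      # word line selects the row
            return block[offset]         # offset picks the datum within the block
    return None                          # cache miss
```

Note that every lookup compares the tag in every way-set; the embodiments below aim to avoid exactly this repeated tag matching for consecutive addresses.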
The reconfigurable cache of the present invention replaces the tag-comparison process of the Figure 1 embodiment with a new structure and method. Please refer to Figure 2, an embodiment of tag comparison in the reconfigurable cache of the present invention. In this embodiment, the tag memory and data memory of each way-set are each divided into corresponding groups; each group corresponds to the same number of rows with consecutive indexes and corresponds to a single tag. That is, each group stores several data blocks with consecutive addresses corresponding to the same tag.
Taking one way-set as an example, as shown in Figure 2, tag memory 201 is divided into two groups, each containing one row of content-addressable memory (CAM), i.e. storing one tag (such as tag 203 and tag 205). Correspondingly, data memory 211 is also divided into two groups, each containing four memory blocks whose data-block addresses are consecutive and which correspond to the same tag. Specifically, group 213 contains memory blocks 221, 223, 225 and 227, whose data-block addresses are consecutive and all correspond to tag 203; group 215 contains memory blocks 231, 233, 235 and 237, whose data-block addresses are consecutive and all correspond to tag 205. It should be noted that the present invention does not require the starting addresses of the four data blocks of group 213 and of group 215 to coincide. In this embodiment each group of tags and the corresponding group of memory blocks also correspond to one register-comparator and one decoder: tag 203 corresponds to register-comparator 217 and decoder 229, and tag 205 corresponds to register-comparator 219 and decoder 239. A register-comparator contains a register and a comparator; the register stores the upper part of the index of the group's starting block address.
The tag part of the data addressing address is sent to all content-addressable memories in tag memory 201 for matching, and every CAM that matches outputs an enable signal to the register of the corresponding register-comparator. With that enable valid, the comparator compares the upper index part of the externally supplied data addressing address, arriving over bus 243, with the upper index value stored in the register, thereby partially matching (on the upper index part) the address of the corresponding group of data blocks. When the register-comparator reports a successful match, the decoder decodes the lower index part of the data addressing address on bus 245 and, according to the decode result, selects one of the group's data blocks for output. Thus, through the matching, decoding and addressing of the register-comparators and decoders, the data block whose index equals that of the data addressing address can be read from data memory 211. If no content-addressable memory matches, or no participating comparator matches, the data corresponding to the data addressing address is not yet stored in this way-set of the cache. Performing the above operation on all way-sets in parallel by the same method, the required data is found in the cache, amplified by sense amplifier 123 and output over bus 247, or a cache-miss result is obtained.
Since the match lines of a content-addressable memory must be precharged before tag matching can take place, and the precharge and matching process consumes considerable power, matching tags simultaneously in all CAM rows as in this embodiment is power-hungry. The order of address matching can therefore be improved to further reduce power consumption. Specifically, the upper index part of the data addressing address is first sent over bus 243 to all register-comparators for comparison with the stored upper index values; then, based on the comparison results, only the match lines of the CAM rows corresponding to successful comparisons are precharged and matched against the tag arriving over bus 241, and the successfully matching CAM row outputs an enable signal to the decoder. Subsequent operation is the same as before. This reduces the number of CAM rows matched and lowers power consumption.
In this embodiment, the cache can be reconfigured simply by storing the appropriate upper index values into the registers of the register-comparators. For example, consecutive upper index values can be stored in two adjacent register-comparators, so that the indexes corresponding to these two register-comparators are also consecutive. The two adjacent groups are thereby merged into one larger group holding data blocks of consecutive addresses.
Please refer to Figure 3, another embodiment of tag comparison in the reconfigurable cache of the present invention. Again taking one way-set as an example, its structure is similar to that of the Figure 2 embodiment, except that each group of tag memory 301 contains not only one row of content-addressable memory but also random access memory (RAM). The CAM of each group of tag memory 301 corresponds to the first memory block of the corresponding group in data memory 211, while the other RAM rows correspond to the other memory blocks of that group, so that every memory block of data memory 211 has a corresponding tag. In addition, in this embodiment the decoders (such as decoders 329 and 339) can, as required, decode the full index, or decode only the lower index part as decoders 229 and 239 do in the Figure 2 embodiment.
In this embodiment, the reconfigurable cache can be configured on demand as a conventional set-associative cache or as the group-wise reconfigurable cache of the present invention. When configured as a set-associative cache, the memory blocks of each way-set correspond to consecutive indexes but need not correspond to the same tag. The content-addressable memories of the tag memory are then treated as random access memories and, together with the other RAM rows, store the tags of the data blocks in the corresponding memory blocks. Meanwhile the output of each register-comparator (such as register-comparators 217 and 219) is configured to output a constant '1'. Thus, after the index of the data addressing address is matched, decoded and addressed by the decoders, the tag and memory block corresponding to that index are found in tag memory 301 and data memory 211 respectively. The tag, amplified by sense amplifier 307, is compared in comparator 113 with the tag of the data addressing address; the subsequent process is the same as in the Figure 1 embodiment, determining whether the data block containing the data is already stored in the cache and, if so, reading out the corresponding data block.
When configured as a group-wise reconfigurable cache, the random access memory within each group is ignored, the register of every register-comparator holds the upper index part, and the decoder decodes only the lower index part; the structure shown in this embodiment then has the same function as the Figure 2 embodiment and can be operated by the method described there.
Please refer to Figure 4, an embodiment of one group in the reconfigurable cache of the present invention. In this embodiment the index is assumed to be 4 bits in total, with the upper part and the lower part each 2 bits.
Figure 4 shows the specific structure of one group in Figure 3 (such as the group corresponding to tag 205). The tag memory portion contains one content-addressable memory row 401 (storing tag 205) and three random access memory rows 403, 405 and 407. The data memory portion correspondingly stores four memory blocks 411, 413, 415 and 417. Decoders 429 and 439 decode the upper and lower index parts respectively; together they implement the function of decoder 239 of the Figure 3 embodiment.
In this embodiment, when this group of the cache is configured for set association, switches 421, 423, 425 and 427 are all set as shown in the figure, connecting the outputs of the AND gates in decoder 409 to the corresponding tag memory rows. As described in the Figure 3 embodiment, the output of comparator 219 is then fixed at '1' and sent to all AND gates of the group as one input. The upper index part, decoded from bus 451 by decoder 429, selects one group (i.e. outputs a valid signal to all of that group's AND gates), while the lower part, decoded from bus 453 by decoder 439, selects one AND gate within the group, giving that AND gate a valid signal ('1') and all others an invalid signal ('0'). Suppose in this situation decoder 429 selects the group shown in Figure 4 and decoder 439 selects AND gate 433; then, under control of the output of AND gate 433, the tag in RAM row 403 is read out, amplified by sense amplifier 307 and compared in comparator 113 with the tag of the data addressing address, and the corresponding data block is read from memory block 413 and sent to select/amplify module 419. If this way-set compares equal, the data block read from memory block 413 is the data block corresponding to the data addressing address; according to the intra-block offset of the data addressing address on bus 441, the corresponding data is selected, amplified and output over bus 445. If this way-set compares unequal but some other way-set's tag compares equal, the data block output by that way-set is the one corresponding to the data addressing address, and the intra-block offset selects the corresponding data. If no way-set compares equal, the data block corresponding to the data addressing address is not yet stored in the cache.
When this group of the cache is configured as group-wise reconfigurable, switches 421, 423, 425 and 427 are all grounded, disconnecting the AND-gate outputs from the corresponding tag memory rows and selecting no tag memory row, and all outputs of decoder 429 are forced to '1'. As described in the earlier embodiments, the upper index part of the data addressing address is sent over bus 243 to register-comparator 219 for matching, and the CAM row corresponding to a successful match is precharged, so that the tag stored in that CAM row is matched against the tag of the data addressing address arriving on bit line 455. If the tag match fails, this group does not contain the data corresponding to the data addressing address. If it succeeds, valid signal 443 is output to the AND gates of this group. With the output of decoder 429 fixed at '1', the lower index part decoded from bus 453 by decoder 439 selects one AND gate. Suppose in this situation comparator 219 and decoder 429 both select the group shown in Figure 4 and decoder 439 selects AND gate 435; then, under control of the output of AND gate 435, the data block in memory block 415 is read out and sent to select/amplify module 419, which selects the corresponding data directly according to intra-block offset 441 of the data addressing address and outputs it over bus 445. Throughout this process none of the random access memory rows of the tag memory takes part in the operation. Although this embodiment is described with a data read as the example, data stores can be handled by a similar method; the only difference is that for a data read the data block is read over bit lines 457 from the matched memory block of data memory 211, amplified and selected to output the corresponding data, whereas for a data store the data (or data block) to be stored is written over bit lines 457 directly into the matched memory block of data memory 211.
In addition, the decoders in the cache (such as decoders 329 and 339) may be changed to point only at the memory blocks of data memory 211, and a further set of decoders added to point at the content-addressable memory rows of tag memory 201. In this way the functions of the cache of the present invention can still be achieved when tag memory 201 and data memory 211 are physically far apart.
Please refer to Figure 5, another embodiment of tag comparison in the reconfigurable cache of the present invention. In this embodiment decoders 529 and 539 are added. Decoder 529 has the same structure and function as decoder 329, both receiving the comparison result from register-comparator 217; decoder 539 has the same structure and function as decoder 339, both receiving the comparison result from register-comparator 219. It is then unnecessary, as in the Figure 4 embodiment, to send the word lines output by decoders 329 and 339 simultaneously to data memory 211 and, through switches (such as switch 421), to tag memory 201. Taking decoder 329 and the corresponding decoder 529 as an example: when the cache is configured as a set-associative cache, decoders 329 and 529 decode the index of the data addressing address simultaneously and, from the word lines output by the decoding, a data block is read from data memory 211 and a tag from tag memory 201 for subsequent operation. When the cache is configured as a group-wise reconfigurable cache, decoder 529 does not operate; by the method described above, matching is performed only by the corresponding register-comparator and CAM row to determine which group holds the data, and the match result is sent to the corresponding decoder (i.e. the match result of register-comparator 217 and CAM row 203 is sent over select line 517 to decoder 329); decoder 329 then decodes the lower index part of the data addressing address on bus 345, and the resulting word line finds the corresponding data block within that group for subsequent operation. The other decoders (such as decoders 339 and 539) operate by the same method; for example, the match result of register-comparator 219 and CAM row 205 is sent over select line 527 to decoder 339. When the cache is divided into many groups, so that many match results must be conveyed to the decoders, the match results may first be encoded, transmitted over a bus to the data-memory side, decoded and then distributed to the corresponding decoders, reducing the number of select lines (such as select lines 517 and 527). For example, 16 select lines can be replaced by 4 bus lines.
It should be noted that in the Figure 2, Figure 3 and Figure 4 embodiments a register-comparator structure implements the matching of the upper index part. A content-addressable memory can of course replace the register-comparator to achieve the same function. For the Figure 2 embodiment, it suffices to widen each content-addressable memory in tag memory 201 so that both the tag and the upper index part can be stored in the CAM and take part in matching. For the Figure 3 and Figure 4 embodiments, each content-addressable memory in tag memory 301 is widened while the random access memory is kept unchanged. Please refer to Figure 6, another embodiment of tag comparison in the reconfigurable cache of the present invention. Again taking one way-set as an example, this embodiment is similar in structure to the Figure 3 embodiment, except that each content-addressable memory in tag memory 301 is widened to replace the register-comparators; the added portion of the CAM stores the corresponding upper index part.
In this embodiment, when the cache is configured as a set-associative cache, the upper index part of the data addressing address is sent over bit lines 243 to the newly added portions of all content-addressable memories in tag memory 481 for matching, and the decoder corresponding to the successful match decodes the lower index part, finding the tag and memory block corresponding to that index in tag memory 481 and data memory 211 respectively. The subsequent operation is the same as in the Figure 3 embodiment and is not repeated here. When the cache is configured as a group-wise reconfigurable cache, the random access memory within each group is ignored, the tag and upper index part of the data addressing address are sent to each content-addressable memory of tag memory 481 for matching, and the decoder corresponding to the successful match decodes the lower index part. The subsequent operation is the same as in the Figure 3 embodiment and is not repeated here. Thus, with the cache structure of the present invention, the memory blocks of one way-set can conveniently be divided into multiple groups, each acting as a way-set, without adding sense amplifiers or Y decoders.
Please refer to Figure 7, an embodiment of a configuration of the reconfigurable cache of the present invention. For ease of explanation only the tag memories and decoders are shown in Figure 7; other parts are omitted. The cache of this embodiment has four way-sets (way-sets 501, 503, 505 and 507), each divided into four groups (such as groups 511, 513, 515 and 517 of way-set 501), and each group may contain several memory blocks for storing data blocks. The 16 groups can therefore hold data blocks of consecutive addresses corresponding to at most 16 tags.
According to the technical solution of the present invention, the way-set number and the number within the way-set can together form a group number that uniquely identifies each group of the tag memory (and the data memory). As the digits above each group of the tag memory in Figure 7 show, the first two bits are the way-set number and the last two bits the group number within the way-set. That is, way-set 501 has way-set number '00', way-set 503 has '01', way-set 505 has '10' and way-set 507 has '11', while the groups within each way-set are numbered, from top to bottom, '00', '01', '10' and '11'. Hence the group number corresponding to group 511 is '0000', to group 513 is '0001', to group 515 is '0010', to group 517 is '0011', and so on. Thus, when a data addressing address matches successfully in some group, the corresponding group number and an intra-group offset (indicating which data block of the group, and which datum within that data block) can replace the data addressing address, so that the next repeated access to that datum needs no further tag matching.
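The group-number encoding just described is a simple bit concatenation, sketched below under the assumed widths of this embodiment (2-bit way-set number, 2-bit in-set number); the function name is illustrative.

```python
def group_number(way_set, in_set):
    """Concatenate a 2-bit way-set number with a 2-bit in-set number."""
    return (way_set << 2) | in_set

# Group numbers from the Figure 7 example:
assert format(group_number(0b00, 0b00), '04b') == '0000'  # group 511
assert format(group_number(0b00, 0b01), '04b') == '0001'  # group 513
assert format(group_number(0b11, 0b10), '04b') == '1110'  # group 545
```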
Further, when data read instructions (or data store instructions) in a program access data at adjacent or nearby locations in succession, the group number corresponding to the current data addressing address can be derived directly from the difference between the two successive data addressing addresses and the group number corresponding to the previous data addressing address, avoiding tag matching.
For example, suppose the group number corresponding to the data addressing address of the previous data read instruction is '1110', i.e. the data corresponding to that address lies in the memory block corresponding to group 545. The tag and index of the previous data addressing address are subtracted from the tag and index of the current data read instruction's data addressing address. If the tag subtraction yields '0', the current data addressing address has the same tag as the previous one and therefore lies within the same group 545; i.e. the group number corresponding to the current data addressing address is also '1110'. The result of the index subtraction then determines the positional relationship between the memory block corresponding to the current data addressing address and that corresponding to the previous one. Specifically, if the index difference is '0', the current address's memory block is the same as the previous address's memory block, and the corresponding data can be selected from it according to the intra-block offset in the data addressing address; if the index difference is positive, the current address's memory block lies after the previous address's memory block; if negative, it lies before it. In the latter two cases, the absolute value of the index difference is the distance between the two memory blocks. For example, an index difference of '2' means the current address's memory block is the second memory block after the previous address's; an index difference of '-1' means it is the first memory block before it. In this way the position in the cache of the data block corresponding to a data addressing address can be determined without any tag comparison.
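The difference arithmetic above can be sketched as follows; this is a hedged illustrative model, with `prev_group`/`prev_block` standing in for the cached position of the previous access.

```python
def locate_by_difference(cur_tag, cur_index, prev_tag, prev_index,
                         prev_group, prev_block):
    """Locate the current block from the previous one without tag matching.
    Returns (group_number, block_position) or None if the tags differ."""
    if cur_tag - prev_tag != 0:
        return None                      # different tag: fall back to tag matching
    delta = cur_index - prev_index       # signed block displacement
    return prev_group, prev_block + delta

# Same group '1110' (group 545), two blocks after the previous block:
assert locate_by_difference(7, 9, 7, 7, 0b1110, 3) == (0b1110, 5)
# Index difference -1: the first block before the previous one:
assert locate_by_difference(7, 6, 7, 7, 0b1110, 3) == (0b1110, 2)
# Tags differ: must match in the tag memory instead:
assert locate_by_difference(8, 7, 7, 7, 0b1110, 3) is None
```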
Moreover, when consecutive data blocks exceed the capacity of a single group, these data blocks can also be stored in two groups whose group numbers are consecutive. For example, when groups 513 and 515 correspond to the same tag, and the upper index part corresponding to group 515 equals that of group 513 plus '1', then not only are the addresses of the data blocks stored within each of groups 513 and 515 consecutive, the address of the last data block of group 513 and that of the first data block of group 515 are also contiguous. This establishes a linkage between groups: group 513 is the previous group of group 515, and group 515 is the next group of group 513, so that a data location in another group can be found directly from one group according to the index difference.
Of course, if the tag subtraction does not yield '0', the current data addressing address can be matched in the tag memory by the method of the earlier embodiments to find the corresponding group.
此外,在本发明中,还可以将各个组配置成不同大小,例如可以将图 7 实施例中的路组 501 配置成四个组(即组 511 、 513 、 515 和 517 ),并将路组 503 配置成一个组,以及将路组 505 和 507 配置成传统形式的组相联结构。在这种情况下,路组 501 中最多包含四个不同的标签,而路组 503 中只包含一种标签。路组 505 和 507 则如现有二路组缓存那样,各自可包含的最多标签数等于对应的存储块的数量(及路组本身的行数),相邻的存储块可以对应不同的标签。采用如此配置的缓存,可以根据程序的特点,将数据寻址地址连续(即标签相同)的大量数据存储在路组 503 中,并将多组数据寻址地址连续的少量数据存储在路组 501 的各个组中。对于数据寻址地址不连续的数据,则被存储在路组 505 或 507 中。这样,所述缓存即具备了缓存中数据存放的灵活性和便于替换的特点,又能在进行连续地址的数据访问时省去大量的标签比较操作。
The above embodiments are all described for a data cache. Since an instruction cache has a structure similar to that of a data cache, it can be implemented in exactly the same way and is not described further here.
According to the technical solution of the present invention, the tag part of the cache can also be improved to implement a reconfigurable cache with a variable number of ways. Please refer to FIG. 8A, an embodiment of the reconfigurable cache of the present invention. In this embodiment, the reconfigurable cache consists of a tag part 801 and a data storage part 803. All memory blocks in data storage part 803 share the same set of bit lines and, following the grouping method of data memory 211 in the earlier embodiments, are divided into six groups: A, B, C, D, E and F; the group number is therefore 3 bits wide, e.g., group A has group number '000', group B has '001', ..., and group F has '101'. Each group contains a plurality of memory blocks, and each memory block contains a plurality of instructions or data. Tag part 801 stores the tag, group number and other information of each group in the form of a two-dimensional table 825. As shown in FIG. 8B, an entry 807 may contain a tag 811, a valid bit 817 and a group number 819. In the present invention, a valid bit 817 of '1' marks the entry as valid, and '0' marks it as invalid. The number of columns of table 825 corresponds to the maximum number of ways supported by the cache, and the number of rows to the maximum number of sets supported within a way, i.e., the maximum index value within a way. Here, table 825 has 4 rows and 4 columns: the 4 columns mean the cache supports at most 4 ways, and the 4 rows mean the cache can be divided into 4 sets, with index values ranging from '0' to '3', i.e., a 2-bit index. The index in a memory address corresponds one-to-one with the sets of the cache.
Compare module 821 consists of a number of comparators equal to the number of columns of table 825. Compare module 821 compares all valid tags read from tag part 801 according to index 813 with tag 811 of memory address 805, and the comparison results are sent to select module 823 as control signals. Select module 823 consists of a number of transmission gates, likewise equal to the number of columns of table 825; the input of each transmission gate is the group number 819 in the entry output by the corresponding table column. The group number 819 of the matching entry can thus be selected according to the comparison results.
As shown in FIG. 8B, memory address 805 is divided into three parts: tag 811, index 813 and offset address 815. Index 813 corresponds to the rows of table 825 and is therefore 2 bits wide; offset address 815 identifies the position of an instruction or datum within a group, and its width is fixed. For example, assume each group contains 8 memory blocks and each memory block holds 16 bytes of instructions or data; then for a 32-bit memory address, offset address 815 is 7 bits wide (8 memory blocks per group, 128 bytes of instructions or data in total), index 813 is 2 bits wide, and the remaining 23 bits form the tag.
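The field split of the 32-bit address in the example above (23-bit tag, 2-bit index, 7-bit offset) can be checked with a small Python sketch; the constant and function names are ours.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 23, 2, 7   # as in the 32-bit example

def split_address(addr: int):
    """Split a 32-bit memory address into (tag, index, offset) fields,
    offset in the low bits, tag in the high bits."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# All-ones address: maximum value in every field.
tag, index, offset = split_address(0xFFFFFFFF)
assert tag == (1 << 23) - 1 and index == 3 and offset == 127
# The three widths cover the whole 32-bit address.
assert TAG_BITS + INDEX_BITS + OFFSET_BITS == 32
```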
In this way, all valid entries stored in the table row selected by index 813 are read out, and the tags stored in these entries are compared simultaneously with tag 811 of memory address 805. If none matches, the access is a cache miss. If one matches, the group number stored in the matching entry is the group corresponding to memory address 805. That is, tag 811 and index 813 of memory address 805 are converted into group number 819, and group number 819 together with offset address 815 forms cache addressing address 809. The group number 819 in addressing address 809 locates the corresponding group in the cache, and the offset address 815 in addressing address 809 accesses the corresponding instruction or datum within that group.
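The table lookup just described can be modeled as follows. This is an illustrative sketch under our own naming, not the circuit of FIG. 8A: each row of the table (one per index value) holds up to four `(valid, tag, group)` entries, and a hit converts tag and index into the 3-bit group number, which is concatenated with the 7-bit offset to form the cache addressing address.

```python
def lookup(table, tag, index, offset):
    """Return the cache addressing address (group number || offset) on a
    hit, or None on a cache miss."""
    for valid, stored_tag, group in table[index]:
        if valid and stored_tag == tag:
            return (group << 7) | offset   # addressing address 809
    return None                            # miss: no valid matching tag

table = [[] for _ in range(4)]             # 4 rows, index '00'..'11'
table[0] = [(1, 0x10, 0b001), (1, 0x22, 0b010)]  # index '00': groups B, C
# Hit: tag 0x22 at index 0 resolves to group '010' plus the offset.
assert lookup(table, 0x22, 0, 5) == (0b010 << 7) | 5
# Miss: no valid entry with tag 0x33 in row 0.
assert lookup(table, 0x33, 0, 5) is None
# Miss: row 1 has no valid entries at all.
assert lookup(table, 0x10, 1, 0) is None
```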
Note that in an existing set-associative cache, every index corresponds to the same number of tag storage locations, namely the number of ways. For example, in a 4-way cache, index M and index N each correspond to 4 tags. In the cache structure of this embodiment, however, different indexes may correspond to different numbers of tag storage locations. For example, in the cache of this embodiment, which supports at most 4 ways, index M may correspond to only 2 tags (2 ways), while index N corresponds to 4 tags (4 ways).
The following describes a specific embodiment with reference to FIG. 8A. Please refer to FIG. 8C, an embodiment of a running state of the reconfigurable cache of the present invention; only the tag values stored in tag part 801 are shown. Assume that at a given moment the cache has allocated four groups, B, C, D and E, to index '00', and group A to index '10'; group F has not yet been allocated. When the instruction or datum corresponding to memory address 805 is looked up, all entries of the table row of table 825 selected by index 813 are read out, and their tags and valid bits are sent to compare module 821. Comparators that receive a valid bit of '0' output '0' (no match); comparators that receive a valid bit of '1' compare the received tag with tag 811 of the memory address and output the comparison result.
Specifically, if the index value in memory address 805 is '01' or '11', the corresponding row of table 825 contains no valid entry, so nothing matches and compare module 821 drives a cache miss signal onto bus 829. Taking index value '11' as an example, the cache allocates the available group F to index '11': the tag of memory address 805 and group number F are stored in an invalid entry of row 3 of table 825, and that entry is set valid. Table 825 then has the state shown in FIG. 8C.
Afterwards, if the index value in memory address 805 is '00', all entries of the corresponding row 0 are read out. Since these entries are all valid, their tags are sent to compare module 821 to be compared with tag 811 of the memory address, and the outputs of all comparators are ORed by logic gate 827. Meanwhile, the group numbers read out (B, C, D or E) are sent to select module 823 as transmission-gate inputs. On a successful match, compare module 821 drives a cache hit signal onto bus 829, and select module 823 outputs the group number 819 of the matching entry. Group number 819 together with offset address 815 forms cache addressing address 809, with which the corresponding instruction or datum can then be accessed directly from data storage part 803.
If nothing matches, compare module 821 drives a cache miss signal onto bus 829. Since all 4 ways of index '00' are occupied, a replacement algorithm (e.g., LRU) selects a suitable group from the corresponding four groups B, C, D and E to allocate to memory address 805.
If the index value in memory address 805 is '10', all entries of the corresponding row 2 are read out; their valid bits and tags are sent to compare module 821, and their group numbers to select module 823 as inputs of the corresponding transmission gates. Since only one entry is valid, if its tag matches, compare module 821 drives a cache hit signal onto bus 829, and select module 823 outputs group number A of that entry. The group number together with offset address 815 forms cache addressing address 809, with which the corresponding instruction or datum can then be accessed directly from data storage part 803.
If the tag of that entry does not match, compare module 821 drives a cache miss signal onto bus 829. Since only one way of index '10' is occupied, a replacement algorithm (e.g., LRU) may select a suitable group from all six groups to allocate to memory address 805. If the group selected for replacement is not group A (say, group C), the index that originally owned that group (e.g., index '00') loses one way, and index '10' gains one way. In this way, as the program runs, the cache automatically gives different indexes different numbers of ways according to the program's demands, allocating cache resources flexibly and thereby improving the cache hit rate.
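The way reallocation described above can be sketched as a table update; this is a simplified model under our own naming (the replacement-algorithm choice of victim is taken as given, and entries are `(valid, tag, group)` tuples as in the earlier sketch).

```python
def reassign(table, victim_group, old_index, new_index, new_tag):
    """Move a memory-block group from one index to another: invalidate
    (drop) its entry under old_index and register it under new_index.
    old_index loses one way; new_index gains one way."""
    table[old_index] = [e for e in table[old_index] if e[2] != victim_group]
    table[new_index].append((1, new_tag, victim_group))

# Index '00' owns groups B ('001') and C ('010'); index '10' misses and the
# replacement algorithm picks group C, which is moved to index '10'.
table = [[(1, 0x10, 0b001), (1, 0x22, 0b010)], [], [(1, 0x05, 0b000)], []]
reassign(table, 0b010, old_index=0, new_index=2, new_tag=0x7C)
assert len(table[0]) == 1                  # index '00' went from 2 ways to 1
assert (1, 0x7C, 0b010) in table[2]        # index '10' gained a way
```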
Please refer to FIG. 8D, another running-state embodiment of the reconfigurable cache of the present invention. Assume that at a given moment the cache has allocated one group to each index, namely D, A, B and C; groups E and F have not yet been allocated. When the instruction or datum corresponding to memory address 805 is looked up, if the tag corresponding to index 813 matches, select module 823 outputs the corresponding group number by the method of the previous embodiment. If it does not match, a replacement algorithm allocates an available group to index 813. For example, driven by the program's needs, groups E and F are allocated in turn to indexes '01' and '10', leaving table 825 in the state shown in FIG. 8D. The detailed operation of this embodiment is similar to that of the FIG. 8C embodiment and is not repeated here.
In a traditional cache, the address tag part and the data storage part are symmetrical and in direct correspondence, and the structure is fixed. For example, with a total of 8 memory-block groups, the organization may be 1 way x 8 sets, 2 ways x 4 sets, 4 ways x 2 sets, or 8 ways x 1 set (fully associative); but the organization is fixed once selected at design time and cannot be changed.
In the cache of this embodiment, the address tag part and the data storage part are asymmetric and variably mapped, so the structure is not fixed: the address space of the tag part is larger than the space that the memory-block groups can occupy. For example, data storage part 803 would in a traditional cache be a fixed 1-way, 6-set memory. In this embodiment, its 6 memory-block groups can be mapped as needed onto any of the 16 address positions of the 4-way, 4-set tag part 801, forming way-set caches of many different combinations; here, a set can be converted into a way. Conversely, given two data storage parts 803, which would form 2 ways in a traditional cache, memory-block groups from the two different data storage parts can be mapped to different sets of the same way; that is, a way is converted into a set. Sets and ways are thus interchangeable, so this can be regarded as a new type of way-set exchange cache. If table 825 is extended by 2 more columns, with corresponding comparators and selectors added to compare module 821 and select module 823, a fully associative 6-way, 1-set cache can be implemented as needed.
Furthermore, the reconfigurable cache can be implemented with an even more flexible structure. Please refer to FIG. 9, another embodiment of the reconfigurable cache of the present invention. The data storage part of this embodiment is the same as that of the FIG. 8A embodiment and likewise contains 6 groups; for ease of illustration, only the tag part is shown in FIG. 9. Comparators 921 and selectors 923 of the tag part of this embodiment are respectively identical to the comparators of compare module 821 and the selectors of select module 823 of the FIG. 8A embodiment. Whereas table 825 of the FIG. 8A embodiment comprises decoder 901 and 4 tag columns of 4 entries each, this embodiment comprises only decoder 901 and a single tag column 903, each entry of which corresponds to one index value. The incoming index 813 is therefore first decoded by decoder 901 into a word line, which then selects the corresponding entry of tag column 903 to be read out. In addition, the tag part of this embodiment includes several identically structured independent tag modules (e.g., independent tag modules 905, 907 and 909 in FIG. 9).
Taking independent tag module 905 as an example, it contains an index register 911, a tag register 913, a comparator 915, a transmission gate 917 and a selector 919. Tag register 913 stores the same content as an entry of table 825: a tag, a valid bit and a group number. The inputs of selector 919 are the word lines output by decoder 901; index register 911 stores the index value assigned to this independent tag module, and this index value serves as the control signal of selector 919 for selecting the corresponding word line. Thus selector 919 outputs '1' only when the index input to decoder 901 equals the index value stored in index register 911, and '0' otherwise. This output controls whether tag register 913 sends its stored tag to comparator 915 for comparison with tag 811 of memory address 805. Specifically, if selector 919 outputs '1', independent tag module 905 corresponds to index 813 of memory address 805, and, provided the valid bit in tag register 913 is '1', the stored tag is sent to comparator 915 to be compared with tag 811 of memory address 805; otherwise no comparison takes place. The result from comparator 915 is sent to bus 829 and ORed with the results output by comparator 921 and the other independent tag modules to yield the cache hit or miss result. Transmission gate 917 is similar to the transmission gates of selector 923: its input is the group number in tag register 913, and when comparator 915 reports a match, it outputs that group number 819, which together with offset address 815 of memory address 805 forms cache addressing address 809.
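The behavior of one independent tag module can be sketched as a small class; this is an illustrative software model under our own naming, not the register/comparator circuit itself. The module only participates when the incoming index equals its stored index register, and only then compares tags.

```python
class IndependentTagModule:
    """Software model of one independent tag module of FIG. 9."""

    def __init__(self):
        self.valid = False
        self.index = self.tag = self.group = None

    def assign(self, index, tag, group):
        """Allocate this module to an index: store index, tag and group."""
        self.valid, self.index, self.tag, self.group = True, index, tag, group

    def match(self, index, tag):
        """Return the stored group number on a hit, else None. A mismatched
        index leaves the module silent (its selector outputs '0')."""
        if self.valid and self.index == index and self.tag == tag:
            return self.group
        return None

m = IndependentTagModule()
m.assign(index=0b10, tag=0x3F, group=0b101)
assert m.match(0b10, 0x3F) == 0b101   # acts as an extra way for index '10'
assert m.match(0b10, 0x40) is None    # same index, tag mismatch
assert m.match(0b01, 0x3F) is None    # different index: no comparison
```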
In this way, besides configuring tag column 903 by the method of the FIG. 8A embodiment to realize a reconfigurable cache, the independent tag modules can also be configured and allocated to the memory addresses of the corresponding indexes, achieving even greater flexibility.
For example, when the entry of tag column 903 corresponding to a given index is already valid and an instruction or data access occurs for a new memory address with that index, then, if an unoccupied independent tag module exists, the entry in tag column 903 need not be replaced: the independent tag module stores the tag of the new address instead, which has the effect of adding one way. If all independent tag modules are also occupied, a replacement algorithm selects one of them for replacement. In this embodiment, the independent tag modules can thus be used to allocate ways in real time as the program runs, so that some indexes correspond to more than one way while others correspond to only one way, or to none, using the cache more reasonably and effectively and improving performance.
Other suitable modifications may also be made in accordance with the technical solution and concept of the present invention. For those of ordinary skill in the art, all such substitutions, adjustments and improvements shall fall within the scope of protection of the appended claims of the present invention.
Industrial Applicability
The apparatus and methods proposed by the present invention can be used in various cache-related applications and can improve the efficiency of processor systems.
Sequence Listing Free Text

Claims (16)

  1. A reconfigurable cache organization structure, characterized in that a plurality of memory blocks in the instruction or data memory of the cache can form a group corresponding to a single tag, forming a group-wise reconfigurable structure; and
    when the current addressing address and the previous addressing address correspond to the same group, the tag comparison for the current addressing address can be omitted, and the corresponding instruction or datum found directly in that group.
  2. The structure of claim 1, characterized in that the memory blocks in the cache can be configured as groups of equal or unequal sizes.
  3. The structure of claim 2, characterized in that the current addressing address and the previous addressing address are:
    the addresses of two instructions at consecutive addresses; or
    the data addresses respectively corresponding to two successively executed data access instructions; or
    the data addresses respectively corresponding to successive executions of the same data access instruction.
  4. The structure of claim 2, characterized in that when consecutive or nearby addresses are accessed and the tag portions of the addresses are identical, the current addressing address is determined to correspond to the same group as the previous addressing address.
  5. The structure of claim 4, characterized in that the location in the cache of the instruction or datum corresponding to the current addressing address can be determined from the difference between the index portions of the current and previous addressing addresses and the location in the cache of the instruction or datum corresponding to the previous addressing address.
  6. The structure of claim 5, characterized in that the corresponding instruction or datum can be found in the cache from the match result of the tag and the high-order index portion of the addressing address, together with the decode result of the low-order index portion.
  7. The structure of claim 6, characterized in that the match result of the tag and the high-order index portion of the addressing address can be sent via select lines to the decoders used for decoding the low-order index portion; and
    the low-order index portion is decoded only in the decoder corresponding to the matching entry.
  8. The structure of claim 7, characterized in that the match result on the select lines can first be encoded and transmitted over a bus, then decoded and sent to the corresponding decoder.
  9. The structure of claim 6, characterized in that some memory blocks in the cache can additionally be configured as a set-associative structure, forming a hybrid structure in which the group-wise reconfigurable structure and the set-associative structure coexist.
  10. The structure of claim 1, characterized in that a set-associative cache structure can be formed, wherein each group of memory blocks can be allocated to any set as one of its ways, and the product of the maximum number of sets and the maximum number of ways is greater than the number of memory-block groups.
  11. The structure of claim 10, characterized in that all memory blocks share the same set of bit lines.
  12. The structure of claim 10, characterized by comprising a two-dimensional table; the rows of the table correspond to sets and number at least one; the columns of the table correspond to ways and number at least one; each entry of the table comprises a tag, a valid bit and a group number.
  13. The structure of claim 12, characterized in that all valid tag values in the row of the two-dimensional table selected by the index of a memory address are read out and matched against the tag of the memory address;
    the group corresponding to the memory address is found in the cache from the group number of the matching entry; and
    the corresponding instruction or datum is accessed within the group according to the offset address of the memory address.
  14. The structure of claim 13, characterized by further comprising at least one independent tag module; when an independent tag module is allocated to a set, the index value and the tag value corresponding to that set are stored in the independent tag module.
  15. The structure of claim 14, characterized in that when a new way must be allocated to a set and all entries in the row of the two-dimensional table corresponding to that set are valid,
    if an unoccupied independent tag module exists, that independent tag module is allocated to the set; and
    if all independent tag modules are occupied, one is selected for replacement by a replacement algorithm from among some or all of the independent tag modules and the ways corresponding to that set in the two-dimensional table.
  16. The structure of claim 15, characterized in that when the cache is addressed with a memory address, all valid tag values are read from the corresponding row of the two-dimensional table according to the index of the memory address, and tag values are read from all independent tag modules that store that index, to be matched against the tag of the memory address;
    the group corresponding to the memory address is found in the cache from the group number of the matching entry; and
    the corresponding instruction or datum is accessed within the group according to the offset address of the memory address.
PCT/CN2014/090481 2013-11-08 2014-11-06 一种可重构缓存组织结构 WO2015067195A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201310556883.X 2013-11-08
CN201310556883 2013-11-08
CN201310681802.9 2013-12-11
CN201310681802.9A CN104636268B (zh) 2013-11-08 2013-12-11 一种可重构缓存产品与方法

Publications (1)

Publication Number Publication Date
WO2015067195A1 true WO2015067195A1 (zh) 2015-05-14

Family

ID=53040913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/090481 WO2015067195A1 (zh) 2013-11-08 2014-11-06 一种可重构缓存组织结构

Country Status (2)

Country Link
CN (1) CN104636268B (zh)
WO (1) WO2015067195A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708747A (zh) * 2015-11-17 2017-05-24 深圳市中兴微电子技术有限公司 一种存储器切换方法及装置
CN108132893A (zh) * 2017-12-06 2018-06-08 中国航空工业集团公司西安航空计算技术研究所 一种支持流水的常量Cache

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2002079995A1 (en) * 2001-03-30 2002-10-10 Cirrus Logic, Inc. Systems and methods using a system-on-a-chip with soft cache
CN1659525A (zh) * 2002-06-04 2005-08-24 杉桥技术公司 简化了缓存替换策略的实现的多线程缓存方法和装置
WO2009006113A2 (en) * 2007-06-29 2009-01-08 Intel Corporation Hierarchical cache tag architecture
CN101438237A (zh) * 2006-05-10 2009-05-20 高通股份有限公司 基于区块的分支目标地址高速缓冲存储器
CN102662868A (zh) * 2012-05-02 2012-09-12 中国科学院计算技术研究所 用于处理器的动态组相联高速缓存装置及其访问方法

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US7395372B2 (en) * 2003-11-14 2008-07-01 International Business Machines Corporation Method and system for providing cache set selection which is power optimized
JP2010097557A (ja) * 2008-10-20 2010-04-30 Toshiba Corp セットアソシアティブ方式のキャッシュ装置及びキャッシュ方法
US8271732B2 (en) * 2008-12-04 2012-09-18 Intel Corporation System and method to reduce power consumption by partially disabling cache memory
US8458447B2 (en) * 2011-06-17 2013-06-04 Freescale Semiconductor, Inc. Branch target buffer addressing in a data processor
CN102364431B (zh) * 2011-10-20 2014-09-10 北京北大众志微系统科技有限责任公司 一种实现读指令执行的方法及装置
CN102541510B (zh) * 2011-12-27 2014-07-02 中山大学 一种指令缓存系统及其取指方法

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
WO2002079995A1 (en) * 2001-03-30 2002-10-10 Cirrus Logic, Inc. Systems and methods using a system-on-a-chip with soft cache
CN1659525A (zh) * 2002-06-04 2005-08-24 杉桥技术公司 简化了缓存替换策略的实现的多线程缓存方法和装置
CN101438237A (zh) * 2006-05-10 2009-05-20 高通股份有限公司 基于区块的分支目标地址高速缓冲存储器
WO2009006113A2 (en) * 2007-06-29 2009-01-08 Intel Corporation Hierarchical cache tag architecture
CN102662868A (zh) * 2012-05-02 2012-09-12 中国科学院计算技术研究所 用于处理器的动态组相联高速缓存装置及其访问方法

Also Published As

Publication number Publication date
CN104636268A (zh) 2015-05-20
CN104636268B (zh) 2019-07-26

Similar Documents

Publication Publication Date Title
US9472248B2 (en) Method and apparatus for implementing a heterogeneous memory subsystem
US5535361A (en) Cache block replacement scheme based on directory control bit set/reset and hit/miss basis in a multiheading multiprocessor environment
CN1307561C (zh) 不同高速缓存级上具有关联集重叠同余组的多级高速缓存
CN102073533B (zh) 支持动态二进制翻译的多核体系结构
JP2002373115A (ja) 共有キャッシュメモリのリプレイスメント制御方法及びその装置
US20010032299A1 (en) Cache directory configuration method and information processing device
US7260674B2 (en) Programmable parallel lookup memory
US20180260333A1 (en) Systems and methods for addressing a cache with split-indexes
JP2002055879A (ja) マルチポートキャッシュメモリ
WO2015067195A1 (zh) 一种可重构缓存组织结构
JPS63201851A (ja) バッファ記憶アクセス方法
US20070266199A1 (en) Virtual Address Cache and Method for Sharing Data Stored in a Virtual Address Cache
US20100257319A1 (en) Cache system, method of controlling cache system, and information processing apparatus
US9195622B1 (en) Multi-port memory that supports multiple simultaneous write operations
US20170185516A1 (en) Snoop optimization for multi-ported nodes of a data processing system
KR101645003B1 (ko) 메모리 제어기 및 그 메모리 제어기가 탑재된 컴퓨팅 장치
US7406554B1 (en) Queue circuit and method for memory arbitration employing same
EP0310446A2 (en) Cache memory management method
US9081673B2 (en) Microprocessor and memory access method
US20140136796A1 (en) Arithmetic processing device and method for controlling the same
JPH11316744A (ja) 並列プロセッサおよび演算処理方法
US6996675B2 (en) Retrieval of all tag entries of cache locations for memory address and determining ECC based on same
US11829293B2 (en) Processor and arithmetic processing method
US9430397B2 (en) Processor and control method thereof
KR960005394B1 (ko) 멀티 프로세서 시스템

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14859478

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14859478

Country of ref document: EP

Kind code of ref document: A1