WO2020001665A2 - On-chip cache and integrated chip - Google Patents

On-chip cache and integrated chip

Info

Publication number
WO2020001665A2
Authority
WO
WIPO (PCT)
Prior art keywords
data unit
priority
information
chip
page
Application number
PCT/CN2019/112380
Other languages
French (fr)
Chinese (zh)
Other versions
WO2020001665A3 (en)
Inventor
Zhang Qianlong (张乾龙)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201980101522.1A (published as CN114556335A)
Priority to PCT/CN2019/112380 (published as WO2020001665A2)
Publication of WO2020001665A2
Publication of WO2020001665A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/0895 Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • G06F12/0877 Cache access modes
    • G06F12/0882 Page mode
    • G06F12/0804 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
    • G06F12/0897 Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G06F12/12 Replacement control
    • G06F12/121 Replacement control using replacement algorithms
    • G06F12/123 Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
    • G06F12/126 Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G06F2212/1041 Resource optimization
    • G06F2212/1044 Space efficiency improvement
    • G06F2212/40 Specific encoding of data in memory or cache
    • G06F2212/401 Compressed data

Definitions

  • the present application relates to the field of chip technology, and in particular, to an on-chip cache and an integrated chip.
  • the on-chip cache capacity of processors keeps growing, and on-chip caches are implemented in increasingly diverse media, such as static random-access memory (SRAM) and enhanced dynamic random access memory (eDRAM); eDRAM can provide a cache of 128MB or more.
  • 3D-packaged SRAM can further increase the storage density of ordinary SRAM, achieving greater storage capacity and access bandwidth.
  • 2.5D or 3D packaged DRAM (i.e., on-chip memory) used as an on-chip cache can reach a capacity of 16GB, which is expected to increase further.
  • the larger the cache capacity, the greater the impact of its management efficiency on processor performance.
  • both the on-chip memory and the on-chip cache refer to a 2.5D or 3D packaged DRAM.
  • On-chip large-capacity cache is a new technology proposed by the industry to solve the access bandwidth problem of memory systems.
  • the large-capacity on-chip cache can package memory dies on-chip using through-silicon via (TSV) technology, thereby improving the access bandwidth of the memory system.
  • DDR: double data rate dynamic random access memory
  • on-chip caches can work in three modes: cache mode, flat mode, and mixed mode.
  • when working in cache mode, the on-chip cache is used as a cache for off-chip memory; when working in flat mode, the on-chip cache is used as ordinary memory; when working in mixed mode, part of the on-chip cache is used as a cache for off-chip memory and the rest is used as ordinary RAM. This application targets the scenario where the on-chip cache is used as a cache of off-chip memory.
  • FIG. 1 is a schematic diagram of a page stored in an on-chip cache.
  • in FIG. 1, each small box represents a footprint (for example, a block).
  • gray-filled footprints store data, while white-filled footprints are blank and store no data. That is, the gray-filled footprints hold the frequently accessed data of a page, whereas the data corresponding to the white-filled portions of the page is accessed less frequently and is therefore not stored in the on-chip cache.
  • the embodiments of the present application provide an on-chip cache and an integrated chip, which are used to solve the problems of low utilization of page storage space and waste of on-chip cache storage space in the prior art.
  • an embodiment of the present application provides an on-chip cache, including: a storage unit configured to store a first page.
  • the first page includes a first part of the data unit and a second part of the data unit.
  • the first tag information corresponding to the first part of the data unit has a first priority
  • the second tag information corresponding to the second part of the data unit has a second priority
  • the second priority is lower than the first priority.
  • the data unit stored in the first page can be divided into two parts, and each part of the data unit corresponds to one tag information.
  • one tag information corresponds to one page in the off-chip memory, that is, in the on-chip cache provided in the first aspect, each part of the data unit is used to store a data unit of one page in the off-chip memory.
  • the on-chip cache provided in the first aspect can store, in the first page, data units that were originally stored in two pages in off-chip memory.
  • in the prior art, the data units stored in one page of the on-chip cache all come from the same page in off-chip memory, that is, they correspond to a single tag information, whereas the first page of the on-chip cache provided in the first aspect stores data units corresponding to two pieces of tag information. That is to say, the on-chip cache provided in the first aspect can compress data originally stored in two pages into one page, thereby saving storage space of the on-chip cache.
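As a rough illustration of the space saving, the sketch below counts the on-chip pages needed to hold the cached data units of several off-chip pages, first under the prior-art rule (one tag per page) and then when two tags may share one page. Footprints are modeled as bitmasks of cached data-unit positions; the greedy pairing rule (pair two pages only if their footprints do not overlap) is an illustrative assumption, not the allocation policy of the application, which resolves overlapping positions by tag priority instead.

```python
def pages_prior_art(footprints):
    """Prior art: each off-chip page with cached data occupies its own on-chip page."""
    return sum(1 for fp in footprints if fp)


def pages_two_tags(footprints):
    """Greedy sketch: place two off-chip pages in one on-chip page when their
    footprints (bitmasks of cached data-unit positions) do not overlap."""
    remaining = [fp for fp in footprints if fp]
    pages = 0
    while remaining:
        first = remaining.pop(0)
        for i, other in enumerate(remaining):
            if first & other == 0:  # disjoint positions: both fit in one page
                remaining.pop(i)
                break
        pages += 1
    return pages


# Four sparse 16-unit footprints (illustrative values only).
footprints = [0x000F, 0xF000, 0x0F00, 0x00F0]
print(pages_prior_art(footprints), pages_two_tags(footprints))  # 4 2
```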
  • the storage unit is further configured to store index information of the first page, where the index information of the first page includes index information of the first part of the data unit and index information of the second part of the data unit.
  • the data unit stored in the first page can be indexed according to the index information of the first page.
  • the index information of the first part of the data unit may include first tag information and first valid bit information
  • the index information of the second part of the data unit may include second tag information and second valid bit information
  • the index information of the first part of the data unit indicates the priority of the first part of the data unit and which of its data units in the first page are valid; likewise, the index information of the second part of the data unit indicates the priority of the second part of the data unit and which of its data units in the first page are valid.
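The per-part index information can be modeled roughly as follows: each part carries its tag information, a valid-bit mask over the data-unit positions of the page, and a priority level. A request hits only if its tag matches one part and that part's valid bit for the requested position is set. The class names, field names, and bitmask encoding are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartIndex:
    """Index information of one part of a page (hypothetical field names)."""
    tag: int         # tag information of the off-chip page this part caches
    valid_bits: int  # bit i set => data unit i of that page is stored here
    priority: int    # 1 = first (highest) priority, 2 = second, ...


@dataclass
class PageIndex:
    parts: List[PartIndex]


def lookup(page: PageIndex, tag: int, unit: int) -> Optional[PartIndex]:
    """Return the matching part if data unit `unit` of page `tag` is cached."""
    for part in page.parts:
        if part.tag == tag and (part.valid_bits >> unit) & 1:
            return part
    return None


page = PageIndex([PartIndex(tag=0x1, valid_bits=0b0011, priority=1),
                  PartIndex(tag=0x2, valid_bits=0b1100, priority=2)])
hit = lookup(page, tag=0x2, unit=3)
print(hit.priority if hit else "miss")  # 2
print(lookup(page, 0x2, 0))             # None
```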
  • the on-chip cache may further include a storage controller, configured to set the priority of the second tag information to the first priority, and set the priority of the first tag information, or of a third tag information having the first priority, to the second priority.
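The storage controller's promotion step amounts to exchanging priority levels between two tags. A minimal sketch, assuming each tag's priority is kept as a small integer (1 = first priority) in a dictionary; the dictionary representation and the `promote` helper are hypothetical.

```python
FIRST, SECOND = 1, 2


def promote(priorities: dict, promoted: str, demoted: str) -> None:
    """Give `promoted` the first priority and demote `demoted`, which must
    currently hold the first priority (e.g. the first or a third tag)."""
    if priorities[demoted] != FIRST:
        raise ValueError("the demoted tag must currently hold the first priority")
    priorities[promoted] = FIRST
    priorities[demoted] = SECOND


prio = {"tag1": FIRST, "tag2": SECOND}
promote(prio, promoted="tag2", demoted="tag1")
print(prio)  # {'tag1': 2, 'tag2': 1}
```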
  • the first page further includes a third part of the data unit, and the fourth tag information corresponding to the third part of the data unit has a third priority, and the third priority is lower than the second priority.
  • an embodiment of the present application provides an integrated chip, including an on-chip cache for storing a first page, where the first page includes a first part of the data unit and a second part of the data unit, and the first tag information corresponding to the first part of the data unit has a first priority
  • the second tag information corresponding to the second part of the data unit has a second priority
  • the second priority is lower than the first priority
  • with the integrated chip provided in the second aspect, the first part of the data unit corresponding to the first tag information and the second part of the data unit corresponding to the second tag information can be stored in the first page. Compared with the prior art solution of storing only data units corresponding to one tag information in one page, the integrated chip provided in the second aspect can compress data that would originally occupy multiple pages of the on-chip cache into one page, thereby saving on-chip cache storage space.
  • the on-chip cache is further configured to store index information of the first page, where the index information of the first page includes index information of the first part of the data unit and index information of the second part of the data unit.
  • the data unit stored in the first page can be indexed according to the index information of the first page.
  • the index information of the first part of the data unit includes first tag information and first valid bit information
  • the index information of the second part of the data unit includes second tag information and second valid bit information
  • the index information of the first part of the data unit indicates the priority of the first part of the data unit and which of its data units in the first page are valid; likewise, the index information of the second part of the data unit indicates the priority of the second part of the data unit and which of its data units in the first page are valid.
  • the integrated chip provided in the second aspect further includes a processor for sending a first access instruction, where the first access instruction is used to request access to a first data unit, and the first data unit corresponds to the first tag information; the on-chip cache is further configured to determine, according to the first valid bit information, that the first data unit is stored in the on-chip cache, and send the first data unit to the processor.
  • the processor is further configured to send a second access instruction, the second access instruction is used to request access to the second data unit, and the second data unit corresponds to the second tag information;
  • the on-chip cache is further configured to determine, according to the second valid bit information, that the second data unit is stored in the on-chip cache, and send the second data unit to the processor.
  • the on-chip cache is further configured to determine, according to the number of times the second data unit is accessed per unit time, whether to set the priority of the second tag information to the first priority and set the priority of the first tag information, or of a third tag information having the first priority, to the second priority.
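One way to realize the frequency-based decision is a per-tag access counter that is cleared at the end of every unit-time window: if the second-priority tag was accessed more often than the current first-priority tag during the window, the two swap roles. The window-based counter and the "more accesses than the prime tag" criterion are assumptions; the text only states that the decision depends on the number of accesses per unit time.

```python
from collections import Counter


class FrequencyPromoter:
    """Hypothetical sketch: count accesses per tag within the current window."""

    def __init__(self, prime: str, sub: str):
        self.prime = prime  # tag currently holding the first priority
        self.sub = sub      # tag currently holding the second priority
        self.counts = Counter()

    def record_access(self, tag: str) -> None:
        self.counts[tag] += 1

    def end_window(self) -> bool:
        """At the end of a unit-time window, swap roles if the sub tag was
        accessed more often than the prime tag; return True on a swap."""
        swapped = self.counts[self.sub] > self.counts[self.prime]
        if swapped:
            self.prime, self.sub = self.sub, self.prime
        self.counts.clear()  # start a fresh window
        return swapped


fp = FrequencyPromoter(prime="tag1", sub="tag2")
for _ in range(3):
    fp.record_access("tag2")
fp.record_access("tag1")
print(fp.end_window(), fp.prime)  # True tag2
```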
  • the processor is further configured to: send a third access instruction, the third access instruction is used to request access to the third data unit, and the third data unit corresponds to the second tag information;
  • the on-chip cache is further configured to determine, according to the second valid bit information, that the third data unit is not stored in the on-chip cache;
  • the processor is further configured to: read the third data unit from the off-chip memory;
  • the on-chip cache is further configured to: store the third data unit.
  • that is, when the processor requests access to a data unit corresponding to the second tag information and that data unit is not stored in the on-chip cache, the data unit is read from off-chip memory and stored in the on-chip cache.
  • when storing the third data unit, the on-chip cache is specifically configured to store the third data unit in the first page or in a second page.
  • the second page and the first page are located in the same Way of the Data Array.
  • the on-chip cache is further configured to set the bit corresponding to the third data unit in the second valid bit information to valid, set the priority of the second tag information to the first priority, and set the priority of the first tag information, or of a fourth tag information having the first priority, to the second priority.
  • the second tag information can be regarded as playing a victim role relative to the first tag information.
  • if a data unit corresponding to the second tag information is not stored in the on-chip cache because its storage location is occupied by a data unit of the first tag information, then when that data is read into the on-chip cache, the second tag information, which has the second priority, needs to be upgraded to the first priority.
  • correspondingly, one tag information having the first priority in the Tag Array must be downgraded to the second priority. The downgraded tag information may be the first tag information or the fourth tag information.
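The miss-handling path for the third data unit can be sketched on a single page holding two parts: the fetched unit is marked valid for the second-priority tag, and the two tags exchange priorities. It is an assumption here that a data unit of the old first-priority tag occupying the same position is evicted (a real design would write it back first if dirty); reading the data from off-chip memory is omitted, and the `Part` class and `fill_on_miss` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    tag: str
    valid_bits: int  # bit i set => data unit i of this tag is stored in the page
    priority: int    # 1 = first priority, 2 = second priority


def fill_on_miss(prime: Part, sub: Part, unit: int) -> None:
    """Store a missed data unit of the second-priority tag (`sub`) at position
    `unit`, then upgrade `sub` and downgrade `prime` (the victim-role idea)."""
    mask = 1 << unit
    if prime.valid_bits & mask:
        # Assumption: the occupying unit of the old prime tag is evicted.
        prime.valid_bits &= ~mask
    sub.valid_bits |= mask  # mark the newly fetched data unit valid
    prime.priority, sub.priority = sub.priority, prime.priority


t1 = Part("tag1", valid_bits=0b0110, priority=1)
t2 = Part("tag2", valid_bits=0b0001, priority=2)
fill_on_miss(t1, t2, unit=2)
print(bin(t1.valid_bits), bin(t2.valid_bits), t1.priority, t2.priority)  # 0b10 0b101 2 1
```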
  • the integrated chip provided in the second aspect may further include: a tag buffer, configured to store the first tag information and the second tag information.
  • when the processor issues an access instruction, the tag buffer can be looked up first to determine whether the requested data is stored in the on-chip cache, thereby improving data access efficiency.
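A tag buffer in front of the Tag Array behaves like a small cache of index information. The sketch below is a simplification under stated assumptions: the TB is fully associative with least-recently-used eviction (the text describes a set-associative TB), the Tag Array is a plain dictionary from tag to valid-bit mask, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class TagBuffer:
    """Hypothetical sketch of a TB answering lookups before the Tag Array."""

    def __init__(self, tag_array: dict, capacity: int = 4):
        self.tag_array = tag_array    # tag -> valid-bit mask (slow path)
        self.capacity = capacity
        self.entries = OrderedDict()  # cached subset of the Tag Array
        self.tb_hits = 0

    def valid_bits(self, tag):
        if tag in self.entries:       # fast path: answered by the TB
            self.entries.move_to_end(tag)
            self.tb_hits += 1
            return self.entries[tag]
        bits = self.tag_array.get(tag)  # slow path: search the Tag Array
        if bits is not None:
            self.entries[tag] = bits
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return bits

    def is_cached(self, tag, unit: int) -> bool:
        bits = self.valid_bits(tag)
        return bits is not None and bool((bits >> unit) & 1)


tb = TagBuffer({"tag1": 0b101}, capacity=2)
print(tb.is_cached("tag1", 0), tb.is_cached("tag1", 2), tb.tb_hits)  # True True 1
```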
  • the first page further includes a third part of the data unit, and the fourth tag information corresponding to the third part of the data unit has a third priority, and the third priority is lower than the second priority.
  • FIG. 1 is a schematic diagram of pages stored in an on-chip cache in the prior art.
  • FIG. 2 is a schematic structural diagram of a first integrated chip according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a page in an on-chip cache provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a tag array and a data array according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a second integrated chip according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an on-chip cache according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a third integrated chip according to an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a first priority replacement process according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a second priority replacement process according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a data access process according to an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a third priority replacement process according to an embodiment of the present application.
  • the embodiment of the present application can be applied to the integrated chip shown in FIG. 2.
  • the integrated chip includes a processor, a memory controller (MC) and an on-chip cache.
  • the integrated chip is also connected to off-chip memory.
  • the processor is used to initiate data access requests and perform data processing;
  • the memory controller is used to control the data interaction between the processor and the on-chip cache, and between the processor and off-chip memory;
  • the off-chip memory stores a large amount of data, and the on-chip cache can be regarded as a cache for the off-chip memory.
  • the on-chip cache is used as a cache of off-chip memory.
  • the on-chip cache stores some data in off-chip memory.
  • when the processor issues an access instruction, if the requested data is stored in the on-chip cache, the data is returned directly from the on-chip cache; otherwise, the data must be fetched from off-chip memory, and the retrieved data is stored in the on-chip cache so that the next access hits.
  • cache space is allocated in units of pages.
  • a page can be divided into multiple data units.
  • the on-chip cache can store the data units of a page that the processor has previously accessed, for subsequent re-access by the processor; that is, not all data units of a page are necessarily cached in the on-chip cache.
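Page-granular allocation with per-data-unit caching can be illustrated by splitting an address into a page number and a data-unit index, and recording which units of each page were accessed (the page's footprint). The 4KB page size is the one used elsewhere in this document; the 64B data-unit size and the helper names are assumptions for illustration.

```python
PAGE_SIZE = 4 * 1024  # 4KB page, as in the page-index example of this document
UNIT_SIZE = 64        # assumption: 64B data units (not stated in the text)
UNITS_PER_PAGE = PAGE_SIZE // UNIT_SIZE  # 64 data units per page


def split_address(addr: int):
    """Split a physical address into (page number, data-unit index in page)."""
    return addr // PAGE_SIZE, (addr % PAGE_SIZE) // UNIT_SIZE


def record_access(footprints: dict, addr: int) -> None:
    """Mark the accessed data unit in its page's footprint bitmask; only
    these units, not the whole page, would be kept in the on-chip cache."""
    page, unit = split_address(addr)
    footprints[page] = footprints.get(page, 0) | (1 << unit)


fps = {}
record_access(fps, 0x1234)  # page 1, unit 8
record_access(fps, 0x1240)  # page 1, unit 9
print(split_address(0x1234), bin(fps[1]))  # (1, 8) 0b1100000000
```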
  • the data unit stored in one page of the on-chip cache may be divided into multiple parts, and each part of the data unit corresponds to one tag information (Tag).
  • each part of the data unit is used to store a data unit of a page in the off-chip memory.
  • the on-chip cache can store data units that were originally stored in multiple pages in off-chip memory to one page of the on-chip cache.
  • a schematic diagram of one page in the on-chip cache is shown in FIG. 3.
  • one page of the on-chip cache stores a data unit corresponding to the first tag information (Tag1) and a data unit corresponding to the second tag information (Tag2).
  • the data unit corresponding to Tag1 is stored in one page in off-chip memory, and the data unit corresponding to Tag2 is stored in another page in off-chip memory.
  • in the prior art, the data units stored in a page of the on-chip cache all come from the same page in off-chip memory, that is, they correspond to a single tag.
  • in the embodiments of the present application, a page in the on-chip cache stores data units corresponding to multiple tags. That is, data originally stored in multiple pages in off-chip memory may be compressed and stored in one page of the on-chip cache, thereby saving storage space of the on-chip cache.
  • in addition to the pages themselves, the on-chip cache also stores index information of the pages.
  • the index information of the page can be stored in a tag array, and the page can be stored in a data array.
  • the index information of the pages in the tag array corresponds to the pages in the data array.
  • the Tag Array and Data Array are stored in the form of an array.
  • the tag array can be an m * n array, in which each element is the index information of one page.
  • the row vectors in the m * n array can be called Set, and the column vectors in the m * n array can be called Way.
  • if each row vector includes N elements (that is, n = N), this storage structure may be referred to as N-way set-associative.
  • the on-chip cache can use a four-way set-associative storage structure, that is, each row vector contains four elements; in the tag array, each of the four elements in a row vector represents the index information of one page, and in the data array, each of the four elements in a row vector represents one page.
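A lookup in an m-set, four-way set-associative tag array can be sketched as follows. The set count m = 8, the modulo set-index function, and the representation of each way as the set of tags whose data units share that on-chip page are assumptions; in particular, the sketch assumes the tags sharing one page map to the same set.

```python
NUM_SETS = 8  # hypothetical m
NUM_WAYS = 4  # four-way set-associative, as in the example


def set_and_tag(page_number: int):
    """Split an off-chip page number into (set index, tag)."""
    return page_number % NUM_SETS, page_number // NUM_SETS


def find_way(tag_array, page_number: int):
    """Search the ways of the selected set; each element lists the tags whose
    data units share that on-chip page. Return the way index or None."""
    set_index, tag = set_and_tag(page_number)
    for way, tags in enumerate(tag_array[set_index]):
        if tag in tags:
            return way
    return None


tag_array = [[set() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]
tag_array[5][3] = {2, 3}  # off-chip pages 21 and 29 share way 3 of set 5
print(find_way(tag_array, 21), find_way(tag_array, 29), find_way(tag_array, 13))  # 3 3 None
```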
  • a specific example of a tag array and a data array is shown in FIG. 4.
  • the elements in the Tag Array correspond one-to-one to the elements in the Data Array, and each element in the Tag Array indicates the index information of the corresponding page in the Data Array.
  • in this example, the on-chip cache adopts a four-way set-associative structure. In practical applications, the on-chip cache can also use an eight-way set-associative structure or other architectures, which are not limited in the embodiments of the present application.
  • one page of the on-chip cache stores data units from multiple pages of off-chip memory, that is, the data units stored in one page of the on-chip cache can be divided into multiple parts.
  • each part of the data units corresponds to one set of index information, and the multiple sets of index information together constitute the index information of a page in the on-chip cache.
  • for example, if the data units stored in a page of the on-chip cache are divided into two parts, A and B, the index information of the page includes two sets of index information: one set indexes part A and the other indexes part B.
  • each set of index information may include the foregoing tag information (Tag).
  • in addition to the tag information (Tag), each set of index information can also include the following: overall valid bit information (Valid), which indicates whether the entire tag is valid; if the tag is invalid, none of the data units corresponding to the tag can be accessed. Least recently used (LRU) information, which indicates the least recently used data unit. Dirty Bits information, which indicates whether the data units stored in the on-chip cache are dirty: if a bit of Dirty Bits is 0, the corresponding data unit is clean and, when replacement occurs, can simply be invalidated without being written back to off-chip memory; if a bit of Dirty Bits is 1, the corresponding dirty data must be written back to off-chip memory when replacement occurs.
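The Dirty Bits rule above can be sketched as a bitmask computation: on replacement, only data units that are both stored (valid) and dirty need to be written back, while the rest can simply be invalidated. Treating a unit as needing write-back only when its valid bit is also set is an assumption, as is the function name.

```python
def units_to_write_back(valid_bits: int, dirty_bits: int):
    """Return the positions of data units that must be written back to
    off-chip memory on replacement (stored and dirty); clean units are
    simply invalidated."""
    pending = valid_bits & dirty_bits
    return [i for i in range(pending.bit_length()) if (pending >> i) & 1]


print(units_to_write_back(0b1011, 0b0110))  # [1]
print(units_to_write_back(0b1111, 0b1001))  # [0, 3]
```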
  • Valid Bits information indicates which data units in the page are valid: valid data units hold data, and invalid data units hold none. For example, in FIG. 3, the valid bits corresponding to the blank boxes are invalid, and no data is stored in them.
  • by looking up the Valid Bits information, it can be determined whether the data unit requested by the processor is stored in the on-chip cache. For example, the processor initiates an access instruction requesting a data unit corresponding to Tag1. Searching the Tag Array in the on-chip cache finds the index information of Tag1 (a Tag1 hit), after which the valid bit corresponding to the requested data unit is checked.
  • for the same storage location, the valid bits of Tag1 and Tag2 cannot both be valid at the same time. This is because, in the embodiments of the present application, data units from multiple pages in off-chip memory are stored in one page of the on-chip cache, and a page in off-chip memory has the same storage space as a page in the on-chip cache. Therefore, if a storage location in a page of the on-chip cache is used to store a data unit corresponding to Tag1 (the Valid Bits corresponding to that location in the index information of Tag1 are valid), the location cannot be used to store a data unit corresponding to Tag2 (the Valid Bits corresponding to that location in the index information of Tag2 are invalid).
  • for example, if the storage location in the second row and first column is used to store a data unit corresponding to Tag1, that location cannot simultaneously store a data unit corresponding to Tag2.
  • in this case, the fifth bit in the Valid Bits corresponding to Tag1 is valid, and the fifth bit in the Valid Bits corresponding to Tag2 is invalid.
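The mutual-exclusion rule can be expressed as a bitmask invariant: the Valid Bits of Tag1 and Tag2 must never share a set bit. The sketch reads "fifth bit" as zero-based index 4 and assumes a row-major layout with four positions per row, which matches the "second row, first column" example; the helper names are hypothetical. This helper only enforces the invariant; which tag wins a contested position is decided by the priority rules described in this document.

```python
FIFTH_BIT = 1 << 4  # "fifth bit" read as zero-based index 4


def conflicts(valid_tag1: int, valid_tag2: int) -> int:
    """Mask of positions marked valid for both tags; must always be 0."""
    return valid_tag1 & valid_tag2


def claim(valid_self: int, valid_other: int, pos: int) -> int:
    """Mark position `pos` valid for one tag, refusing if the other tag's
    Valid Bits already claim it."""
    mask = 1 << pos
    if valid_other & mask:
        raise ValueError(f"position {pos} already holds the other tag's data unit")
    return valid_self | mask


# Second row, first column with four positions per row is position 4.
tag1 = claim(0, 0, 4)
tag2 = claim(0, tag1, 0)
print(tag1 == FIFTH_BIT, conflicts(tag1, tag2))  # True 0
```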
  • each set of index information can also include access bit information (Reference bits), which is used to record the historical information of which data units have been accessed.
  • since a page in the embodiments of the present application stores data units corresponding to multiple tags, each tag of the page has its own set of index information, including Tag, Valid, LRU, Dirty Bits, Valid Bits, and Reference Bits.
  • taking a page that includes data units corresponding to two tags as an example, the index information of the page includes two sets. Specifically, of the two tags, the tag with the higher priority is called the Prime Tag, and the tag with the lower priority is called the Sub Tag.
  • the index information of the page includes index information corresponding to Prime Tag and index information corresponding to Sub Tag.
  • the data unit corresponding to the tag (Prime tag) with a higher priority can be stored in the on-chip cache preferentially.
  • since each storage location can store a data unit of only one tag, deciding which tag's data unit a storage location holds involves the issue of storage priority.
  • for example, if a data unit corresponding to a Sub Tag is to be stored in the on-chip cache but its storage location is occupied by a data unit corresponding to the Prime Tag, the Sub Tag's data unit cannot be stored in the on-chip cache; conversely, if a data unit corresponding to the Prime Tag is to be stored but its storage location is occupied by a data unit corresponding to the Sub Tag, the Prime Tag's data unit can take over this storage location, and the Sub Tag's data unit is evicted from the on-chip cache.
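The Prime Tag / Sub Tag storage rule can be sketched on the two Valid Bits masks of one page: a Prime Tag data unit always gets its position (evicting a Sub Tag unit there), while a Sub Tag data unit is refused if the Prime Tag holds the position. The function name and the returned tuple layout are hypothetical.

```python
def try_store(prime_valid: int, sub_valid: int, pos: int, is_prime: bool):
    """Store a data unit at position `pos` for the Prime Tag or the Sub Tag.
    Returns (prime_valid, sub_valid, stored, evicted_sub)."""
    mask = 1 << pos
    if is_prime:
        evicted = bool(sub_valid & mask)  # a Sub Tag unit here is kicked out
        return prime_valid | mask, sub_valid & ~mask, True, evicted
    if prime_valid & mask:                # occupied by the Prime Tag: refuse
        return prime_valid, sub_valid, False, False
    return prime_valid, sub_valid | mask, True, False


print(try_store(0b0001, 0b0010, 1, is_prime=True))   # (3, 0, True, True)
print(try_store(0b0001, 0b0000, 0, is_prime=False))  # (1, 0, False, False)
```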
  • the memory controller may further include a tag buffer (TB), which caches part of the Tag Array stored in the on-chip cache (to save storage space in the memory controller, only part of the Tag Array, rather than all of it, is stored in the TB), thereby speeding up access to page index information.
  • when the processor issues an access instruction, the TB can be looked up first to determine whether the requested data is stored in the on-chip cache. If the TB shows that the data is stored in the on-chip cache, the data can be accessed directly from the Data Array of the on-chip cache without searching the Tag Array in the on-chip cache, thereby speeding up access to the index information of the page.
  • the data stored in the TB may be newer than the data stored in the on-chip cache, so when a data update occurs in the TB, the updated data needs to be written back to the on-chip cache.
  • to ensure that the data in the TB is correct, whenever page index information in the on-chip cache or off-chip memory changes, the data in the TB must be updated synchronously.
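The two consistency rules above (newer TB entries are written back to the on-chip cache, and TB entries are refreshed when index information changes elsewhere) resemble a write-back cache. A minimal sketch, assuming the Tag Array is a dictionary from tag to value, that write-back is deferred until an explicit flush, and that an external change overrides the TB copy; all names are hypothetical.

```python
class WriteBackTagBuffer:
    """Hypothetical sketch: TB entries may be newer than the on-chip Tag
    Array, so modified entries are tracked and written back later."""

    def __init__(self, tag_array: dict):
        self.tag_array = tag_array  # backing copy in the on-chip cache
        self.entries = {}           # tag -> value cached in the TB
        self.dirty = set()          # tags whose TB value is newer

    def update(self, tag, value) -> None:
        """Update index information in the TB only; defer the write-back."""
        self.entries[tag] = value
        self.dirty.add(tag)

    def write_back(self) -> None:
        """Write all newer TB entries back to the on-chip Tag Array."""
        for tag in self.dirty:
            self.tag_array[tag] = self.entries[tag]
        self.dirty.clear()

    def on_backing_change(self, tag, value) -> None:
        """Keep the TB in sync when index information changes elsewhere
        (assumption: the external change is authoritative)."""
        if tag in self.entries:
            self.entries[tag] = value
            self.dirty.discard(tag)


arr = {"t1": 0b01}
tb = WriteBackTagBuffer(arr)
tb.update("t1", 0b11)
tb.write_back()
print(arr["t1"])  # 3
```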
  • the Tag Array stored in the TB is also stored in the form of an array.
  • it can be an m * n array, and each element in the array is the index information of the page.
  • the row vectors in the m * n array can be called Set, and the column vectors in the m * n array can be called Way.
  • if each row vector includes N elements, this storage structure of the TB can be referred to as N-way set-associative.
  • the TB may adopt a four-way set-associative storage structure, that is, each row vector contains four elements.
  • Data stored in the TB and the on-chip cache may be as shown in FIG. 5.
  • The on-chip cache and the off-chip memory share a memory controller.
  • Alternatively, the on-chip cache and the off-chip memory can be controlled by two separate controllers; this application does not limit this.
  • In the TB, the tag information (Tag) has the same meaning as the PrimeTag or SubTag in the on-chip cache's Tag Array. The value (Value) represents the Valid, LRU, Valid Bits, Dirty Bits, and Reference Bits fields of the on-chip cache's Tag Array.
  • In the TB, PrimeTag and SubTag have equal status. Each TB entry (Tag + Value) therefore holds either a PrimeTag or a SubTag, rather than a PrimeTag/SubTag pair as in the on-chip cache.
  • One element of the on-chip cache's Tag Array therefore corresponds to two Tag + Value entries in the TB.
  • Each Tag + Value entry can be regarded as one element of the TB's storage array. Here the TB is four-way set-associative, that is, each row vector includes four elements.
  • A single cache Set includes four page index entries. Each entry occupies 64 B and holds the index information of one 4 KB page.
  • Historical data information is appended after the page index information. It records information about pages evicted from the on-chip cache: the tag information (Tag), the access count, and the historical access footprint. The next time such a page is stored in the on-chip cache, this information can be obtained directly.
  • The page index information in FIG. 5 is essentially the same as in the example of FIG. 4, except that FIG. 5 additionally includes a role flip bit.
  • The multiple tags corresponding to a page are prioritized.
  • The data unit of the higher-priority tag takes precedence for storage in the on-chip cache.
  • By default, a PrimeTag has higher priority than a SubTag. In some cases the priorities need to be swapped, for example when the data corresponding to the SubTag is accessed frequently. To avoid the overhead of moving data, the swap can be indicated with the role flip bit. When the role flip bit is 0 (the default), the PrimeTag has higher priority; when it is 1, the SubTag has higher priority.
  • The description above takes as an example an on-chip cache integrated on the chip, that is, located inside the integrated chip.
  • An off-chip cache may also be provided outside the integrated chip.
  • Such an off-chip cache may use the same storage structure as the on-chip cache provided in the embodiments of this application and implement the same functions. In that case it should also be regarded as falling within the protection scope of the embodiments.
  • The MC can serve as the controller of both the on-chip cache and the off-chip memory, handling data exchange between the processor and both of them.
  • Alternatively, the on-chip cache can have its own storage controller to control data access within it.
  • The storage controller of the on-chip cache can collect statistics on accesses to the data unit corresponding to each tag. These statistics can be used to decide whether the priority swap described above is required (promoting a SubTag to PrimeTag, also called a SubTag upgrade).
  • The embodiments of this application provide an on-chip cache and an integrated chip. They aim to improve the storage efficiency of the large-capacity cache system described above and thereby further improve cache management efficiency.
  • A 2.5D or 3D packaged on-chip cache is used as the example.
  • The embodiments also apply to on-chip caches built from other media, such as the SRAM, 3D SRAM, and eDRAM mentioned in the background.
  • The on-chip cache 600 includes a storage unit 601 configured to store a first page.
  • The first page includes a first part of the data units and a second part of the data units.
  • First tag information corresponding to the first part of the data units has a first priority.
  • Second tag information corresponding to the second part of the data units has a second priority.
  • The second priority is lower than the first priority.
  • The data units stored in the first page can be divided into two parts, and each part corresponds to one piece of tag information (Tag).
  • In off-chip memory, one tag corresponds to one page. In the on-chip cache 600, each part of the data units therefore holds data units from one page of off-chip memory.
  • The on-chip cache 600 can thus store, in the first page, data units that were originally stored in two pages of off-chip memory.
  • In the prior art, the data units stored in one page of the on-chip cache all come from the same page of off-chip memory, that is, they correspond to a single tag.
  • In contrast, one page (the first page) of the on-chip cache 600 stores data units corresponding to multiple tags.
  • The on-chip cache 600 shown in FIG. 6 can therefore compress data originally stored in two pages into one page, saving on-chip cache storage space.
  • Tag1 may be regarded as the first tag information.
  • Tag2 may be regarded as the second tag information.
  • The first priority and the second priority work as follows. In off-chip memory, the data corresponding to Tag1 can occupy an entire page, and so can the data corresponding to Tag2. Because on-chip cache capacity is limited, the on-chip cache may hold only part of the off-chip data, such as frequently accessed data or data that has already been accessed. As a result, only part of the data units corresponding to Tag1 may be stored in the first page, and the same holds for Tag2. This raises a problem for any given storage location in the first page (for example, the location in the first row and first column in FIG.
  • …). The data unit that Tag1 maps to that location and the data unit that Tag2 maps to it may both need to be stored there. A priority order is therefore required to decide whether the Tag1 data unit or the Tag2 data unit occupies the location.
  • The priority of the first tag information (Tag1) is set higher than that of the second tag information (Tag2).
  • The storage unit 601 is further configured to store index information of the first page.
  • The index information of the first page includes index information of the first part of the data units and index information of the second part of the data units.
  • The first half of a page's index information (PrimeTag + Valid + LRU + Valid Bits + Dirty Bits + Reference Bits) can be regarded as the index information of the first part of the data units. The second half (SubTag + Valid + LRU + Valid Bits + Dirty Bits + Reference Bits) can be regarded as the index information of the second part.
  • The index information of the first part may include first tag information (for example, PrimeTag) and first valid bit information (for example, Valid Bits). The index information of the second part may include second tag information (for example, SubTag) and second valid bit information (for example, Valid Bits).
  • The first valid bit information indicates which data units in the first page are valid data units of the first part.
  • For example, if the first page contains 4 × 16 data units, the first valid bit information can be represented by 64 bits. If a position stores a data unit of the first part, the corresponding bit in the first valid bit information is set to valid (for example, 1). For instance, if the first of the 64 positions in the first page stores a data unit corresponding to Tag1, the first bit of the 64-bit first valid bit information is valid.
  • The second valid bit information has a similar meaning and is not described again here.
  • The on-chip cache 600 may further include a storage controller 602. It is configured to set the priority of the second tag information to the first priority. It also sets the priority of the first tag information, or of third tag information having the first priority, to the second priority.
  • The first priority of the first part of the data units is higher than the second priority of the second part.
  • In some cases, priority replacement is required.
  • The priority of the second tag information is then set to the first priority.
  • The priority of the first tag information, or of third tag information having the first priority, is set to the second priority.
  • For example, after the first and second tag information are swapped, the second tag information has higher priority, so its data units are stored in the on-chip cache preferentially.
  • Priority replacement can be done with a role flip bit, as in the example of FIG. 5. Alternatively, the index information of the first part and the index information of the second part of the data units can be relocated in the conventional way.
  • The description above takes as an example a first page that stores two parts of data units (the first and second data units).
  • The on-chip cache can also store data units corresponding to more than two pieces of tag information, provided the priority of each part is properly defined.
  • For example, the first page may further include a third part of the data units. Fourth tag information corresponding to the third part has a third priority, which is lower than the second priority.
  • The on-chip cache 600 shown in FIG. 6 can thus compress data originally stored in multiple pages into one page, saving on-chip cache storage space.
  • The integrated chip 700 includes an on-chip cache 701.
  • The on-chip cache 701 is used to store a first page.
  • The first page includes a first part of the data units and a second part of the data units.
  • First tag information corresponding to the first part of the data units has a first priority.
  • Second tag information corresponding to the second part of the data units has a second priority, and the second priority is lower than the first priority.
  • The integrated chip 700 may be a system on chip (SoC).
  • The on-chip cache 701 is further configured to store index information of the first page. This index information includes index information of the first part of the data units and index information of the second part.
  • The index information of the first part includes the first tag information and the first valid bit information.
  • The index information of the second part includes the second tag information and the second valid bit information.
  • The first page may further include a third part of the data units. Fifth tag information corresponding to the third part has a third priority, which is lower than the second priority.
  • For further details of the on-chip cache 701, refer to the description of the on-chip cache 600 shown in FIG. 6; they are not repeated here.
  • The integrated chip 700 may further include a processor 702 configured to send a first access instruction. The first access instruction requests access to a first data unit, which corresponds to the first tag information. The on-chip cache 701 is further configured to determine, from the first valid bit information, that the first data unit is stored in the on-chip cache 701, and to send the first data unit to the processor 702.
  • When the processor 702 requests the data unit corresponding to the first tag information and that data unit is in the on-chip cache 701, the data unit is returned directly to the processor 702. Off-chip memory is not accessed, which improves data access efficiency.
  • The processor 702 is further configured to send a second access instruction. It requests access to a second data unit, which corresponds to the second tag information.
  • The on-chip cache 701 is further configured to determine, from the second valid bit information, that the second data unit is stored in the on-chip cache 701, and to send the second data unit to the processor 702.
  • When the processor 702 requests the data unit corresponding to the second tag information and that data unit is in the on-chip cache 701, the data unit is returned directly to the processor 702. Off-chip memory is not accessed, which improves data access efficiency.
  • The on-chip cache 701 is further configured to decide whether to perform a priority swap, based on how many times the second data unit is accessed per unit time. The swap sets the priority of the second tag information to the first priority, and sets the priority of the first tag information, or of third tag information having the first priority, to the second priority.
  • In other words, the priorities of the second and first tag information may be swapped. After determining that the processor 702 has accessed the data unit corresponding to the second tag information, the on-chip cache 701 may decide whether to swap. The decision depends on how many times that data unit is accessed per unit time.
  • For example, the swap may be performed when the access count of the second tag information exceeds 20% of the minimum access count among the first-priority tags in the same Set. With four-way set-associativity, a Set contains four pages.
  • A first-priority tag in the same Set is then selected for replacement.
  • The processor 702 is further configured to send a third access instruction. It requests access to a third data unit, which corresponds to the second tag information. The on-chip cache 701 determines, from the second valid bit information, that the third data unit is not stored in the on-chip cache 701. The processor 702 then reads the third data unit from off-chip memory, and the on-chip cache 701 stores it.
  • That is, when the processor 702 requests a data unit corresponding to the second tag information and that data unit is not in the on-chip cache 701, the data unit is read from off-chip memory and stored in the on-chip cache 701.
  • The third data unit may be stored in the first page or in a second page.
  • The second page and the first page hold data units in the same Way of the Data Array.
  • The on-chip cache 701 is further configured to perform three updates. It sets the bit for the third data unit in the second valid bit information to valid. It sets the priority of the second tag information to the first priority. It sets the priority of the first tag information, or of fourth tag information having the first priority, to the second priority.
  • Here the second tag information can be regarded as playing the victim role for the first tag information.
  • The second tag information, which has the second priority, needs to be upgraded to the first priority.
  • Correspondingly, one first-priority tag in the Tag Array must be downgraded to the second priority. The downgraded tag may be the first tag information or the fourth tag information.
  • A SubTag promotion may swap priorities within the same Way, for example between the first and second tag information. It may also swap them across different Ways, for example between the first and third tag information, or between the first and fourth tag information.
  • That is, the replaced first-priority tag may be in the same Way as the second-priority tag in the on-chip cache, or in a different Way.
  • First consider the case in which the replaced first-priority tag is in the same Way as the second-priority tag in the on-chip cache.
  • A specific example of a SubTag promotion within the same Way is shown in FIG. 8.
  • The processor accesses the data unit corresponding to the fifth valid bit of the second valid bit information. That data unit is not in the on-chip cache, so it is read from the off-chip DRAM.
  • When fetching from the off-chip DRAM, the data units corresponding to the other valid bits of the second tag information are fetched together. This is prefetching based on historical access information.
  • The fifth bit of the second valid bit information is then set to valid, and the second tag information is upgraded to the first priority. To make room, a first-priority tag (hereinafter Tag1) is selected for replacement.
  • The valid bit information of Tag1 is 10010010. The valid bit information of the second-priority tag in the same Way (hereinafter Tag2) is 01100001.
  • The priority of the second tag information is upgraded to the first priority.
  • The data corresponding to Tag2 is evicted from the on-chip cache and stored in the off-chip DRAM.
  • Tag1 is downgraded to the second priority. The data for its first and fourth valid bits conflicts with the upgraded second tag information. Because Tag1 now has the lower priority, its corresponding data units are stored in the off-chip DRAM.
  • The resulting valid bit information is shown in FIG. 8.
  • The replaced Tag2 is in the same Way as Tag1, which is evicted from the on-chip cache.
  • Next consider the case in which the replaced first-priority tag is in a different Way from the second-priority tag in the on-chip cache.
  • A SubTag promotion across different Ways is shown in FIG. 9.
  • The processor accesses the data unit corresponding to the fifth valid bit of the second valid bit information. That data unit is not in the on-chip cache, so it is read from the off-chip DRAM. Data units for the other valid bits of the second tag information are fetched along with it. After the fetch, the fifth bit of the second valid bit information is set to valid, and the second tag information is upgraded to the first priority.
  • Tag1, which has the first priority in Way1, is selected for replacement. The information corresponding to Tag1 can then be stored elsewhere in the on-chip cache.
  • The part of Tag2's information that does not conflict with the updated second valid bit information can be retained. The conflicting part is stored in the off-chip DRAM.
  • Tag4, which has the second priority in Way3, is selected for replacement, so all of Tag4's information can be stored in the off-chip DRAM. Tag1 is then compared with Tag3, which has the first priority in Way3. The part of Tag1 that does not conflict with Tag3 can remain in the on-chip cache. The part that conflicts with Tag3 is stored in the off-chip DRAM.
  • The integrated chip 700 may further include a tag buffer 703 for storing the first tag information and the second tag information.
  • The tag buffer 703 may be located in the MC.
  • The tag buffer 703 may also store index information of pages stored in the on-chip cache.
  • Information such as valid bits and dirty bits can also be stored there.
  • When the processor issues an access instruction, the tag buffer 703 can be searched first to determine whether the requested data is in the on-chip cache. This improves data access efficiency.
  • Data in the tag buffer 703 may be newer than data in the on-chip cache 701, so any update in the tag buffer 703 must be written back to the on-chip cache 701. Likewise, when the page index information in the on-chip cache 701 or the off-chip memory changes, the tag buffer 703 must be updated synchronously.
  • Determining where the requested data resides may involve the tag buffer (TB), the on-chip cache (DC, that is, the DRAM cache), and the off-chip DRAM.
  • The processor's data access flow may be as shown in FIG. 10.
  • If the access hits a PrimeTag (TB hit, DC PrimeTag hit, valid), the data is in the DC. The DC is accessed directly to obtain the data, and the PrimeTag's access count is updated. If the access hits a SubTag (TB hit, DC SubTag hit, valid), the data is also in the DC. The DC is accessed directly, the SubTag's access count is updated, and the controller decides whether a second tag information upgrade (SubTag promotion) is required.
  • If the TB lookup misses, it cannot yet be determined whether the data is in the DC. A TB miss only means that the relevant part of the DC's Tag Array is not currently in the TB.
  • If the DC lookup then gives a tag-only hit (the tag matches but the valid bit is not set), the controller must determine whether the PrimeTag or the SubTag hit. On a DC PrimeTag hit (TB miss, DC PrimeTag hit, not valid), the off-chip DRAM is accessed to obtain the data. The valid bits in the DC and the Tag and valid bits in the TB are then updated. On a DC SubTag hit (TB miss, DC SubTag hit, not valid), the off-chip DRAM is accessed to obtain the data, and the controller decides whether to perform a SubTag promotion.
  • If the TB lookup misses and the DC also truly misses (TB miss, DC real miss), a PrimeTag replacement is performed when the original PrimeTag replacement criteria are met.
  • The second tag information upgrade process may be as shown in FIG. 11.
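The hit and miss cases above, together with the SubTag promotion test, can be sketched in Python for a single Tag Array entry. This is a minimal sketch under stated assumptions, not the patent's implementation:

- The tag buffer, LRU, dirty-bit write-back, and cross-Way promotion are not modeled.
- The names `TagEntry`, `access`, and `should_promote` are hypothetical.
- The 20% rule is read literally as "SubTag count > 0.2 × the smallest first-priority count in the Set". This reading is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagEntry:
    """Hypothetical model of one page's index information in the on-chip cache:
    a PrimeTag/SubTag pair, one valid-bit mask per tag, and per-tag access counts."""
    prime_tag: int
    sub_tag: Optional[int] = None
    prime_valid: int = 0  # bit i set: slot i of the page holds PrimeTag data
    sub_valid: int = 0    # bit i set: slot i of the page holds SubTag data
    prime_count: int = 0
    sub_count: int = 0


def access(entry: TagEntry, tag: int, slot: int) -> str:
    """Classify one access to `slot` of the page named by `tag`, updating masks and counts."""
    bit = 1 << slot
    if tag == entry.prime_tag:
        entry.prime_count += 1
        if entry.prime_valid & bit:
            return "hit"  # served directly from the on-chip Data Array
        # Tag-only hit: the data is fetched from off-chip DRAM. PrimeTag data takes
        # the slot and evicts any SubTag data there (write-back is not modeled).
        entry.prime_valid |= bit
        entry.sub_valid &= ~bit
        return "tag-only hit"
    if tag == entry.sub_tag:
        entry.sub_count += 1
        if entry.sub_valid & bit:
            return "hit"
        if entry.prime_valid & bit:
            # Slot held by the higher-priority PrimeTag: SubTag data is not cached.
            return "tag-only hit, bypass"
        entry.sub_valid |= bit
        return "tag-only hit"
    return "miss"  # real miss: the PrimeTag replacement policy applies (not modeled)


def should_promote(sub_count: int, prime_counts: List[int], ratio: float = 0.2) -> bool:
    """Assumed literal reading of the 20% rule. Promote the SubTag when its count
    exceeds `ratio` times the smallest first-priority count in the same Set.
    `prime_counts` must be non-empty."""
    return sub_count > ratio * min(prime_counts)


entry = TagEntry(prime_tag=0xA, sub_tag=0xB)
print(access(entry, 0xB, 5), access(entry, 0xB, 5))
```

Under this reading, a SubTag accessed 3 times is promoted against PrimeTags with counts 10 and 40, because 3 > 0.2 × 10. If the intended threshold is instead 120% of the minimum, the comparison becomes `sub_count > (1 + ratio) * min(prime_counts)`.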

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An on-chip cache and an integrated chip, used to solve the prior-art problems of low page storage space utilisation and wasted on-chip cache storage space. The on-chip cache comprises a storage unit for storing a first page. The first page comprises a first portion of data units and a second portion of data units. First tag information corresponding to the first portion has a first priority, and second tag information corresponding to the second portion has a second priority, the second priority being lower than the first priority.

Description

On-chip cache and integrated chip
Technical field
This application relates to the field of chip technology, and in particular to an on-chip cache and an integrated chip.
Background
With advances in manufacturing processes and technology, processors' on-chip cache capacity keeps growing, and on-chip caches are implemented with increasingly diverse media. Caches built from static random-access memory (SRAM) and enhanced dynamic random-access memory (eDRAM) can reach 128 MB or more. 3D-packaged SRAM can further raise the storage density of ordinary SRAM, giving greater capacity and access bandwidth. 2.5D or 3D packaged DRAM (that is, on-chip memory) used as an on-chip cache can reach 16 GB, and this figure is expected to grow further. The larger the cache, the greater the impact of its management efficiency on processor performance. In the embodiments of this application, both "on-chip memory" and "on-chip cache" refer to 2.5D or 3D packaged DRAM.
Specifically, memory system access bandwidth has become one of the main obstacles to improving processor performance. The large-capacity on-chip cache is a technique the industry has proposed to address this problem. It packages memory dies on chip using through-silicon via (TSV) technology, which increases memory system access bandwidth. For 2.5D or 3D packaged DRAM, some data show that on-chip memory bandwidth can reach 4 to 8 times that of off-chip double data rate DRAM (DDR DRAM).
Generally, an on-chip cache can operate in three modes: cache mode, flat mode, and hybrid mode. In cache mode, the on-chip cache serves as a cache for off-chip memory. In flat mode, it is used as ordinary memory. In hybrid mode, part of it serves as a cache for off-chip memory and part serves as ordinary memory. This application improves the scenario in which the on-chip cache serves as a cache for off-chip memory.
For on-chip caches that serve as a cache for off-chip DRAM, the Footprint Cache concept has been proposed. Cache space is allocated at page granularity, but only footprint-granularity data is stored in the on-chip cache, on demand. Data that is not accessed does not enter the on-chip cache, which saves off-chip memory bandwidth. FIG. 1 is a schematic diagram of a page stored in an on-chip cache. In FIG. 1, each small box represents one footprint (for example, a block). Gray boxes store data, and white boxes are empty. The gray footprints hold the frequently accessed data of the page. The data corresponding to the white portions is accessed less often and is therefore not stored in the on-chip cache.
The solution shown in FIG. 1 improves memory system access efficiency and reduces wasted memory bandwidth. However, page storage space utilization is low, which wastes on-chip cache storage space.
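The utilization problem can be made concrete with a short Python sketch that computes a page's footprint as a bitmask and the fraction of the allocated on-chip page it fills. This is a minimal sketch that assumes 4 KB pages divided into 64 blocks of 64 B, consistent with the 4 KB page and 64 valid bits used in the embodiments. The helpers `footprint` and `utilization` are hypothetical names.

```python
PAGE_SIZE = 4096                           # 4 KB page
BLOCK_SIZE = 64                            # assumed footprint granularity: 64 B blocks
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE  # 64 blocks, one bit each


def footprint(offsets):
    """Return a bitmask with bit i set if any accessed byte offset falls in block i."""
    mask = 0
    for off in offsets:
        mask |= 1 << ((off % PAGE_SIZE) // BLOCK_SIZE)
    return mask


def utilization(mask):
    """Fraction of the page-granularity allocation that actually holds data."""
    return bin(mask).count("1") / BLOCKS_PER_PAGE


fp = footprint([0, 10, 64, 4095])
print(f"footprint={fp:#x}, fetched={bin(fp).count('1') * BLOCK_SIZE} B, "
      f"utilization={utilization(fp):.1%}")
```

In this example the accesses touch only blocks 0, 1, and 63, so 192 B of the 4096 B allocation holds data. The rest of the page is the kind of unused space the embodiments aim to reclaim.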
Summary
The embodiments of this application provide an on-chip cache and an integrated chip to solve two prior-art problems: low page storage space utilization and wasted on-chip cache storage space.
According to a first aspect, an embodiment of this application provides an on-chip cache. It includes a storage unit configured to store a first page. The first page includes a first part of data units and a second part of data units. First tag information corresponding to the first part has a first priority, and second tag information corresponding to the second part has a second priority. The second priority is lower than the first priority.
With the on-chip cache of the first aspect, the data units stored in the first page can be divided into two parts, each corresponding to one piece of tag information. In off-chip memory, one piece of tag information corresponds to one page. Each part of the data units therefore holds data units from one page of off-chip memory, so the first page can store data units originally stored in two off-chip pages. In the prior art, all data units in one page of the on-chip cache come from the same off-chip page, that is, they correspond to a single piece of tag information. In contrast, the first page here stores data units corresponding to two pieces of tag information. In other words, the on-chip cache of the first aspect can compress data originally stored in two pages into one page, thereby saving on-chip cache storage space.
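A minimal Python sketch of sharing one on-chip page between two off-chip pages follows. It assumes each footprint is a bitmask over data-unit positions. It also assumes block i of either off-chip page can only occupy slot i of the shared page, matching the position-by-position conflict described in the embodiments. `pack` is a hypothetical helper that computes a static split; it is not the patent's mechanism for filling the page over time.

```python
from typing import Tuple


def pack(prime_fp: int, sub_fp: int) -> Tuple[int, int, int]:
    """Share one on-chip page between a first-priority and a second-priority page.

    Slot i of the on-chip page holds block i of either page. The first-priority
    (PrimeTag) page keeps its whole footprint. The second-priority (SubTag) page
    gets only the slots the PrimeTag page does not use.
    Returns (first valid bits, second valid bits, SubTag blocks left off chip).
    """
    first_valid = prime_fp
    second_valid = sub_fp & ~prime_fp
    conflicts = sub_fp & prime_fp
    return first_valid, second_valid, conflicts


first, second, left_out = pack(0b11000011, 0b00111110)
print(f"{first:08b} {second:08b} {left_out:08b}")
```

In this toy 8-slot example, two footprints that would otherwise occupy two on-chip pages share one. Only the single conflicting SubTag block stays off chip.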
在一种可能的设计中,该存储单元还用于:存储第一页面的索引信息,第一页面的索引信息包括第一部分数据单元的索引信息以及第二部分数据单元的索引信息。In a possible design, the storage unit is further configured to store index information of the first page, where the index information of the first page includes index information of the first part of the data unit and index information of the second part of the data unit.
With the above solution, the data units stored in the first page can be located by using the index information of the first page.
Specifically, the index information of the first part of data units may include the first tag information and first valid-bit information, and the index information of the second part of data units may include the second tag information and second valid-bit information.
With the above solution, the index information of the first part of data units can be used to determine the priority of the first part and which of its data units in the first page are valid; likewise, the index information of the second part of data units can be used to determine the priority of the second part and which of its data units in the first page are valid.
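As a minimal sketch of how tag information plus valid-bit information identifies both the owning part and whether a unit is present, assume each part's index information is reduced to a tag and a Valid Bits bitmap (bit i set means data unit i of that off-chip page is resident), and that the parts are listed from highest to lowest priority. The names `PartIndex` and `lookup` are hypothetical, not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartIndex:
    tag: int         # identifies the off-chip page this part comes from
    valid_bits: int  # bit i set => data unit i is resident in the cache page


def lookup(parts: List[PartIndex], tag: int, offset: int) -> Optional[int]:
    """Return the priority rank (0 = highest) of the part holding the unit, or None on a miss."""
    for rank, part in enumerate(parts):  # parts ordered from highest to lowest priority
        if part.tag == tag:
            # Tag matched; the unit is present only if its valid bit is set.
            return rank if (part.valid_bits >> offset) & 1 else None
    return None  # no part of this page carries the requested tag


# Hypothetical first page: part 0 (first priority) and part 1 (second priority).
page = [PartIndex(tag=0x1A, valid_bits=0b0000_0000_0001_0011),
        PartIndex(tag=0x2B, valid_bits=0b0000_0000_0000_0100)]
```

Here `lookup(page, 0x2B, 2)` returns 1 (the second-priority part holds unit 2), while `lookup(page, 0x2B, 0)` returns `None` because that valid bit is clear.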
In a possible design, the on-chip cache may further include a storage controller configured to set the priority of the second tag information to the first priority, and to set the priority of the first tag information, or of third tag information having the first priority, to the second priority.
With the above solution, the priorities of the first part and the second part of data units can be swapped.
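The priority swap performed by the storage controller can be sketched as follows, assuming a page's tags are kept in a list ordered by priority (index 0 is the first priority). The function `promote` is a hypothetical illustration: the promoted tag takes the first priority and the tag that held the first priority takes the promoted tag's former level.

```python
from typing import List


def promote(parts: List[str], tag: str) -> List[str]:
    """Give `tag` the first priority; the former first-priority tag takes its old level.
    parts[0] holds the first (highest) priority."""
    i = parts.index(tag)
    swapped = list(parts)  # leave the caller's list untouched
    swapped[0], swapped[i] = swapped[i], swapped[0]
    return swapped
```

For two parts this is exactly the swap described above: `promote(["Tag1", "Tag2"], "Tag2")` yields `["Tag2", "Tag1"]`.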
In a possible design, the first page further includes a third part of data units; fourth tag information corresponding to the third part of data units has a third priority, and the third priority is lower than the second priority.
With the above solution, one page of the on-chip cache can hold more than two parts of data units, each with a different priority.
In a second aspect, an embodiment of this application provides an integrated chip, including an on-chip cache configured to store a first page. The first page includes a first part of data units and a second part of data units. First tag information corresponding to the first part of data units has a first priority, second tag information corresponding to the second part of data units has a second priority, and the second priority is lower than the first priority.
With the integrated chip of the second aspect, the first page can store the first part of data units, corresponding to the first tag information, and the second part of data units, corresponding to the second tag information. Compared with the prior-art solution in which a page stores only data units corresponding to a single piece of tag information, the integrated chip of the second aspect can compactly store, in one page, data that would otherwise occupy multiple pages of the on-chip cache, thereby saving on-chip cache storage space.
In a possible design, the on-chip cache is further configured to store index information of the first page, where the index information of the first page includes index information of the first part of data units and index information of the second part of data units.
With the above solution, the data units stored in the first page can be located by using the index information of the first page.
Specifically, the index information of the first part of data units includes the first tag information and first valid-bit information, and the index information of the second part of data units includes the second tag information and second valid-bit information.
With the above solution, the index information of the first part of data units can be used to determine the priority of the first part and which of its data units in the first page are valid; likewise, the index information of the second part of data units can be used to determine the priority of the second part and which of its data units in the first page are valid.
In a possible design, the integrated chip of the second aspect further includes a processor configured to send a first access instruction, where the first access instruction requests access to a first data unit and the first data unit corresponds to the first tag information. The on-chip cache is further configured to determine, according to the first valid-bit information, that the first data unit is stored in the on-chip cache, and to send the first data unit to the processor.
With the above solution, when the processor requests a data unit corresponding to the first tag information and that data unit is stored in the on-chip cache, the data unit is returned to the processor directly instead of being fetched from off-chip memory, which improves data access efficiency.
In a possible design, the processor is further configured to send a second access instruction, where the second access instruction requests access to a second data unit and the second data unit corresponds to the second tag information. The on-chip cache is further configured to determine, according to the second valid-bit information, that the second data unit is stored in the on-chip cache, and to send the second data unit to the processor.
With the above solution, when the processor requests a data unit corresponding to the second tag information and that data unit is stored in the on-chip cache, the data unit is returned to the processor directly instead of being fetched from off-chip memory, which improves data access efficiency.
In a possible design, the on-chip cache is further configured to determine, according to the number of times the second data unit is accessed per unit time, whether to set the priority of the second tag information to the first priority and to set the priority of the first tag information, or of third tag information having the first priority, to the second priority.
With the above solution, the priorities of the second tag information and the first tag information can be swapped.
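The application states only that the decision depends on how often the second data unit is accessed per unit time; it does not give the exact rule. The sketch below assumes one plausible policy, purely for illustration: count accesses per tag within a time window and swap priorities at the end of the window if the Sub (second-priority) tag was accessed more often than the Prime (first-priority) tag. `PriorityMonitor` and its methods are hypothetical.

```python
from collections import Counter


class PriorityMonitor:
    """Hypothetical per-page monitor that counts accesses per tag in one time window."""

    def __init__(self, prime: str, sub: str):
        self.prime, self.sub = prime, sub  # first-priority and second-priority tags
        self.counts = Counter()

    def record_access(self, tag: str) -> None:
        self.counts[tag] += 1

    def end_of_window(self) -> bool:
        """Swap priorities if the Sub tag was accessed more often; return True if swapped."""
        swapped = self.counts[self.sub] > self.counts[self.prime]
        if swapped:
            self.prime, self.sub = self.sub, self.prime
        self.counts.clear()  # start counting for the next unit-time window
        return swapped
```

A threshold or hysteresis could replace the plain comparison to avoid swapping back and forth when both tags are accessed at similar rates.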
In a possible design, the processor is further configured to send a third access instruction, where the third access instruction requests access to a third data unit and the third data unit corresponds to the second tag information. The on-chip cache is further configured to determine, according to the second valid-bit information, that the third data unit is not stored in the on-chip cache; the processor is further configured to read the third data unit from off-chip memory; and the on-chip cache is further configured to store the third data unit.
With the above solution, when the processor requests a data unit corresponding to the second tag information and that data unit is not stored in the on-chip cache, the data unit is read from off-chip memory and then stored in the on-chip cache.
Specifically, when storing the third data unit, the on-chip cache is configured to store it in the first page or in a second page.
The second page and the first page are located in the same Way of the Data Array.
Further, the on-chip cache is configured to set the valid bit corresponding to the third data unit in the second valid-bit information to valid, set the priority of the second tag information to the first priority, and set the priority of the first tag information, or of fourth tag information having the first priority, to the second priority.
The second tag information can be regarded as the victim of the first tag information. When a data unit corresponding to the second tag information is not stored in the on-chip cache and its storage position is occupied by the first tag information, the second tag information, which has the second priority, must be promoted to the first priority when the data is read into the on-chip cache. At that point, some tag information in the Tag Array that has the first priority must be demoted to the second priority; the demoted tag information may be the first tag information or the fourth tag information.
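A minimal sketch of this miss-handling step, restricted for simplicity to the case where the demoted tag is the first tag information of the same page (the application also allows demoting a different first-priority tag). Valid bits are modeled as integer bitmaps; `Part` and `fill_sub_miss` are hypothetical names, and a real design would write the displaced unit back first if it is dirty.

```python
from dataclasses import dataclass


@dataclass
class Part:
    tag: str
    valid_bits: int = 0  # bit i set => data unit i of this tag is resident


def fill_sub_miss(prime: Part, sub: Part, offset: int):
    """Install a unit fetched for the Sub tag at `offset`; return (new_prime, new_sub)."""
    mask = 1 << offset
    if prime.valid_bits & mask:
        # The position is occupied by the current Prime tag; it gives the slot up
        # (a dirty unit would be written back to off-chip memory before this).
        prime.valid_bits &= ~mask
    sub.valid_bits |= mask  # set the valid bit of the newly fetched unit
    return sub, prime       # Sub is promoted to the first priority, Prime demoted
```

After the call, the former Sub tag's part is returned first (first priority) and its valid bit for `offset` is set.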
In addition, the integrated chip of the second aspect may further include a tag buffer configured to store the first tag information and the second tag information.
With the above solution, after the processor issues an access instruction, the tag buffer can be searched first to determine whether the requested data is stored in the on-chip cache, which improves data access efficiency.
In a possible design, the first page further includes a third part of data units; fourth tag information corresponding to the third part of data units has a third priority, and the third priority is lower than the second priority.
With the above solution, one page of the on-chip cache can hold more than two parts of data units, each with a different priority.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of a page stored in an on-chip cache in the prior art;
FIG. 2 is a schematic structural diagram of a first integrated chip according to an embodiment of this application;
FIG. 3 is a schematic diagram of a page in an on-chip cache according to an embodiment of this application;
FIG. 4 is a schematic diagram of a tag array and a data array according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a second integrated chip according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of an on-chip cache according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a third integrated chip according to an embodiment of this application;
FIG. 8 is a schematic flowchart of a first priority replacement process according to an embodiment of this application;
FIG. 9 is a schematic flowchart of a second priority replacement process according to an embodiment of this application;
FIG. 10 is a schematic flowchart of a data access process according to an embodiment of this application;
FIG. 11 is a schematic flowchart of a third priority replacement process according to an embodiment of this application.
DETAILED DESCRIPTION
The following first introduces the application scenarios of the embodiments of this application.
The embodiments of this application can be applied to the integrated chip shown in FIG. 2. The integrated chip includes a processor, a memory controller (MC), and an on-chip cache, and is connected to off-chip memory. The processor initiates data access requests and processes data; the memory controller controls data exchange between the processor and the on-chip cache and between the processor and the off-chip memory; the off-chip memory stores a large amount of data, and the on-chip cache can be regarded as an on-chip cache of the off-chip memory.
In the embodiments of this application, the on-chip cache serves as a cache for the off-chip memory; that is, it holds part of the data in off-chip memory. After the processor issues an access instruction, if the requested data is stored in the on-chip cache, it is returned directly from the on-chip cache; otherwise, it must be fetched from off-chip memory, and the fetched data is also stored in the on-chip cache so that the next access hits.
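The basic hit/miss flow can be sketched as follows. This is a deliberately simplified model, assuming the on-chip cache and off-chip memory are plain dictionaries keyed by address; it ignores capacity, pages, tags, and replacement, which the rest of this description addresses. The function `read` is hypothetical.

```python
from typing import Dict


def read(cache: Dict[int, bytes], off_chip: Dict[int, bytes], addr: int) -> bytes:
    """Serve from the on-chip cache on a hit; on a miss, fetch from off-chip memory and fill."""
    if addr in cache:
        return cache[addr]  # hit: no off-chip access needed
    data = off_chip[addr]   # miss: fetch from off-chip memory
    cache[addr] = data      # keep a copy so the next access to addr hits
    return data
```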
In the on-chip cache, space is allocated at page granularity, and a page can be divided into multiple data units. In the embodiments of this application, the on-chip cache may hold only those data units of a page that the processor has previously accessed, so that the processor can access them again later; that is, the on-chip cache may not hold all data units of a page.
Specifically, the data units stored in one page of the on-chip cache may be divided into multiple parts, each corresponding to one piece of tag information (Tag), and each part holds data units from one page of off-chip memory. In other words, the on-chip cache can store, in one of its pages, data units that occupy multiple pages in off-chip memory. Taking the case in which the data units in one on-chip cache page are divided into two parts as an example, a page of the on-chip cache may be as shown in FIG. 3. In FIG. 3, one page of the on-chip cache stores data units corresponding to first tag information (Tag1) and data units corresponding to second tag information (Tag2). In off-chip memory, the data units corresponding to Tag1 are stored in one page and those corresponding to Tag2 are stored in another page. In the prior art, all data units stored in one page of the on-chip cache come from the same page of off-chip memory, that is, they all correspond to a single Tag, whereas in the embodiments of this application, one page of the on-chip cache stores data units corresponding to multiple Tags. Therefore, data that occupies multiple pages in off-chip memory can be compactly stored in one page of the on-chip cache, thereby saving on-chip cache storage space.
In addition to pages, the on-chip cache also stores their index information. Specifically, the index information of the pages can be stored in a tag array (Tag Array), and the pages themselves in a data array (Data Array); the index information in the Tag Array corresponds one-to-one with the pages in the Data Array. As the names suggest, the Tag Array and the Data Array are both organized as arrays. Taking the Tag Array as an example, it can be an m*n array in which each element is the index information of a page. The row vectors of the m*n array are called Sets and the column vectors are called Ways. In the embodiments of this application, if each row vector includes N elements, the storage structure is called N-way set-associative. For example, the on-chip cache can use a four-way set-associative structure, in which each row vector contains four elements: in the Tag Array, each of the four elements represents the index information of one page, and in the Data Array, each of the four elements represents one page.
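To make the Set/Way terminology concrete, here is a minimal lookup in an m*n set-associative tag array. The application does not specify how an off-chip page number maps to a Set, so the conventional modulo mapping used here (set index = page number mod m, stored tag = page number div m) is an assumption, as are the sizes `NUM_SETS` and `NUM_WAYS`. Each element is also simplified to a single tag rather than the Prime/Sub pair described later.

```python
NUM_SETS = 4  # hypothetical m (number of Sets, i.e. rows)
NUM_WAYS = 4  # n = 4: four-way set-associative (Ways, i.e. columns)


def split(page_number: int):
    """Map an off-chip page number to (set index, tag); the modulo mapping is an assumption."""
    return page_number % NUM_SETS, page_number // NUM_SETS


def find_way(tag_array, page_number: int):
    """Search the n Ways of one Set; return (set, way) on a tag match, else None."""
    set_idx, tag = split(page_number)
    for way, entry in enumerate(tag_array[set_idx]):
        if entry == tag:
            return set_idx, way
    return None


# Empty m*n tag array: row = Set, column = Way.
tag_array = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
```

Because only one Set is searched per lookup, a page can reside in any of that Set's n Ways, which is the usual trade-off between fully associative and direct-mapped organizations.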
A specific example of a Tag Array and a Data Array is shown in FIG. 4. As FIG. 4 shows, the elements of the Tag Array correspond one-to-one to the elements of the Data Array, and each element of the Tag Array indicates the index information of the corresponding page in the Data Array. In the example of FIG. 4, the on-chip cache is four-way set-associative; in practice, the on-chip cache may also use other organizations such as eight-way set-associative, which is not limited in the embodiments of this application.
The following describes the index information of the pages stored in the Tag Array. As mentioned above, in the embodiments of this application, one page of the on-chip cache stores data units from multiple pages of off-chip memory, so the data units stored in one page of the on-chip cache can be divided into multiple parts. Each part corresponds to one group of index information, and the groups together form the index information of that on-chip cache page. For example, if the data units in a page are divided into two parts, A and B, the index information of the page includes two groups: one indexes part A and the other indexes part B. Specifically, each group of index information may include the foregoing tag information (Tag).
In addition to the tag information (Tag), each group may include the following. Overall valid bit information (Valid) indicates whether the entire Tag is valid; if the Tag is invalid, none of its data units can be accessed. Least recently used (LRU) information indicates the least recently used data unit. Dirty bit information (Dirty Bits) indicates whether the data units stored in the on-chip cache hold dirty data: a Dirty Bits bit set to 0 indicates that the corresponding data unit is clean and can simply be invalidated on replacement without being written back to off-chip memory, whereas a bit set to 1 indicates that the corresponding dirty data must be written back to off-chip memory on replacement. Valid bit information (Valid Bits) indicates the valid data units in the page: a valid data unit holds data and an invalid one does not. For example, in FIG. 3, the blank cells have invalid valid bits and hold no data. By checking the Valid Bits, it can be determined whether a data unit requested by the processor is stored in the on-chip cache. For example, suppose the processor issues an access instruction for a data unit corresponding to Tag1, and searching the Tag Array in the on-chip cache finds the Tag1 group of index information (a Tag1 hit). The valid bit of the requested data unit is then checked: if it is valid, the data unit is stored in the on-chip cache; if it is invalid, the data unit is not stored in the on-chip cache and must be fetched from off-chip memory.
Of course, determining whether the data unit is stored in the on-chip cache and whether it can be accessed also requires considering the overall valid bit (Valid), the Dirty Bits, and other information; this example only illustrates how the Valid Bits are used, and the other information is not described further here.
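One group of index information can be sketched as a record with the fields listed above. This is a simplified model under stated assumptions: the bit widths, the LRU encoding, and the names `IndexInfo`, `is_hit`, and `writeback_offsets` are hypothetical. It shows how the overall Valid bit gates every access, and how Dirty Bits combined with Valid Bits select the units that must be written back when the Tag is replaced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexInfo:
    tag: int
    valid: bool = True   # overall Valid: False makes every unit of this Tag inaccessible
    lru: int = 0         # LRU information (encoding not specified here)
    dirty_bits: int = 0  # bit i set => unit i differs from the copy in off-chip memory
    valid_bits: int = 0  # bit i set => unit i is resident on chip

    def is_hit(self, offset: int) -> bool:
        """After a tag match, the unit is present only if the Tag and its valid bit are both valid."""
        return self.valid and bool((self.valid_bits >> offset) & 1)

    def writeback_offsets(self, units: int) -> List[int]:
        """Offsets that must be written back to off-chip memory when this Tag is replaced."""
        must_write = self.valid_bits & self.dirty_bits  # clean units can simply be invalidated
        return [i for i in range(units) if (must_write >> i) & 1]
```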
It should be noted that, across the groups of index information, the valid bits for the same position cannot both be valid. This is because, in the embodiments of this application, data units from multiple off-chip pages are stored in one page of the on-chip cache, and a page of off-chip memory has the same size as a page of the on-chip cache. For a storage position in an on-chip cache page that holds one data unit, if the position stores a data unit corresponding to Tag1 (its valid bit is valid in the Tag1 group of index information), it cannot also store a data unit corresponding to Tag2 (its valid bit is invalid in the Tag2 group). Taking the page in FIG. 3 as an example, the position in the second row and first column stores a data unit corresponding to Tag1, so it cannot simultaneously store a data unit corresponding to Tag2. In terms of the Valid Bits, the fifth bit of Tag1's Valid Bits is valid and the fifth bit of Tag2's Valid Bits is invalid.
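The invariant above amounts to requiring that the Valid Bits bitmaps of the groups in one page never overlap. The sketch below assumes four data units per row of the page grid, which is consistent with the FIG. 3 example where the second row, first column corresponds to the fifth bit; the actual page geometry is not stated, so `COLUMNS` and the helper names are hypothetical.

```python
COLUMNS = 4  # hypothetical: 4 data units per row of the page grid


def position(row: int, col: int) -> int:
    """Convert a 1-based (row, col) in the page grid to a 0-based bit offset."""
    return (row - 1) * COLUMNS + (col - 1)


def consistent(prime_valid: int, sub_valid: int) -> bool:
    """Each storage position holds a unit of at most one Tag, so the bitmaps must be disjoint."""
    return (prime_valid & sub_valid) == 0


tag1_valid = 1 << position(2, 1)  # Tag1 owns row 2, column 1 (the fifth bit)
tag2_valid = 0
```

With this layout `position(2, 1)` is offset 4, that is, the fifth bit, and setting that bit for Tag2 as well would violate `consistent`.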
In addition, each group of index information may include reference bit information (Reference Bits), which records which data units have been accessed. When the data units corresponding to a Tag are evicted from the on-chip cache and written back to off-chip memory, the Reference Bits must also be written back to off-chip memory; then, when the data units corresponding to that Tag are brought into the on-chip cache again, the Reference Bits can be used to prefetch data units.
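A minimal sketch of how saved Reference Bits could drive prefetching, assuming the off-chip copy of the Reference Bits is modeled as a dictionary keyed by tag. The storage format and the names `history`, `on_evict`, and `prefetch_offsets` are hypothetical.

```python
from typing import Dict, List

history: Dict[int, int] = {}  # hypothetical off-chip store: tag -> saved Reference Bits


def on_evict(tag: int, reference_bits: int) -> None:
    """Write the Reference Bits back to off-chip memory together with the evicted Tag."""
    history[tag] = reference_bits


def prefetch_offsets(tag: int, units: int) -> List[int]:
    """When the Tag re-enters the on-chip cache, list the units it accessed previously."""
    ref = history.get(tag, 0)  # no history: nothing to prefetch
    return [i for i in range(units) if (ref >> i) & 1]
```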
As mentioned above, because one page stores data units corresponding to multiple Tags in the embodiments of this application, each Tag in a page has its own group of index information containing the above Tag, Valid, LRU, Dirty Bits, Valid Bits, and Reference Bits. FIG. 4 takes a page holding data units for two Tags as an example, so the index information of the page consists of two groups. Of the two Tags, the one with the higher priority is called the Prime Tag and the one with the lower priority is called the Sub Tag. The index information of the page includes the index information of the Prime Tag and that of the Sub Tag.
In addition, in the example of FIG. 4, data units corresponding to the higher-priority Tag (the Prime Tag) take precedence when being stored in the on-chip cache. As mentioned above, a storage position for one data unit in an on-chip cache page can hold a data unit for only one Tag, so deciding which Tag's data unit occupies it is a question of storage priority. For example, if a data unit corresponding to the Sub Tag is to be stored in the on-chip cache but its storage position is occupied by a data unit corresponding to the Prime Tag, the Sub Tag's data unit cannot be stored in the on-chip cache. Conversely, if a data unit corresponding to the Prime Tag is to be stored but its storage position is occupied by a data unit corresponding to the Sub Tag, the Prime Tag's data unit can take the position and the Sub Tag's data unit is evicted from the on-chip cache.
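The storage-priority rule between the Prime Tag and the Sub Tag can be sketched on two Valid Bits bitmaps. The function `try_install` is a hypothetical illustration; it does not model the write-back that a dirty Sub Tag unit would need before being evicted.

```python
def try_install(prime_valid: int, sub_valid: int, offset: int, for_prime: bool):
    """Try to place one data unit at `offset`; return (installed, prime_valid, sub_valid)."""
    mask = 1 << offset
    if for_prime:
        # The Prime Tag has storage priority: any Sub Tag unit in this slot is evicted.
        return True, prime_valid | mask, sub_valid & ~mask
    if prime_valid & mask:
        # Slot held by the Prime Tag: the Sub Tag unit cannot be stored on chip.
        return False, prime_valid, sub_valid
    return True, prime_valid, sub_valid | mask
```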
In addition, in the integrated chip shown in FIG. 2, the memory controller may further include a tag buffer (Tag Buffer, TB) that caches part of the Tag Array stored in the on-chip cache, which speeds up access to page index information. (To save storage space in the memory controller, only part of the Tag Array, rather than all of it, is stored in the TB.) For the content of the Tag Array stored in the TB, refer to the description of the Tag Array stored in the on-chip cache; details are not repeated here.
When the TB holds part of the Tag Array, after the processor issues an access instruction, the TB can be searched first to determine whether the requested data is stored in the on-chip cache. If the TB lookup shows that the data is stored in the on-chip cache, the data can be accessed directly from the Data Array of the on-chip cache without searching the Tag Array in the on-chip cache, which speeds up access to page index information. If the TB lookup does not show the data as stored in the on-chip cache, there are two possibilities: either the data really is not stored in the on-chip cache, or the data is cached but its index information happens not to be in the TB, because the TB holds only part of the Tag Array. In that case, the Tag Array in the on-chip cache must be searched to determine which of the two cases applies.
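The two-level lookup can be sketched as follows, assuming both the TB and the on-chip Tag Array are modeled as dictionaries from tag to a (set, way) location. Refilling the TB after a Tag Array hit is an assumed policy not stated in this passage, and `locate` is a hypothetical name.

```python
from typing import Dict, Tuple

Location = Tuple[int, int]  # (set, way) of the page in the Data Array


def locate(tb: Dict[int, Location], tag_array: Dict[int, Location], tag: int):
    """Return (where_found, location); the TB holds only part of the Tag Array."""
    loc = tb.get(tag)
    if loc is not None:
        return "tb", loc           # fast path: no Tag Array read needed
    loc = tag_array.get(tag)       # a TB miss is not yet a cache miss
    if loc is not None:
        tb[tag] = loc              # assumed policy: bring the entry into the TB
        return "tag_array", loc
    return "miss", None            # the data really is not in the on-chip cache
```

The third outcome is the only case that requires going to off-chip memory.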
It is worth noting that the data in the TB may be newer than the data in the on-chip cache, so when data in the TB is updated, the updated data needs to be written back to the on-chip cache. In addition, to keep the TB correct, whenever page index information in the on-chip cache or off-chip memory changes, the data in the TB must be updated accordingly.
Like the Tag Array stored in the on-chip cache, the Tag Array stored in the TB is organized as an array, for example an m*n array in which each element is the index information of a page. The row vectors of the m*n array are called Sets and the column vectors are called Ways. If each row vector includes N elements, the TB is said to be N-way set-associative. For example, the TB can be four-way set-associative, with each row vector containing four elements.
For example, in the integrated chip provided in the embodiments of this application, the data stored in the TB and the on-chip cache may be as shown in FIG. 5. In the example of FIG. 5, the on-chip cache and the off-chip memory share one memory controller; in practice, they may also be controlled by two separate controllers, which is not limited in this application.
In the TB storage array shown in FIG. 5, the tag information (Tag) has the same meaning as the Prime Tag or Sub Tag in the on-chip cache's Tag Array. The value (Value) represents the collection of Valid, LRU, Valid Bits, Dirty Bits, Reference Bits and similar information in that Tag Array. Note that in FIG. 5, PrimeTag and SubTag are treated as equals by the TB, to simplify TB management. Each TB entry (Tag + Value) therefore describes a single PrimeTag or a single SubTag, rather than recording a PrimeTag/SubTag pair together as the on-chip cache does. One element of the on-chip cache's Tag Array thus corresponds to two Tag + Value entries in the TB, and each such pair of entries can be regarded as one element of the TB storage array. On that basis, the TB in FIG. 5 is 4-way set associative; that is, each row vector contains four elements.
In FIG. 5, a single cache Set contains the index information of four pages. Each page's index information occupies 64 B and describes one 4 KB page. The on-chip cache is also 4-way set associative. History information is appended after the page index information. It records information about pages that have been evicted from the on-chip cache: the tag information Tag, the access count, and the historical access footprint. The next time such a page is brought back into the on-chip cache, this information can be obtained directly.
Specifically, the page index information in FIG. 5 is largely the same as in FIG. 4, except that FIG. 5 also includes a role flip bit. As mentioned above, the multiple Tags of one page are ordered by priority, and the data units of a higher-priority Tag are stored in the on-chip cache first. In a specific example, the PrimeTag has a higher priority than the SubTag. In some cases, however, the priorities need to be swapped, for example when the data of the SubTag is accessed frequently. To avoid the overhead of moving data, the swap can be indicated with the role flip bit. When the bit is 0 (the default), the PrimeTag has the higher priority; when it is 1, the SubTag has the higher priority.
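The role flip bit can be modelled as a one-bit field that selects which stored tag currently has priority. The following is a minimal sketch; the class and field names are hypothetical and the bit layout of the real index entry is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PageIndexEntry:
    prime_tag: int
    sub_tag: int
    role_flip: int = 0  # 0 (default): PrimeTag has priority; 1: SubTag has priority

    def high_priority_tag(self) -> int:
        return self.sub_tag if self.role_flip else self.prime_tag

    def low_priority_tag(self) -> int:
        return self.prime_tag if self.role_flip else self.sub_tag

    def swap_priority(self) -> None:
        # Toggle one bit instead of physically exchanging the two tag fields
        # (and the data units they describe).
        self.role_flip ^= 1
```

After `swap_priority()`, the stored `prime_tag` and `sub_tag` fields are unchanged. Only the interpretation of which one has priority changes, which is how the flip bit avoids the data-movement overhead.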
Note that the examples in the embodiments of this application all assume the on-chip cache is integrated on the chip, that is, located inside the integrated chip. In practice, an off-chip cache may also be placed outside the integrated chip. If that off-chip cache uses the same storage structure as the on-chip cache provided in these embodiments and implements the same functions, it also falls within the protection scope of these embodiments.
In the integrated chip, the MC can serve as the controller of both the on-chip cache and the off-chip memory, handling data exchange between them and the processor. In practice, the on-chip cache may also have its own storage controller to control data access within it. For example, this storage controller can collect statistics on accesses to the data units of each Tag. It can then use these statistics to decide whether the priority swap described above is needed, that is, promoting a SubTag to a PrimeTag (also called a SubTag upgrade).
The background describes the problem of wasted storage space in large-capacity caches. To solve it, the embodiments of this application provide an on-chip cache and an integrated chip that optimize the storage efficiency of such large-capacity cache systems and further improve cache management efficiency. The embodiments are described using a 2.5D- or 3D-packaged on-chip cache as an example. They also apply to on-chip caches made with other media, such as the SRAM, 3D-SRAM, and eDRAM mentioned in the background.
The embodiments of this application are described in detail below with reference to the accompanying drawings.
An embodiment of this application provides an on-chip cache. As shown in FIG. 6, the on-chip cache 600 includes a storage unit 601 configured to store a first page. The first page includes a first part of data units and a second part of data units. The first tag information corresponding to the first part has a first priority, and the second tag information corresponding to the second part has a second priority, which is lower than the first priority.
The data units stored in the first page are thus divided into two parts, each corresponding to one piece of tag information (Tag). In off-chip memory, one Tag corresponds to one page. In the on-chip cache 600, therefore, each part holds data units from one off-chip memory page. This lets the on-chip cache 600 store, in the first page, data units that occupy two pages in off-chip memory. In the prior art, all data units in one on-chip cache page come from the same off-chip memory page and thus correspond to a single Tag. In this embodiment, by contrast, one page of the on-chip cache 600 (the first page) stores data units corresponding to multiple Tags. In other words, the on-chip cache 600 shown in FIG. 6 can compress data originally stored in two pages into one page, saving on-chip cache storage space.
For example, one possible structure of the first page is shown in FIG. 3, where Tag1 can be regarded as the first tag information and Tag2 as the second tag information.
The first and second priorities are briefly explained here. In off-chip memory, the data for Tag1 can occupy a whole page, and so can the data for Tag2. Because on-chip cache space is limited, however, the on-chip cache may hold only part of the off-chip data, for example frequently accessed data or data accessed before. Only some of Tag1's data units, and likewise only some of Tag2's, may therefore be stored in the first page. This raises a problem for any given position in the first page, for example the position in the first row and first column in FIG. 3. If both the Tag1 data unit and the Tag2 data unit that map to that position need to be stored in the first page, a priority order is needed to decide which one gets the position. In this embodiment, the first tag information (Tag1) has a higher priority than the second tag information (Tag2). In that situation, the data unit corresponding to Tag1 is therefore stored in the on-chip cache first.
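The per-position arbitration described above can be sketched with two valid-bit masks, one per tag. This is a hypothetical model. The function name, the mask encoding, and the choice to simply not cache a refused lower-priority unit are assumptions. What happens to a displaced unit (for example, write-back) is outside this sketch.

```python
def fill_slot(hi_valid: int, lo_valid: int, slot: int, requester_is_hi: bool):
    """Try to place one data unit of the higher- or lower-priority tag
    into position `slot` of the shared cache page.

    hi_valid / lo_valid are valid-bit masks (bit `slot` set = position used).
    Returns (hi_valid, lo_valid, stored, displaced_lo).
    """
    bit = 1 << slot
    if requester_is_hi:
        # The higher-priority tag always wins; a lower-priority unit there is displaced.
        displaced = bool(lo_valid & bit)
        return hi_valid | bit, lo_valid & ~bit, True, displaced
    if hi_valid & bit:
        # Position already held by the higher-priority tag: the request is not cached.
        return hi_valid, lo_valid, False, False
    return hi_valid, lo_valid | bit, True, False
```

Because each position belongs to at most one tag, the two masks never overlap. That invariant is what lets a single page hold data units from two off-chip pages.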
The storage unit 601 is further configured to store the index information of the first page. Specifically, this includes the index information of the first part of data units and that of the second part. Take FIG. 4 as an example. The first half of the page index information, PrimeTag + Valid + LRU + Valid Bits + Dirty Bits + Reference Bits, can be regarded as the index information of the first part of data units. The second half, SubTag + Valid + LRU + Valid Bits + Dirty Bits + Reference Bits, can be regarded as the index information of the second part.
Specifically, the index information of the first part of data units may include the first tag information (for example, PrimeTag) and first valid bit information (for example, Valid Bits). Likewise, the index information of the second part may include the second tag information (for example, SubTag) and second valid bit information (for example, Valid Bits).
The first and second valid bit information are described in detail below, taking the first as an example. The first valid bit information indicates which data units in the first page are valid. Using FIG. 4 again, the first page contains 4×16 data units, so the first valid bit information can be represented by 64 bits. If a position holds a data unit of the first part, the corresponding bit in the first valid bit information is set to valid (for example, to 1). For example, suppose the first of the 64 positions in the first page holds a data unit corresponding to Tag1. The first bit of the 64-bit first valid bit information is then set to valid. The second valid bit information has the analogous meaning and is not described again.
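A minimal sketch of a 64-bit valid-bit field follows. The source states only that the first position maps to the first bit. Mapping position `pos` (0-based) to bit `pos` of an integer is an assumption for illustration, and the function names are hypothetical.

```python
UNITS_PER_PAGE = 4 * 16  # FIG. 4: 64 data units per page -> 64 valid bits


def set_valid(valid_bits: int, pos: int) -> int:
    """Mark position `pos` (0-based) as holding a data unit of this tag."""
    if not 0 <= pos < UNITS_PER_PAGE:
        raise ValueError(f"position {pos} outside 0..{UNITS_PER_PAGE - 1}")
    return valid_bits | (1 << pos)


def is_valid(valid_bits: int, pos: int) -> bool:
    """True if position `pos` currently holds a data unit of this tag."""
    return ((valid_bits >> pos) & 1) == 1
```

With 64 units in a 4 KB page, each unit is 4096 / 64 = 64 B.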
The on-chip cache 600 may further include a storage controller 602. It is configured to set the priority of the second tag information to the first priority, and to set the priority of the first tag information, or of third tag information having the first priority, to the second priority.
As described above, the first priority of the first part of data units is higher than the second priority of the second part, so that the contents of the first page can be selected and managed. In some cases, however, the priorities need to be swapped, for example when the second part is accessed frequently or many times. The priority of the second tag information is then set to the first priority, and the priority of the first tag information, or of third tag information having the first priority, is set to the second priority. The second tag information now has a higher priority than the first tag information, so its data units are stored in the on-chip cache preferentially.
Specifically, the priority swap may be performed with a role flip bit, as in FIG. 5. Alternatively, it can be done in the conventional way, by exchanging the storage locations of the index information of the first and second parts of data units.
Note that the embodiments of this application use, as an example, a first page that stores two parts of data units (the first part and the second part). In practice, the on-chip cache may also store data units corresponding to more than two pieces of tag information, provided that the priority of each part is defined appropriately. For example, the first page may further include a third part of data units. Its corresponding fourth tag information has a third priority, lower than the second priority.
With the on-chip cache shown in FIG. 6, the first page can store both the first part of data units, corresponding to the first tag information, and the second part, corresponding to the second tag information. Prior-art schemes store the data units of only one tag per page. Compared with them, the on-chip cache 600 shown in FIG. 6 can compress data originally stored in multiple pages into one page, saving on-chip cache storage space.
Based on the same inventive concept, an embodiment of this application further provides an integrated chip. Referring to FIG. 7, the integrated chip 700 includes an on-chip cache 701 configured to store a first page. The first page includes a first part of data units and a second part of data units. The first tag information corresponding to the first part has a first priority, and the second tag information corresponding to the second part has a second priority, which is lower than the first priority.
Specifically, the integrated chip 700 may be a system on chip (SoC).
The on-chip cache 701 is further configured to store index information of the first page, which includes the index information of the first part of data units and that of the second part. The index information of the first part includes the first tag information and first valid bit information. The index information of the second part includes the second tag information and second valid bit information.
Similarly, the first page further includes a third part of data units. Its corresponding fifth tag information has a third priority, lower than the second priority.
For a description of the on-chip cache 701, refer to the description of the on-chip cache 600 shown in FIG. 6; details are not repeated here.
The integrated chip 700 may further include a processor 702 configured to send a first access instruction. The first access instruction requests access to a first data unit corresponding to the first tag information. The on-chip cache 701 is further configured to determine, based on the first valid bit information, that the first data unit is stored in the on-chip cache 701. It then sends the first data unit to the processor 702.
That is, suppose the processor 702 requests a data unit corresponding to the first tag information and that data unit is stored in the on-chip cache 701. The data unit is then returned directly to the processor 702 without being fetched from off-chip memory, which improves data access efficiency.
In one possible design, the processor 702 is further configured to send a second access instruction. The second access instruction requests access to a second data unit corresponding to the second tag information. The on-chip cache 701 is further configured to determine, based on the second valid bit information, that the second data unit is stored in the on-chip cache 701. It then sends the second data unit to the processor 702.
That is, suppose the processor 702 requests a data unit corresponding to the second tag information and that data unit is stored in the on-chip cache 701. The data unit is then returned directly to the processor 702 without being fetched from off-chip memory, which improves data access efficiency.
Further, suppose the processor 702 accesses a data unit corresponding to the second tag information. The on-chip cache 701 is then further configured to decide, based on the number of accesses to the second data unit per unit time, whether to set the priority of the second tag information to the first priority. If so, it also sets the priority of the first tag information, or of third tag information having the first priority, to the second priority.
As described above, the priorities of the second and first tag information can be swapped in some cases. After determining that the processor 702 has accessed a data unit corresponding to the second tag information, the on-chip cache 701 decides whether to perform this swap. The decision is based on how many times that data unit has been accessed per unit time.
For example, the priority swap is performed if the access count of the second tag information exceeds 20% of the minimum access count among the first-priority tag information in the same Set. (With 4-way set associativity, a Set contains four pages.) The tag information that is replaced is first-priority tag information from the same Set.
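The 20% rule above can be sketched as follows. The source says only that a first-priority tag in the same Set is replaced. Choosing the one with the minimum access count as the victim is an assumption, as are the function name and signature. Integer arithmetic avoids floating-point edge cases at the threshold.

```python
def promotion_victim(sub_count: int, prime_counts: list, percent: int = 20):
    """Return the index (within the Set) of the first-priority tag to demote,
    or None if the SubTag is not accessed often enough to be promoted."""
    if not prime_counts:
        return None
    # Assumption: the least-accessed first-priority tag is the one replaced.
    weakest = min(range(len(prime_counts)), key=lambda i: prime_counts[i])
    # Integer form of "sub_count exceeds percent% of the minimum count".
    if sub_count * 100 > percent * prime_counts[weakest]:
        return weakest
    return None
```

For a 4-way Set with PrimeTag counts `[10, 40, 25, 60]`, the threshold is 20% of 10 = 2. A SubTag count of 3 triggers promotion against Way 0, while a count of exactly 2 does not.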
In another possible design, the processor 702 is further configured to send a third access instruction. The third access instruction requests access to a third data unit corresponding to the second tag information. The on-chip cache 701 is further configured to determine, based on the second valid bit information, that the third data unit is not stored in the on-chip cache 701. The processor 702 is further configured to read the third data unit from off-chip memory for storage, and the on-chip cache 701 is further configured to store the third data unit.
That is, suppose the processor 702 requests a data unit corresponding to the second tag information and that data unit is not stored in the on-chip cache 701. The data unit is then read from off-chip memory and stored in the on-chip cache 701.
Specifically, when storing the third data unit, the on-chip cache 701 may store it in the first page or in a second page. The second page and the first page hold data units in the same Way of the Data Array.
Further, the on-chip cache 701 is configured to set the bit corresponding to the third data unit in the second valid bit information to valid. It also sets the priority of the second tag information to the first priority, and sets the priority of the first tag information, or of fourth tag information having the first priority, to the second priority.
In this embodiment, the second tag information can be regarded as playing a victim role relative to the first tag information. Suppose the data unit corresponding to the second tag information is not stored in the on-chip cache 701, and its storage position is occupied by the first tag information. When that data is read into the on-chip cache 701, the second tag information, which has the second priority, must be promoted to the first priority. Some first-priority tag information in the Tag Array must then be demoted to the second priority. The demoted tag information may be the first tag information or the fourth tag information.
Managing the first page therefore often involves priority replacement. Such a replacement may be triggered by the number of accesses per unit time, or by an access to the second tag information that misses in the on-chip cache 701. From the point of view of the second tag information (SubTag), both cases can be called a SubTag promote process.
A SubTag promote may be a priority replacement within the same Way, for example between the first tag information and the second tag information. It may also be a priority replacement across different Ways, for example between the second tag information and the third tag information, or between the second tag information and the fourth tag information.
When performing a SubTag promote, the PrimeTag in the same Way of the same Set is preferred. Choosing a PrimeTag from another Way of the same Set introduces extra memory-access operations, but it has an advantage: the replaced PrimeTag can be the least recently used one. Conversely, if only the PrimeTag in the same Way of the same Set is considered for replacement, the replaced PrimeTag may not be the least recently used.
In addition, consider priority replacement across different Ways. The replaced first-priority tag information and the second-priority tag information evicted from the on-chip cache may be in the same Way or in different Ways.
Besides the priority swap, a SubTag promote also involves operations on the valid bit information. The SubTag promote process is described below with two specific examples.
1. The replaced first-priority tag information and the evicted second-priority tag information are in the same Way.
A specific example of SubTag promote within the same Way is shown in FIG. 8. The processor accesses the data unit corresponding to the fifth valid bit in the second valid bit information. This data unit is not stored in the on-chip cache, so it is read from the off-chip DRAM. Note that when fetching from the off-chip DRAM, the data units corresponding to the other valid bits of the second tag information are fetched at the same time; that is, data is prefetched based on the historical access information. After the data is fetched, the fifth bit in the second valid bit information is set to valid, and the priority of the second tag information is promoted to the first priority. To make room, the first-priority tag information (hereinafter Tag1) is selected for replacement. Tag1's valid bit information is 10010010. The valid bit information of the second-priority tag information in the same Way (hereinafter Tag2) is 01100001. After the second tag information is promoted to the first priority, the data corresponding to Tag2 is evicted from the on-chip cache and written to the off-chip DRAM. Tag1 is demoted to the second priority. The data units at its first and fourth valid bits conflict with the promoted second tag information. Because Tag1 now has the lower priority, those data units of Tag1 are written to the off-chip DRAM. The valid bit information after the SubTag promote completes is shown in FIG. 8.
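The same-Way case can be worked through with bit masks. Tag1 = 10010010 and Tag2 = 01100001 come from the text; the promoted tag's full vector is not given. The vector 10011000 is a hypothetical value consistent with the text: bit 5 is newly set, and bits 1 and 4 conflict with Tag1. All function names are illustrative.

```python
WIDTH = 8  # FIG. 8 draws 8-bit valid vectors (a real page has 64)


def parse(bits: str) -> int:
    """Left-to-right bit string; the first character is the first valid bit."""
    return int(bits, 2)


def show(mask: int) -> str:
    return format(mask, f"0{WIDTH}b")


def resolve(high: int, low: int):
    """Split a lower-priority tag's units against a higher-priority tag.
    Returns (units the lower-priority tag keeps, units written back to DRAM)."""
    return low & ~high, low & high


def promote_same_way(promoted: int, tag1: int, tag2: int):
    """The promoted tag takes first priority, Tag1 is demoted, Tag2 is evicted."""
    tag1_kept, tag1_written_back = resolve(promoted, tag1)
    return {"prime": promoted, "sub": tag1_kept,
            "written_back": {"Tag1": tag1_written_back, "Tag2": tag2}}


tag1 = parse("10010010")      # from the text
tag2 = parse("01100001")      # from the text
promoted = parse("10011000")  # HYPOTHETICAL: bits 1, 4 and 5 set
result = promote_same_way(promoted, tag1, tag2)
print(show(result["sub"]), show(result["written_back"]["Tag1"]))  # prints "00000010 10010000"
```

Under this assumption, demoted Tag1 keeps only its seventh unit. Its first and fourth units go back to DRAM, and all of Tag2's units are evicted.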
In the example of FIG. 8, therefore, the replaced first-priority tag (Tag1) and the second-priority tag evicted from the on-chip cache (Tag2) are in the same Way.
2. The replaced first-priority tag information and the evicted second-priority tag information are in different Ways.
A specific example of SubTag promote across different Ways is shown in FIG. 9. The processor accesses the data unit corresponding to the fifth valid bit in the second valid bit information. This data unit is not stored in the on-chip cache, so it is read from the off-chip DRAM. As before, the data units corresponding to the other valid bits of the second tag information are fetched at the same time. After the data is fetched, the fifth bit in the second valid bit information is set to valid, and the priority of the second tag information is promoted to the first priority. To do so, the first-priority tag information Tag1 is selected; assume it is in Way1, that is, column vector 1. A second-priority tag information must then also be selected for eviction from the on-chip cache; assume it is in Way3, that is, column vector 3.
In FIG. 9, after the fifth bit in the second valid bit information is set to valid, Tag1, which has the first priority in Way1, is selected for replacement. Tag1's information can then be stored elsewhere in the on-chip cache. The part of Tag2's information that does not conflict with the updated second valid bit information can be retained, and the conflicting part is written to the off-chip DRAM.
To keep Tag1 in the on-chip cache, Tag4, which has the second priority in Way3, is selected for replacement, and all of Tag4's information is written to the off-chip DRAM. Tag1 is then compared with Tag3, which has the first priority in Way3. The part of Tag1 that does not conflict with Tag3 can remain in the on-chip cache, and the conflicting part is written to the off-chip DRAM.
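The cross-Way case can be sketched with the same masking rule. FIG. 9's actual bit vectors are not reproduced in the text, so all five input vectors below are HYPOTHETICAL. Tag1 and Tag2 are chosen disjoint, as are Tag3 and Tag4, because each pair shares one Way. Function names are illustrative.

```python
def parse(bits: str) -> int:
    return int(bits, 2)


def show(mask: int) -> str:
    return format(mask, "08b")


def resolve(high: int, low: int):
    """Return (units the lower-priority tag keeps, units written back to DRAM)."""
    return low & ~high, low & high


def promote_cross_way(promoted, tag1, tag2, tag3, tag4):
    # Way1: the promoted tag takes first priority; Tag2 keeps non-conflicting units.
    way1_sub, tag2_wb = resolve(promoted, tag2)
    # Way3: Tag4 is evicted outright; demoted Tag1 becomes Way3's second-priority tag.
    way3_sub, tag1_wb = resolve(tag3, tag1)
    return {"way1": (promoted, way1_sub), "way3": (tag3, way3_sub),
            "written_back": {"Tag1": tag1_wb, "Tag2": tag2_wb, "Tag4": tag4}}


# All five vectors are HYPOTHETICAL; FIG. 9 is not reproduced here.
r = promote_cross_way(parse("10011000"), parse("10010010"), parse("01001001"),
                      parse("00110010"), parse("01000100"))
print(show(r["way1"][1]), show(r["way3"][1]))  # prints "01000001 10000000"
```

Moving Tag1's surviving units from Way1 into Way3 is the extra memory-access work mentioned above. In exchange, the PrimeTag that is replaced can be chosen by LRU across the whole Set.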
In addition, in the embodiments of this application, the integrated chip 700 may further include a tag buffer 703 for storing the first tag information and the second tag information. The tag buffer 703 may be located in the memory controller (MC).
That is, the tag buffer 703 may also store the index information of pages stored in the on-chip cache: in addition to tag information, it may store valid bit information, dirty bit information, and similar information. When the processor issues an access instruction, the tag buffer 703 can be searched first to determine whether the requested data is stored in the on-chip cache, which improves data access efficiency.
Note that the data in the tag buffer 703 may be newer than the data in the on-chip cache 701. Therefore, when data in the tag buffer 703 is updated, the updated data must be written back to the on-chip cache 701. In addition, when page index information in the on-chip cache 701 or in the off-chip memory changes, the data in the tag buffer 703 must be updated accordingly.
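The tag buffer behavior described above can be modeled in a few lines. This Python sketch is a hypothetical software model, not the hardware design: a dictionary keyed by page number stands in for both the DC tag array and the TB, only DC-side index changes are modeled, and each TB update is written back to the DC immediately, which is one simple policy (the text does not say when the write-back happens). `TagEntry`, `TagBuffer`, and their methods are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TagEntry:
    tag: int
    valid: int = 0  # one valid bit per data unit of the page
    dirty: int = 0  # one dirty bit per data unit of the page


class TagBuffer:
    """Hypothetical model of tag buffer 703: a small copy of DC tag entries in the MC."""

    def __init__(self, dc_tags: dict[int, TagEntry]) -> None:
        self.dc_tags = dc_tags  # tag array held in the on-chip cache 701
        self.entries: dict[int, TagEntry] = {}

    def lookup(self, page: int) -> TagEntry | None:
        # Searched first, before the DC tag array, on every access.
        return self.entries.get(page)

    def fill(self, page: int) -> None:
        # Copy a DC tag entry into the TB (for example after a TB miss).
        src = self.dc_tags[page]
        self.entries[page] = TagEntry(src.tag, src.valid, src.dirty)

    def set_valid(self, page: int, unit: int) -> None:
        # The TB copy is updated first, then written back to the DC.
        entry = self.entries[page]
        entry.valid |= 1 << unit
        self.dc_tags[page] = TagEntry(entry.tag, entry.valid, entry.dirty)

    def on_dc_change(self, page: int) -> None:
        # Re-synchronize the TB when index information changes in the DC.
        if page not in self.dc_tags:
            self.entries.pop(page, None)  # page left the DC: drop the stale copy
        elif page in self.entries:
            self.fill(page)  # page still cached: refresh the TB copy


dc = {7: TagEntry(tag=7, valid=0b0001)}
tb = TagBuffer(dc)
assert tb.lookup(7) is None  # TB miss: fall back to the DC tag array
tb.fill(7)
tb.set_valid(7, 2)
print(bin(dc[7].valid))  # 0b101
```

Writing back on every update keeps the model trivially consistent; a hardware TB could instead mark entries dirty and write them back on eviction.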
It follows that in the integrated chip 700, the location of the accessed data may be determined in the tag buffer (TB), in the on-chip cache (DC, that is, the DRAM cache), or in the off-chip DRAM. In a specific example, the data access flow of the processor may be as shown in Figure 10.
When a memory access request misses in the last level cache (LLC), the request is sent to the TB in the memory controller, where the tag information (Tag) is compared. On a TB hit, the controller checks whether the corresponding bit in the TB's valid bit information (valid bits) is 1 (valid bit = 1 means valid; valid bit = 0 means invalid). If it is 1, the data is in the DC, and the access proceeds to the DC. The DC then determines whether the current address hits the first tag information (PrimeTag) or the second tag information (SubTag); the TB does not distinguish between PrimeTag and SubTag. On a PrimeTag hit (TB hit, DC PrimeTag hit, valid), the data is in the DC: the DC is read directly, and the PrimeTag's access count is updated. On a SubTag hit (TB hit, DC SubTag hit, valid), the data is also in the DC: the DC is read directly, the SubTag's access count is updated, and the controller decides whether to upgrade the second tag information (SubTag promote).
If the TB lookup hits on the Tag but valid bit = 0, the page corresponding to the Tag is in the DC but the requested footprint (that is, the data) is not (a DC tag hit with a data miss, i.e., a false miss). The DC is then queried to determine whether a PrimeTag or a SubTag is being accessed. For a PrimeTag (TB hit, DC PrimeTag hit, not valid), because the PrimeTag has the higher priority, the data can be returned directly from the off-chip DRAM, and the corresponding valid bits in both the DC and the TB are set to 1; at the same time, the corresponding valid bits of the SubTag in the same group must be cleared in both the DC and the TB. For a SubTag (TB hit, DC SubTag hit, not valid), the off-chip DRAM is accessed to obtain the data, and the controller decides whether to perform a SubTag promote.
If the TB lookup misses, it cannot yet be determined whether the data is in the DC, because a TB miss only means that the relevant entries of the DC's Tag Array are not currently held in the TB. The DC is then accessed to check for a real hit (a tag hit with valid bit = 1, also called a full hit in this application). On a real hit, the controller again distinguishes PrimeTag from SubTag. On a PrimeTag hit (TB miss, DC PrimeTag hit, valid), the data is returned from the DC, and the corresponding Tag and valid bits are written back to the TB. On a SubTag hit (TB miss, DC SubTag hit, valid), the data is likewise returned from the DC, and the corresponding Tag and valid bits are written back to the TB.
If the TB lookup misses and the DC tag hits but valid = 0 (invalid), a DC-only tag hit occurs. The controller then determines whether the PrimeTag was hit. On a DC PrimeTag hit (TB miss, DC PrimeTag hit, not valid), the off-chip DRAM is accessed to obtain the data, the valid bits in the DC are updated, and the Tag and valid bits in the TB are updated. On a DC SubTag hit (TB miss, DC SubTag hit, not valid), the off-chip DRAM is accessed to obtain the data, and the controller decides whether to perform a SubTag promote.
If the TB lookup misses and the DC access is also a real miss (TB miss, DC real miss), a PrimeTag replacement occurs when the existing PrimeTag replacement criteria are met.
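The decision tree of Figure 10 can be summarized as a function from lookup outcomes to actions. This Python sketch covers only the control flow described above; the `Hit` enum, the `classify_access` function, and the action strings are hypothetical, and it assumes the TB mirrors the DC tag array, so a TB hit always implies a DC tag hit.

```python
from __future__ import annotations

from enum import Enum


class Hit(Enum):
    NONE = "none"        # no tag match in the DC
    PRIME = "PrimeTag"   # first tag information matched
    SUB = "SubTag"       # second tag information matched


def classify_access(tb_hit: bool, dc_hit: Hit, valid: bool) -> list[str]:
    """Actions taken after an LLC miss, following the flow described for Figure 10."""
    if tb_hit and dc_hit is Hit.NONE:
        raise ValueError("a TB hit implies a DC tag hit in this model")
    if dc_hit is Hit.NONE:  # TB miss, DC real miss
        return ["replace PrimeTag if the replacement criteria are met"]
    if valid:  # data is in the DC
        actions = ["read data from DC"]
        if tb_hit:
            actions.append(f"update {dc_hit.value} access count")
            if dc_hit is Hit.SUB:
                actions.append("decide SubTag promote")
        else:
            actions.append("write Tag and valid bits back to TB")
        return actions
    # Tag hit without data: false miss (TB hit) or DC-only tag hit (TB miss).
    actions = ["read data from off-chip DRAM"]
    if dc_hit is Hit.SUB:
        actions.append("decide SubTag promote")
    elif tb_hit:
        actions += ["set valid bit in DC and TB",
                    "clear corresponding SubTag valid bits in DC and TB"]
    else:
        actions += ["update valid bits in DC", "update Tag and valid bits in TB"]
    return actions


print(classify_access(tb_hit=True, dc_hit=Hit.PRIME, valid=False))
```

Keeping the flow as a pure function makes each of the cases named in the text (real hit, false miss, DC-only tag hit, real miss) easy to check in isolation.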
Specifically, the process of upgrading the second tag information (SubTag promote) may be as shown in Figure 11. First, whether to perform a SubTag promote is determined according to the access flow and the configured conditions. If no promote is performed, only the access count of the second tag information (SubTag) is updated. If a SubTag promote is performed, the controller checks whether the first tag information (PrimeTag) to be replaced and the SubTag are in the same column vector (way). If they are in the same way, the roles of the PrimeTag and the SubTag are simply swapped, and the data corresponding to the PrimeTag valid bits that conflict with the SubTag is written back off chip. If they are not in the same way, the SubTag promote can be performed using the method shown in Figures 8 and 9 above; details are not repeated here.
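The same-way branch of Figure 11 can be sketched as a role swap on valid-bit masks. The text only says the promote decision follows "set conditions" (claim 11 mentions the number of accesses per unit time), so the `threshold` parameter and the windowed access count below are assumptions, as are the names `should_promote` and `promote_same_way`. The sketch also assumes that slot i of the way holds data unit i of either the PrimeTag or the SubTag, so a conflict is any bit set in both masks.

```python
from __future__ import annotations


def should_promote(sub_accesses_in_window: int, threshold: int) -> bool:
    # Hypothetical condition: promote once the SubTag has been accessed at least
    # `threshold` times in the current time window (cf. claim 11).
    return sub_accesses_in_window >= threshold


def promote_same_way(prime_valid: int, sub_valid: int) -> tuple[int, int, int]:
    """Swap the PrimeTag and SubTag roles inside one way.

    Returns (new PrimeTag valid bits, new SubTag valid bits, bits written off chip).
    Units of the old PrimeTag that collide with SubTag units give up their slots.
    """
    written_back = prime_valid & sub_valid
    new_prime = sub_valid                      # promoted SubTag keeps all of its units
    new_sub = prime_valid & ~written_back      # demoted PrimeTag keeps the rest
    return new_prime, new_sub, written_back


if should_promote(sub_accesses_in_window=5, threshold=4):
    print([bin(m) for m in promote_same_way(0b1111_0000, 0b0011_0011)])
```

When `should_promote` returns False, the flow in the text only updates the SubTag's access count, which is why the promote itself is kept as a separate function.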
Obviously, a person skilled in the art can make various changes and variations to the embodiments of this application without departing from their scope. This application is intended to cover these modifications and variations provided that they fall within the scope of the claims of this application and their equivalent technologies.

Claims (16)

  1. An on-chip cache, comprising:
    a storage unit, configured to store a first page, wherein the first page includes a first portion of data units and a second portion of data units, first tag information corresponding to the first portion of data units has a first priority, second tag information corresponding to the second portion of data units has a second priority, and the second priority is lower than the first priority.
  2. The on-chip cache according to claim 1, wherein the storage unit is further configured to:
    store index information of the first page, wherein the index information of the first page includes index information of the first portion of data units and index information of the second portion of data units.
  3. The on-chip cache according to claim 2, wherein the index information of the first portion of data units includes the first tag information and first valid bit information, and the index information of the second portion of data units includes the second tag information and second valid bit information.
  4. The on-chip cache according to any one of claims 1 to 3, further comprising:
    a storage controller, configured to set the priority of the second tag information to the first priority, and to set the priority of the first tag information, or of third tag information having the first priority, to the second priority.
  5. The on-chip cache according to any one of claims 1 to 4, wherein the first page further includes a third portion of data units, fourth tag information corresponding to the third portion of data units has a third priority, and the third priority is lower than the second priority.
  6. An integrated chip, comprising:
    an on-chip cache, configured to store a first page, wherein the first page includes a first portion of data units and a second portion of data units, first tag information corresponding to the first portion of data units has a first priority, second tag information corresponding to the second portion of data units has a second priority, and the second priority is lower than the first priority.
  7. The integrated chip according to claim 6, wherein the on-chip cache is further configured to:
    store index information of the first page, wherein the index information of the first page includes index information of the first portion of data units and index information of the second portion of data units.
  8. The integrated chip according to claim 7, wherein the index information of the first portion of data units includes the first tag information and first valid bit information, and the index information of the second portion of data units includes the second tag information and second valid bit information.
  9. The integrated chip according to claim 8, further comprising:
    a processor, configured to send a first access instruction, wherein the first access instruction is used to request access to a first data unit, and the first data unit corresponds to the first tag information;
    the on-chip cache is further configured to:
    determine, based on the first valid bit information, that the first data unit is stored in the on-chip cache; and
    send the first data unit to the processor.
  10. The integrated chip according to claim 9, wherein the processor is further configured to:
    send a second access instruction, wherein the second access instruction is used to request access to a second data unit, and the second data unit corresponds to the second tag information;
    the on-chip cache is further configured to:
    determine, based on the second valid bit information, that the second data unit is stored in the on-chip cache; and
    send the second data unit to the processor.
  11. The integrated chip according to claim 10, wherein the on-chip cache is further configured to:
    determine, based on the number of times the second data unit is accessed per unit time, whether to set the priority of the second tag information to the first priority and to set the priority of the first tag information, or of third tag information having the first priority, to the second priority.
  12. The integrated chip according to any one of claims 9 to 11, wherein the processor is further configured to:
    send a third access instruction, wherein the third access instruction is used to request access to a third data unit, and the third data unit corresponds to the second tag information;
    the on-chip cache is further configured to:
    determine, based on the second valid bit information, that the third data unit is not stored in the on-chip cache;
    the processor is further configured to:
    read the third data unit from an off-chip memory;
    the on-chip cache is further configured to:
    store the third data unit.
  13. The integrated chip according to claim 12, wherein, when storing the third data unit, the on-chip cache is specifically configured to:
    store the third data unit in the first page or in a second page.
  14. The integrated chip according to claim 12 or 13, wherein the on-chip cache is further configured to:
    set the valid bit corresponding to the third data unit in the second valid bit information to valid; and
    set the priority of the second tag information to the first priority, and set the priority of the first tag information, or of fourth tag information having the first priority, to the second priority.
  15. The integrated chip according to any one of claims 6 to 14, further comprising:
    a tag buffer, configured to store the first tag information and the second tag information.
  16. The integrated chip according to any one of claims 6 to 15, wherein the first page further includes a third portion of data units, fifth tag information corresponding to the third portion of data units has a third priority, and the third priority is lower than the second priority.
PCT/CN2019/112380 2019-10-21 2019-10-21 On-chip cache and integrated chip WO2020001665A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980101522.1A CN114556335A (en) 2019-10-21 2019-10-21 On-chip cache and integrated chip
PCT/CN2019/112380 WO2020001665A2 (en) 2019-10-21 2019-10-21 On-chip cache and integrated chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/112380 WO2020001665A2 (en) 2019-10-21 2019-10-21 On-chip cache and integrated chip

Publications (2)

Publication Number Publication Date
WO2020001665A2 true WO2020001665A2 (en) 2020-01-02
WO2020001665A3 WO2020001665A3 (en) 2020-07-09

Family

ID=68985828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112380 WO2020001665A2 (en) 2019-10-21 2019-10-21 On-chip cache and integrated chip

Country Status (2)

Country Link
CN (1) CN114556335A (en)
WO (1) WO2020001665A2 (en)

Also Published As

Publication number Publication date
WO2020001665A3 (en) 2020-07-09
CN114556335A (en) 2022-05-27

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19825146

Country of ref document: EP

Kind code of ref document: A2