WO2015131395A1 - Cache, shared cache management method and controller - Google Patents


Info

Publication number
WO2015131395A1
WO2015131395A1 (PCT/CN2014/073052)
Authority
WO
WIPO (PCT)
Prior art keywords
cache
cache block
priority
cores
block
Prior art date
Application number
PCT/CN2014/073052
Other languages
French (fr)
Chinese (zh)
Inventor
Zheng Libing (郑礼炳)
Li Jingchao (李景超)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201480000331.3A priority Critical patent/CN105359116B/en
Priority to PCT/CN2014/073052 priority patent/WO2015131395A1/en
Publication of WO2015131395A1 publication Critical patent/WO2015131395A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/084: Multiuser, multiprocessor or multiprocessing cache systems with a shared cache

Definitions

  • The present invention relates to the field of computers, and in particular to a buffer, a shared cache management method, and a controller.

Background Art
  • In a multi-core system, a portion of the cache blocks in the shared cache is usually allocated to each core. When one of the cores misses on a cache access (a read or a write) and a cache block refill is required, the cache block to be replaced is determined from among the cache blocks allocated to that core, and the original data in the cache block to be replaced is replaced with the data to be read or written.
  • In the course of implementing the present invention, the inventors found that the prior art has at least the following problem: a core can only select the cache block to be replaced from its own allocated portion of the cache blocks. In practice it may happen that the cache blocks allocated to one core are frequently reused while the cache blocks allocated to other cores sit idle for long periods, resulting in low utilization of the shared cache and degraded system performance.

Summary of the Invention
  • To solve the prior-art problem that shared cache utilization is low because a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, the present invention provides a buffer, a shared cache management method, and a controller.
  • the technical solution is as follows:
  • In a first aspect, a buffer is provided, where the buffer includes:
  • a cache unit, a status register, a priority calculation unit, and a controller;
  • the cache unit is connected to the status register and the controller respectively; the status register is connected to the cache unit and the priority calculation unit respectively; the priority calculation unit is connected to the status register and the controller respectively; the cache unit includes a shared cache and N shadow tags, and the N shadow tags respectively correspond to N cores of the processor, where N ≥ 2 and N is an integer;
  • the status register is configured to record first access information of the N cores with respect to the cache unit, where the first access information includes: the number of times the shared cache is accessed, the number of cache blocks occupied in the shared cache, the number of times the shared cache is accessed and hit, and the number of times the shadow tags are accessed and hit;
  • the priority calculation unit is configured to calculate the respective replacement priorities of the N cores according to the first access information of the N cores recorded in the status register; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • the controller is configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores when a shared-cache access miss occurs and a cache block refill operation is performed, and to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • the controller is further configured to allocate a respective target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit rate maximization, fairness, or quality of service;
  • the status register is further connected to the controller
  • the status register is further configured to record second access information corresponding to each cache block in the shared cache, where the second access information includes the number of times the cache block is occupied by each of the N cores;
  • the controller is configured to: when a shared-cache access miss occurs and a cache block refill operation is performed, obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
  • With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the controller is configured to determine the cache block to be replaced from the second type of cache blocks according to a replacement algorithm.
  • A second aspect provides a shared cache management method for use in the buffer according to the first aspect or any possible implementation of the first aspect, where the method includes: when a shared-cache access miss occurs and a cache block refill operation is performed, acquiring the respective replacement priorities of the N cores of the processor from the priority calculation unit, the replacement priority being used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced; and
  • determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
  • the method includes:
  • Each of the N cores is assigned a respective target cache occupancy according to a performance goal, the performance target including at least one of overall hit rate maximization, fairness, or quality of service;
  • the status register is further configured to record second access information corresponding to each cache block in the shared cache, where the second access information includes the number of times the cache block is occupied by each of the N cores; determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority includes:
  • determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, which in turn includes:
  • determining the cache block to be replaced from the second type of cache blocks according to a replacement algorithm.
  • the third aspect provides a controller, which is used in the buffer according to the foregoing first aspect or any possible implementation manner of the first aspect, wherein the controller includes:
  • a first obtaining module configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores of the processor when a shared-cache access miss occurs and a cache block refill operation is performed; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • a determining module configured to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • the controller includes:
  • An allocation module configured to allocate a respective target cache occupancy to the N cores according to a performance target, where the performance target includes at least one of overall hit rate maximization, fairness, or quality of service;
  • a detecting module configured to detect whether the actual cache occupancy of the core with the highest replacement priority is greater than the target cache occupancy;
  • a control module configured to: if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the respective replacement priorities of the N cores.
  • the determining module includes:
  • an obtaining unit configured to acquire, from the status register, second access information corresponding to each cache block currently occupied by the core with the highest replacement priority
  • a determining unit configured to determine, according to the second access information corresponding to each cache block that is currently occupied by the kernel with the highest replacement priority, the cache block to be replaced;
  • the status register is further configured to record the second access information corresponding to each cache block in the shared cache, where the second access information includes the number of times the cache block is occupied by each of the N cores.
  • the determining unit includes:
  • a first determining subunit configured to determine a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, where the first type of cache block is the cache block occupied the fewest times by the core with the highest replacement priority;
  • a second determining subunit configured to determine a second type of cache block from the first type of cache block, where the second type of cache block is a cache block having the least total number of times occupied by the N cores;
  • a third determining subunit configured to determine, according to the replacement algorithm, the cache block to be replaced from the second type cache block.
  • In the embodiments of the present invention, the status register records the first access information of each of the N cores with respect to the cache unit, the priority calculation unit calculates the respective replacement priorities of the N cores according to the first access information recorded in the status register, and the controller determines the cache block to be replaced in the shared cache according to the respective replacement priorities of the N cores. This solves the prior-art problem that shared cache utilization is low because a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, thereby improving shared cache utilization and system performance.
  • FIG. 1 is a schematic structural diagram of a buffer provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a buffer according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a cache unit according to another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a status register according to another embodiment of the present invention.
  • FIG. 5 is a flowchart of a shared cache management method according to an embodiment of the present invention;
  • FIG. 6 is a flowchart of a shared cache management method according to another embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a controller according to an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of a controller according to another embodiment of the present invention.

Detailed Description
  • FIG. 1 is a schematic structural diagram of a buffer provided by an embodiment of the present invention.
  • The buffer may include: a cache unit 102, a status register 104, a priority calculation unit 106, and a controller 108;
  • the cache unit 102 is connected to the status register 104 and the controller 108 respectively; the status register 104 is connected to the cache unit 102 and the priority calculation unit 106 respectively; the priority calculation unit 106 is connected to the status register 104 and the controller 108 respectively; the cache unit 102 includes a shared cache 1022 and N shadow tags 1024, which respectively correspond to N cores of the processor, where N ≥ 2 and N is an integer;
  • the status register 104 is configured to record first access information of the N cores with respect to the cache unit 102, where the first access information includes: the number of times the shared cache 1022 is accessed, the number of cache blocks occupied in the shared cache 1022, the number of times the shared cache 1022 is accessed and hit, and the number of times the shadow tags 1024 are accessed and hit;
  • the priority calculation unit 106 is configured to calculate the respective replacement priorities of the N cores according to the first access information of the N cores recorded in the status register 104; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • the controller 108 is configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores when a shared-cache access miss occurs and a cache block refill operation is performed, and to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • In this way, the status register records each core's access information with respect to the shared cache, the priority calculation unit calculates each core's replacement priority according to that access information, and when the controller performs a cache block refill operation it determines the cache block to be replaced according to the replacement priorities of the cores. The cache block to be replaced can therefore be determined according to each core's actual access behavior toward the shared cache, which improves the utilization of the shared cache.
  • In summary, the buffer provided by this embodiment of the present invention records the first access information of each of the N cores with respect to the cache unit through the status register; the priority calculation unit calculates the respective replacement priorities of the N cores according to the first access information recorded in the status register; and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, which leads to low shared cache utilization, thereby improving shared cache utilization and system performance.
  • FIG. 2 is a schematic structural diagram of a buffer provided by another embodiment of the present invention.
  • The buffer can be applied to a multi-core system.
  • the buffer may include: a cache unit 202, a status register 204, a priority calculation unit 206, and a controller 208;
  • the cache unit 202 is connected to the status register 204 and the controller 208 respectively; the status register 204 is connected to the cache unit 202 and the priority calculation unit 206 respectively; the priority calculation unit 206 is connected to the status register 204 and the controller 208 respectively; the cache unit 202 includes a shared cache 2022 and N shadow tags 2024, which respectively correspond to N cores of the processor, where N ≥ 2 and N is an integer;
  • In this embodiment, the processor includes four cores, the cache unit includes a shared cache and four shadow tags, and the shared cache includes n cache blocks.
  • Each shadow tag likewise contains n storage units, and each cache block in the shared cache corresponds to one storage unit in each shadow tag.
  • Each cache block in the shared cache is divided into four fields: the ID (identification number) of the core currently occupying the cache block, a valid flag, tag information, and data. Each storage unit in a shadow tag includes two fields: a valid flag and tag information.
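The field layout described above can be sketched as plain data structures. This is an illustrative sketch only: the field names translate the text, and the Python types stand in for fixed-width hardware fields.

```python
from dataclasses import dataclass

@dataclass
class CacheBlock:
    core_id: int   # ID of the core currently occupying the block
    valid: bool    # valid flag
    tag: int       # tag information
    data: bytes    # the cached data itself

@dataclass
class ShadowTagEntry:
    valid: bool    # valid flag
    tag: int       # tag information
```

A shadow tag would then be a list of n `ShadowTagEntry` objects, one per cache block, mirroring the shared cache's tag array without holding any data.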
  • When a core accesses the cache unit, the corresponding cache block and the corresponding storage unit in each shadow tag are located by the low-order bits of the 64-bit access address, and the valid flag and tag information are read from the located entry. If the valid flag indicates a valid entry and the tag information matches the information contained in the high-order bits of the 64-bit address, the current access is determined to be a hit; otherwise, the current access is determined to be a miss.
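The hit/miss determination above can be sketched as follows. The bit widths (a 6-bit block offset and a 10-bit index) are illustrative assumptions, not values given in the patent.

```python
OFFSET_BITS = 6   # assumed 64-byte cache lines
INDEX_BITS = 10   # assumed n = 1024 cache blocks

def split_address(addr):
    """Split a 64-bit address: low-order bits select the entry, the rest is the tag."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag

def lookup(entries, addr):
    """entries[i] is a dict with 'valid' and 'tag'; return True on a hit."""
    index, tag = split_address(addr)
    entry = entries[index]
    # hit only if the entry is valid AND its tag matches the high-order bits
    return entry["valid"] and entry["tag"] == tag
```

The same lookup applies to both the shared cache's tag array and each shadow tag, since they are indexed identically.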
  • the status register 204 is configured to record first access information of the N cores with respect to the cache unit 202, where the first access information includes: the number of times the shared cache 2022 is accessed, the number of cache blocks occupied in the shared cache 2022, the number of times the shared cache 2022 is accessed and hit, and the number of times the shadow tags 2024 are accessed and hit. When a core accesses the shared cache, the status register can record the core's access hits to the shared cache and the shadow tags in real time by means of counters. In the status register configuration shown in FIG. 4, four counters are maintained for each core: an access counter (Cnt_Ac), a cache occupancy counter (Cnt_Size), a cache hit counter (Cnt_Shared_Hit), and a shadow tag hit counter (Cnt_Shadow_Hit). Cnt_Ac corresponds to the number of times a given core accesses the shared cache; Cnt_Size corresponds to the number of cache blocks that core occupies in the shared cache; Cnt_Shared_Hit corresponds to the number of times that core's accesses to the shared cache hit; and Cnt_Shadow_Hit corresponds to the number of times that core's accesses to its shadow tag hit. Take core i's accesses to the shared cache and shadow tag as an example.
  • The counters are updated as follows. When processor core i accesses the shared cache (a read operation or a write operation): if the shared cache is hit, Cnt_Ac(i) and Cnt_Shared_Hit(i) are each incremented by 1; if the shadow tag is hit, Cnt_Shadow_Hit(i) is incremented by 1. If the shared cache is missed and a free cache block exists, Cnt_Ac(i) and Cnt_Size(i) are each incremented by 1. If the shared cache is missed and no free cache block exists, Cnt_Ac(i) is incremented by 1, the cache block to be replaced is determined from among the cache blocks occupied by core A (the core with the highest replacement priority), a write-back operation is performed, and Cnt_Size(A) is decremented by 1 while Cnt_Size(i) is incremented by 1; if core A is core i itself, the value of Cnt_Size(A) remains unchanged. If the shadow tag is missed, Cnt_Shadow_Hit(i) remains unchanged.
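The counter rules above can be sketched as follows. The counter names follow the text (Cnt_Ac, Cnt_Size, Cnt_Shared_Hit, Cnt_Shadow_Hit); the `StatusRegister` class and its method names are illustrative assumptions, not the patent's implementation.

```python
class StatusRegister:
    def __init__(self, n_cores):
        self.cnt_ac = [0] * n_cores          # accesses to the shared cache
        self.cnt_size = [0] * n_cores        # cache blocks currently occupied
        self.cnt_shared_hit = [0] * n_cores  # shared-cache access hits
        self.cnt_shadow_hit = [0] * n_cores  # shadow-tag access hits

    def on_shared_hit(self, i):
        self.cnt_ac[i] += 1
        self.cnt_shared_hit[i] += 1

    def on_shadow_hit(self, i):
        self.cnt_shadow_hit[i] += 1

    def on_miss(self, i, free_block_available, victim_core=None):
        self.cnt_ac[i] += 1
        if free_block_available:
            self.cnt_size[i] += 1
        elif victim_core is not None and victim_core != i:
            # a block of core A (the highest-replacement-priority core) is
            # written back and handed over to core i
            self.cnt_size[victim_core] -= 1
            self.cnt_size[i] += 1
        # if the victim core is core i itself, occupancy counts are unchanged
```

A shadow-tag miss needs no method because, per the text, it leaves Cnt_Shadow_Hit unchanged.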
  • the priority calculation unit 206 is configured to calculate the respective replacement priorities of the N cores according to the first access information of the N cores recorded in the status register 204; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • In a specific implementation, the priority calculation unit 206 may calculate the replacement probability of each core according to each core's first access information and a replacement probability calculation model, forming a probability distribution in which the core with the highest replacement probability has the highest replacement priority.
  • the replacement probability calculation model is as follows:
  • Suppose the number of access misses in a given interval is W, and within these W access misses the proportion contributed by core i is Mi; then, within this interval, the number of access misses of core i is Mi × W. If none of the cache blocks occupied by core i at the beginning of the interval is replaced within the interval, then at the end of the interval the cache occupancy ratio of core i will become Ci + (Mi × W / m), where m is the total number of cache blocks and Ci is the proportion of cache blocks occupied by core i at the beginning of the interval.
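The occupancy projection above can be expressed directly. The `replacement_priorities` helper, which ranks cores by how far their projected occupancy exceeds an assumed target share, is a hypothetical completion: the patent states only that the core with the highest replacement probability receives the highest replacement priority, without giving the full probability formula.

```python
def projected_occupancy(c_i, m_i, w, m):
    """Occupancy ratio of core i at the end of the interval: C_i + (M_i * W) / m."""
    return c_i + (m_i * w) / m

def replacement_priorities(c, miss_share, w, m, targets):
    """Hypothetical ranking: cores sorted so the first entry has the highest
    replacement priority, scored by projected occupancy minus target share."""
    scores = {
        core: projected_occupancy(c[core], miss_share[core], w, m) - targets[core]
        for core in c
    }
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a core holding 25% of a 64-block cache (C_i = 0.25) that contributes half of 16 misses in an interval is projected to grow to 0.25 + 8/64 = 0.375 of the cache if none of its blocks are evicted.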
  • the controller 208 is configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores when a shared-cache access miss occurs and a cache block refill operation is performed, and to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • In a specific implementation, the controller may determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority according to the LRU (Least Recently Used) replacement algorithm.
  • The controller may also determine the cache block to be replaced according to other replacement algorithms, which is not specifically limited in this embodiment.
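A minimal sketch of the LRU selection step described above, assuming recency is tracked per cache block. The `OrderedDict`-based bookkeeping is an illustrative choice, not the patent's implementation.

```python
from collections import OrderedDict

class LRUTracker:
    def __init__(self):
        self._order = OrderedDict()   # block id -> None, oldest first

    def touch(self, block):
        """Record an access: the block becomes most recently used."""
        self._order.pop(block, None)
        self._order[block] = None     # re-insertion moves it to the end

    def pick_victim(self, candidates):
        """Return the least recently used block among the candidate set."""
        for block in self._order:     # iterates oldest -> newest
            if block in candidates:
                return block
        return next(iter(candidates)) # never-touched blocks: any candidate
```

Here `candidates` would be the set of blocks currently occupied by the core with the highest replacement priority; the tracker then picks the stalest of those.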
  • the controller 208 is further configured to: allocate a respective target cache occupancy to each of the N cores according to a performance target, where the performance target includes at least one of overall hit rate maximization, fairness, or quality of service; obtain the actual cache occupancy of the core with the highest replacement priority; detect whether the actual cache occupancy is greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the respective replacement priorities of the N cores.
  • In a specific implementation, the controller may first allocate a target cache occupancy to each core according to a performance target (such as overall hit rate maximization, fairness, or quality of service). The controller then acquires the first access information from the cache unit and obtains from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of cache blocks that core occupies in the shared cache). If the actual cache occupancy is not greater than the target cache occupancy, the controller may send a control instruction to the priority calculation unit to control it to recalculate the respective replacement priorities of the N cores.
  • the status register 204 is further connected to the controller 208.
  • the status register 204 is further configured to record second access information corresponding to each cache block in the shared cache 2022, where the second access information includes the number of times the cache block is occupied by each of the N cores;
  • the controller 208 is configured to: when a shared-cache access miss occurs and a cache block refill operation is performed, obtain, from the status register 204, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
  • the controller 208 is specifically configured to: determine a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, where the first type of cache block is the cache block occupied the fewest times by that core; determine a second type of cache block from the first type of cache blocks, where the second type of cache block is the cache block with the least total number of times occupied by the N cores; and determine the cache block to be replaced from the second type of cache blocks.
  • To make the cache blocks be used evenly while still respecting the reuse locality of each cache block, this embodiment also takes the number of times each cache block is occupied into account when determining the cache block to be replaced. Specifically, the status register records the number of times each cache block in the shared cache is occupied by each core as the second access information corresponding to that cache block. When a core subsequently misses on a shared-cache access and a cache block refill is performed, the cache blocks occupied the fewest times by the core with the highest replacement priority are first determined from among the cache blocks that core currently occupies; from those, the cache blocks with the least total number of times occupied by all cores are determined; and finally the cache block to be replaced is determined from the latter according to a replacement algorithm such as LRU. This approach tends to select the cache block with the fewest reuses, thereby taking into account both the reuse locality of each cache block and the extent to which it is shared by all cores.
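The two-stage narrowing described above can be sketched as follows. `per_core_count` and `total_count` are assumed stand-ins for the status register's second access information; the function names are illustrative.

```python
def filter_candidates(blocks, top_core, per_core_count, total_count):
    """Narrow the victim candidates in two stages before a replacement
    algorithm (e.g. LRU) makes the final choice among the survivors."""
    # Stage 1: the "first type" cache blocks, occupied the fewest times by
    # the core with the highest replacement priority.
    least = min(per_core_count[(b, top_core)] for b in blocks)
    first_type = [b for b in blocks if per_core_count[(b, top_core)] == least]
    # Stage 2: the "second type" cache blocks, those among the first type
    # with the least total occupancy over all cores.
    least_total = min(total_count[b] for b in first_type)
    return [b for b in first_type if total_count[b] == least_total]
```

The returned list would then be handed to the replacement algorithm (such as the LRU step sketched earlier) to pick the single block to evict.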
  • In summary, the buffer provided by this embodiment of the present invention records the first access information of each of the N cores with respect to the cache unit through the status register; the priority calculation unit calculates the respective replacement priorities of the N cores according to the first access information recorded in the status register; and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, which leads to low shared cache utilization, thereby improving shared cache utilization and system performance.
  • Furthermore, the buffer detects the relationship between the actual cache occupancy of the core with the highest replacement priority and that core's target cache occupancy, and when the actual cache occupancy is not greater than the target cache occupancy it controls the priority calculation unit to recalculate the replacement priority of each core, thereby further improving the utilization of the shared cache.
  • In addition, the buffer determines a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block being the cache block occupied the fewest times by that core; determines a second type of cache block from the first type of cache blocks, the second type of cache block being the cache block with the least total number of times occupied by the N cores; and determines the cache block to be replaced from the second type of cache blocks. This takes into account both the reuse locality of each cache block and the extent to which it is shared by all cores, further improving the utilization of each cache block in the shared cache.
  • FIG. 5 is a flowchart of a shared cache management method according to an embodiment of the present invention.
  • The method can be used in the buffer shown in FIG. 1 or FIG. 2.
  • the shared cache management method can include:
  • Step 302: Acquire the respective replacement priorities of the N cores of the processor from the priority calculation unit when a shared-cache access miss occurs and a cache block refill operation is performed; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced.
  • Step 304: Determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • In summary, the shared cache management method provided by this embodiment acquires the respective replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, which leads to low shared cache utilization, thereby improving shared cache utilization and system performance.
  • FIG. 6 is a flowchart of a shared cache management method according to another embodiment of the present invention. The method can be used in the buffer shown in FIG. 1 or FIG. 2.
  • the shared cache management method can include:
  • Step 402: Acquire the respective replacement priorities of the N cores of the processor from the priority calculation unit when a shared-cache access miss occurs and a cache block refill operation is performed.
  • The replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced.
  • The replacement priority is calculated by the priority calculation unit according to the first access information corresponding to each core recorded in the status register. For the specific process by which the status register records the first access information and the priority calculation unit calculates each core's replacement priority, refer to the description of the embodiment corresponding to FIG. 2; details are not repeated here.
  • Step 404: Obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
  • The status register is also used to record the second access information corresponding to each cache block in the shared cache, and the second access information includes the number of times the cache block is occupied by each of the N cores.
  • Step 406: Determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
  • Specifically, the controller may determine a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block being the cache block occupied the fewest times by that core; determine a second type of cache block from the first type of cache blocks, the second type of cache block being the cache block with the least total number of times occupied by the N cores; and determine the cache block to be replaced from the second type of cache blocks according to a replacement algorithm.
  • each cache block is used on average to avoid the reused locality of the cache block.
  • This embodiment thus also takes the number of times each cache block has been occupied into account when determining the cache block to be replaced. Specifically, the status register records, as the second access information of each cache block, the number of times each cache block in the shared cache has been occupied by each core. When a subsequent kernel access to the shared cache misses and a cache block refill is performed, the controller first determines, among the cache blocks currently occupied by the core with the highest replacement priority, the blocks occupied the fewest times by that core; among those, it then determines the blocks occupied the fewest total times by all cores; finally, the cache block to be replaced is chosen from the latter blocks according to a replacement algorithm such as LRU.
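  The two-stage selection just described can be sketched as follows. This is an illustrative software model rather than the claimed hardware; the structures `occupied_by` (per-block, per-core occupation counts) and `last_used` (LRU timestamps) are names introduced only for the example.

```python
def select_victim(blocks, hot_core, occupied_by, last_used):
    """Choose the cache block to replace, following the two-stage filter.

    blocks      -- ids of the blocks currently occupied by the core with
                   the highest replacement priority (hot_core)
    occupied_by -- occupied_by[b][i]: times block b was occupied by core i
    last_used   -- last_used[b]: last-access timestamp, for the LRU tie-break
    """
    # First type: blocks occupied the fewest times by the hot core.
    fewest = min(occupied_by[b][hot_core] for b in blocks)
    first_type = [b for b in blocks if occupied_by[b][hot_core] == fewest]

    # Second type: among those, blocks with the smallest total occupation
    # count across all N cores.
    least_total = min(sum(occupied_by[b]) for b in first_type)
    second_type = [b for b in first_type if sum(occupied_by[b]) == least_total]

    # Final choice by a conventional replacement algorithm such as LRU.
    return min(second_type, key=lambda b: last_used[b])
```

  The filter first discards blocks the hot core reuses often, then prefers the block least shared by all cores, and only then falls back to LRU.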
  • Optionally, the controller allocates a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; obtains the actual cache occupancy of the core with the highest replacement priority; and detects whether the actual cache occupancy is greater than the target cache occupancy. If the detection result is that the actual cache occupancy is not greater than the target cache occupancy, the controller controls the priority calculation unit to recalculate the replacement priorities of the N cores. In practice, the controller may first allocate the target cache occupancies according to the performance target, then acquire the first access information from the cache unit and derive from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of cache blocks in the shared cache occupied by that core).
  • In that case, the controller may send a control instruction to the priority calculation unit to control it to recalculate the replacement priority of each of the N cores.
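  As a sketch, the occupancy check that triggers recalculation might look like the following; the callback name `recalculate` is an assumption standing in for the control instruction sent to the priority calculation unit.

```python
def maybe_recalculate(actual_occupancy, target_occupancy, recalculate):
    """If the actual occupancy of the core with the highest replacement
    priority does not exceed its target, ask the priority calculation unit
    to recompute all replacement priorities; otherwise keep evicting this
    core's blocks."""
    if actual_occupancy <= target_occupancy:
        recalculate()  # stands in for the controller's control instruction
        return True
    return False
```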
  • In summary, the shared cache management method provided by this embodiment obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those priorities. This solves the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks, which leads to low shared-cache utilization, and achieves the effect of improving shared-cache utilization and system performance.
  • In addition, the method detects the relationship between the actual cache occupancy of the core with the highest replacement priority and that core's target cache occupancy; when the actual cache occupancy is not greater than the target cache occupancy, the priority calculation unit is controlled to recalculate the replacement priority of each core, further improving the utilization of the shared cache.
  • The method further determines, among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block, i.e. the cache block occupied the fewest times by that core; determines, from the first type of cache block, the second type of cache block, i.e. the cache block occupied the fewest total times by the N cores; and determines the cache block to be replaced from the second type of cache block according to a replacement algorithm. This takes into account both the reuse locality of each cache block and the extent to which it is shared by all cores, further improving the utilization of every cache block in the shared cache.
  • FIG. 7 is a schematic structural diagram of a controller according to an embodiment of the present invention. The controller may be used in a buffer as shown in FIG. 1 or FIG. 2 to control kernel accesses. The controller may include:
  • a first obtaining module 501, configured to obtain, from the priority calculation unit, the replacement priority of each of the N cores of the processor when an access to the shared cache misses and a cache block refill operation is performed, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and
  • a determining module 502, configured to determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
  • In summary, the controller provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those priorities. This solves the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks, which leads to low shared-cache utilization, and achieves the effect of improving shared-cache utilization and system performance.
  • FIG. 8 is a schematic structural diagram of a controller according to another embodiment of the present invention. The controller may be used in a buffer as shown in FIG. 1 or FIG. 2 to control kernel accesses. The controller may include:
  • a first obtaining module 601, configured to obtain, from the priority calculation unit, the replacement priority of each of the N cores of the processor when an access to the shared cache misses and a cache block refill operation is performed, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and
  • a determining module 602, configured to determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
  • Optionally, the controller further includes:
  • the allocating module 603 is configured to allocate a respective target cache occupancy to the N cores according to a performance target, where the performance target includes at least one of an overall hit rate maximization, fairness, or quality of service;
  • a second obtaining module 604, configured to obtain the actual cache occupancy of the core with the highest replacement priority; a detecting module 605, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy; and
  • the control module 606 is configured to, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate respective replacement priorities of the N cores.
  • the determining module 602 includes:
  • the obtaining unit 6021 is configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority;
  • the determining unit 6022 is configured to determine, according to the second access information corresponding to each cache block currently occupied by the kernel with the highest replacement priority, the cache block to be replaced;
  • the status register is further configured to record the second access information corresponding to each cache block in the shared cache, where the second access information includes a number of times occupied by the N cores.
  • the determining unit 6022 includes:
  • a first determining subunit 6022a, configured to determine, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by the core with the highest replacement priority;
  • a second determining sub-unit 6022b configured to determine, from the first type of cache block, a second type of cache block, where the second type of cache block is a cache block having the least total number of times occupied by the N cores;
  • a third determining subunit 6022c, configured to determine, according to a replacement algorithm, the cache block to be replaced from the second type of cache block.
  • In summary, the controller provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those priorities, solving the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks (which leads to low shared-cache utilization) and thereby improving shared-cache utilization and system performance.
  • Moreover, the controller detects the relationship between the actual cache occupancy of the core with the highest replacement priority and that core's target cache occupancy; when the actual cache occupancy is not greater than the target cache occupancy, the priority calculation unit is controlled to recalculate the replacement priority of each core, further improving the utilization of the shared cache.
  • Finally, the controller determines, among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block, i.e. the cache block occupied the fewest times by that core; determines, from the first type of cache block, the second type of cache block, i.e. the cache block occupied the fewest total times by the N cores; and determines the cache block to be replaced from the second type of cache block according to a replacement algorithm, thereby taking into account both the reuse locality of each cache block and the extent to which it is shared by all cores, and further improving the utilization of every cache block in the shared cache.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Provided are a cache, a shared cache management method and a controller, which relate to the field of computers. The cache comprises a cache unit, a state register, a priority calculation unit and a controller, wherein the state register is used for recording first access information about N kernels for respectively accessing the cache unit; the priority calculation unit is used for calculating the respective replacement priorities of the N kernels according to the first access information; and the controller is used for determining a cache block to be replaced in the cache blocks in a shared cache currently occupied by the kernel which has the highest replacement priority. The respective replacement priorities of the N kernels are calculated according to the first access information recorded by the state register, and the cache block to be replaced in the shared cache is determined according to the replacement priorities, thereby solving the problem in the prior art that a kernel can only determine a cache block to be replaced in a part of the corresponding cache blocks, so that the effect of improving the utilization rate and system performance of the shared cache is achieved.

Description

Buffer, shared cache management method and controller

Technical Field

The present invention relates to the field of computers, and in particular to a buffer, a shared cache management method, and a controller.

Background
With the continuous development of the computer field, multi-core processors are more and more widely applied, and whether the shared cache can be managed effectively has an important impact on system performance.
In existing shared cache management schemes, each core in a multi-core system is usually allocated a portion of the cache blocks in the shared cache. When a core's cache access (read or write) misses and a cache block refill operation is performed, the cache block to be replaced is determined among the cache blocks corresponding to that core, and the original data in the block to be replaced is overwritten with the data to be read or written.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem: since a core can only determine the cache block to be replaced from its own subset of cache blocks, in practice some cores' cache blocks may be frequently reused while other cores' cache blocks stay idle for a long time, resulting in low shared-cache utilization and degraded system performance.

Summary of the Invention
To solve the prior-art problem of low cache utilization caused by a core only being able to determine the cache block to be replaced from its own subset of cache blocks, embodiments of the present invention provide a buffer, a shared cache management method, and a controller. The technical solutions are as follows:
In a first aspect, a buffer is provided, the buffer including:

a cache unit, a status register, a priority calculation unit, and a controller;

where the cache unit is connected to the status register and the controller; the status register is connected to the cache unit and the priority calculation unit; the priority calculation unit is connected to the status register and the controller; the cache unit includes a shared cache and N shadow tags, the N shadow tags corresponding to the N cores of the processor respectively; N > 2, and N is an integer;

the status register is configured to record first access information of each of the N cores to the cache unit, the first access information including: the number of accesses to the shared cache, the number of cache blocks occupied in the shared cache, the number of accesses to the shared cache that hit, and the number of accesses to the shadow tag that hit;

the priority calculation unit is configured to calculate the replacement priority of each of the N cores according to the first access information of each of the N cores to the cache unit recorded by the status register, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and

the controller is configured to, when an access to the shared cache misses and a cache block refill operation is performed, obtain the replacement priorities of the N cores from the priority calculation unit and determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In a first possible implementation of the first aspect, the controller is further configured to: allocate a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; obtain the actual cache occupancy of the core with the highest replacement priority; detect whether the actual cache occupancy is not greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the replacement priorities of the N cores.
In a second possible implementation of the first aspect, the status register is further connected to the controller; the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores; and the controller is configured to, when an access to the shared cache misses and a cache block refill operation is performed, obtain from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the controller is configured to: determine, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by that core; determine, from the first type of cache block, a second type of cache block, the second type of cache block being the cache block occupied the fewest total times by the N cores; and determine the cache block to be replaced from the second type of cache block according to a replacement algorithm.
In a second aspect, a shared cache management method is provided, for use in the buffer of the first aspect or any possible implementation of the first aspect, the method including: when an access to the shared cache misses and a cache block refill operation is performed, obtaining from the priority calculation unit the replacement priority of each of the N cores of the processor, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and determining the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In a first possible implementation of the second aspect, the method includes: allocating a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; obtaining the actual cache occupancy of the core with the highest replacement priority; detecting whether the actual cache occupancy is not greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, controlling the priority calculation unit to recalculate the replacement priorities of the N cores.
In a second possible implementation of the second aspect, the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores; and determining the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority includes: obtaining from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and determining the cache block to be replaced according to that second access information.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority includes: determining, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by that core; determining, from the first type of cache block, a second type of cache block, the second type of cache block being the cache block occupied the fewest total times by the N cores; and determining the cache block to be replaced from the second type of cache block according to a replacement algorithm.
In a third aspect, a controller is provided, for use in the buffer of the first aspect or any possible implementation of the first aspect, the controller including: a first obtaining module, configured to obtain, from the priority calculation unit, the replacement priority of each of the N cores of the processor when an access to the shared cache misses and a cache block refill operation is performed, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and a determining module, configured to determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In a first possible implementation of the third aspect, the controller includes: an allocating module, configured to allocate a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; a second obtaining module, configured to obtain the actual cache occupancy of the core with the highest replacement priority; a detecting module, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy; and a control module, configured to, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the replacement priorities of the N cores.
In a second possible implementation of the third aspect, the determining module includes: an obtaining unit, configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and a determining unit, configured to determine the cache block to be replaced according to that second access information; where the status register is further configured to record the second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores.
With reference to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the determining unit includes: a first determining subunit, configured to determine, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by that core; a second determining subunit, configured to determine, from the first type of cache block, a second type of cache block, the second type of cache block being the cache block occupied the fewest total times by the N cores; and a third determining subunit, configured to determine, according to a replacement algorithm, the cache block to be replaced from the second type of cache block.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:

The status register records the first access information of each of the N cores to the cache unit; the priority calculation unit calculates the replacement priority of each of the N cores according to the first access information recorded by the status register; and the controller determines the cache block to be replaced in the shared cache according to the replacement priorities of the N cores. This solves the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks, which leads to low shared-cache utilization, and achieves the effect of improving shared-cache utilization and system performance.

Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a buffer according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a buffer according to another embodiment of the present invention;

FIG. 3 is a schematic diagram of the composition of a cache unit according to another embodiment of the present invention;

FIG. 4 is a schematic diagram of the composition of a status register according to another embodiment of the present invention;

FIG. 5 is a method flowchart of a shared cache management method according to an embodiment of the present invention;

FIG. 6 is a method flowchart of a shared cache management method according to another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a controller according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a controller according to another embodiment of the present invention.

Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Please refer to FIG. 1, which is a schematic structural diagram of a buffer according to an embodiment of the present invention. The buffer can be applied in a multi-core system. The buffer may include: a cache unit 102, a status register 104, a priority calculation unit 106, and a controller 108. The cache unit 102 is connected to the status register 104 and the controller 108; the status register 104 is connected to the cache unit 102 and the priority calculation unit 106; the priority calculation unit 106 is connected to the status register 104 and the controller 108. The cache unit 102 includes a shared cache 1022 and N shadow tags 1024, the N shadow tags 1024 corresponding to the N cores of the processor respectively; N > 2, and N is an integer.
The status register 104 is configured to record first access information of each of the N cores to the cache unit 102, the first access information including: the number of accesses to the shared cache 1022, the number of cache blocks occupied in the shared cache 1022, the number of accesses to the shared cache 1022 that hit, and the number of accesses to the shadow tag 1024 that hit.
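A software model of the first access information kept by the status register 104 might look like the following. The class and field names are assumptions introduced for illustration, and the block accounting is deliberately simplified: a refill increments the requesting core's block count, and an eviction decrements the victim owner's.

```python
from dataclasses import dataclass

@dataclass
class CoreAccessInfo:
    """First access information recorded per core (assumed field names)."""
    accesses: int = 0      # accesses to the shared cache
    blocks: int = 0        # cache blocks currently occupied
    cache_hits: int = 0    # accesses to the shared cache that hit
    shadow_hits: int = 0   # accesses to this core's shadow tag that hit

class StatusRegister:
    def __init__(self, n_cores):
        self.info = [CoreAccessInfo() for _ in range(n_cores)]

    def on_access(self, core, cache_hit, shadow_hit):
        rec = self.info[core]
        rec.accesses += 1
        rec.cache_hits += int(cache_hit)
        rec.shadow_hits += int(shadow_hit)
        if not cache_hit:          # miss triggers a refill: gain one block
            rec.blocks += 1

    def on_evict(self, owner):
        self.info[owner].blocks -= 1   # the victim's owner loses one block
```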
The priority calculation unit 106 is configured to calculate the replacement priority of each of the N cores according to the first access information of each of the N cores to the cache unit 102 recorded by the status register 104; the replacement priority indicates how preferentially the cache blocks occupied by the corresponding core are replaced.
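The patent specifies the inputs to the priority calculation but not a formula. One plausible heuristic, given purely as an assumption and not as the claimed method: a core whose shadow tag (which estimates the hits the core would get with more cache) predicts few extra hits gains little from keeping its blocks, so it should be a preferred eviction target, and occupying many blocks raises its priority further.

```python
def replacement_priority(info):
    """Heuristic sketch only; 'info' holds one core's first access
    information as a dict: accesses, blocks, cache_hits, shadow_hits."""
    extra_hits = max(info["shadow_hits"] - info["cache_hits"], 0)
    benefit = extra_hits / max(info["accesses"], 1)  # marginal utility of cache
    return info["blocks"] * (1.0 - benefit)

def highest_priority_core(cores_info):
    # Index of the core whose cache blocks should be replaced first.
    return max(range(len(cores_info)),
               key=lambda i: replacement_priority(cores_info[i]))
```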
The controller 108 is configured to, when an access to the shared cache misses and a cache block refill operation is performed, obtain the replacement priorities of the N cores from the priority calculation unit and determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In this embodiment, the status register records each core's access information for the shared cache, the priority calculation unit calculates each core's replacement priority from that access information, and the controller, when performing a cache block refill operation, determines the cache block to be replaced according to the cores' replacement priorities. The cache block to be replaced can thus be selected according to each core's actual use of the shared cache, which improves shared cache utilization.
In summary, in the cache provided by this embodiment of the present invention, the status register records the first access information of each of the N cores with respect to the cache unit, the priority calculation unit calculates the replacement priority of each of the N cores from the recorded first access information, and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

Please refer to FIG. 2, which is a schematic structural diagram of a cache provided by another embodiment of the present invention. The cache can be applied in a multi-core system and may include: a cache unit 202, a status register 204, a priority calculation unit 206, and a controller 208.
The cache unit 202 is connected to the status register 204 and the controller 208; the status register 204 is connected to the cache unit 202 and the priority calculation unit 206; the priority calculation unit 206 is connected to the status register 204 and the controller 208. The cache unit 202 includes a shared cache 2022 and N shadow tags 2024, the N shadow tags 2024 corresponding one-to-one to the N cores of the processor, where N ≥ 2 and N is an integer.
Taking a processor with 4 cores as an example, refer to the schematic diagram of the cache unit shown in FIG. 3. The cache unit includes one shared cache and 4 shadow tags; the shared cache contains n cache blocks, each shadow tag likewise contains n storage units, and each cache block in the shared cache corresponds to one storage unit in each shadow tag. Each cache block in the shared cache is divided into four parts: the core ID (identity number) of the core currently occupying the block, a valid flag, tag information, and data. Each storage unit in a shadow tag includes two parts: a valid flag and tag information. When a core of the processor accesses the shared cache and the shadow tags in the cache unit, the low-order bits of the 64-bit address determine the addresses of the corresponding cache block and of the storage units in the shadow tags, and the valid flag and tag information are read from the determined addresses. If the valid flag read out is valid and the tag information matches the information contained in the high-order bits of the 64-bit address, the access is determined to be a hit; otherwise, it is determined to be a miss.
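As an illustrative sketch (not part of the patent text), the hit/miss determination described above can be modeled as follows. The 6-bit block offset, the dictionary layout of blocks and shadow-tag entries, and the function names are assumptions chosen for the example:

```python
def split_address(address, index_bits, offset_bits=6):
    """Split a 64-bit address: low-order bits select the block, the rest is the tag."""
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return index, tag

def lookup(address, shared_cache, shadow_tags, index_bits):
    """Return (shared_hit, per-core shadow-tag hits) for one access.

    shared_cache: list of blocks, each {"valid", "tag", "core_id", "data"}.
    shadow_tags:  one list per core, entries {"valid", "tag"}.
    """
    index, tag = split_address(address, index_bits)
    block = shared_cache[index]
    # a hit requires a valid entry whose stored tag matches the high-order bits
    shared_hit = block["valid"] and block["tag"] == tag
    shadow_hits = [st[index]["valid"] and st[index]["tag"] == tag
                   for st in shadow_tags]
    return shared_hit, shadow_hits
```

With 4 blocks (index_bits=2), an access whose tag matches the valid entry at its index reports a hit in the shared cache and in the matching core's shadow tag, and a miss everywhere else.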
The status register 204 is configured to record first access information of each of the N cores with respect to the cache unit 202, the first access information including: the number of accesses to the shared cache 2022, the number of cache blocks occupied in the shared cache 2022, the number of accesses to the shared cache 2022 that hit, and the number of accesses to the shadow tags 2024 that hit.

When a core accesses the shared cache, the status register can use counters to record, in real time, the core's hit situation for the shared cache and the shadow tags. Taking the schematic diagram of the status register shown in FIG. 4 as an example, the status register can maintain four counters for each core: an access counter (Cnt_Acc), a cache occupancy counter (Cnt_Size), a cache hit counter (Cnt_Shared_Hit), and a tag hit counter (Cnt_Shadow_Hit). Cnt_Acc records how many times a core has accessed the shared cache, Cnt_Size records how many cache blocks in the shared cache the core currently occupies, Cnt_Shared_Hit records how many of the core's accesses to the shared cache have hit, and Cnt_Shadow_Hit records how many of the core's accesses to its shadow tag have hit.

Taking core i accessing the shared cache and the shadow tags as an example, the counters are updated as follows. When processor core i performs an access operation (a read or a write) on the shared cache: if the access hits the shared cache, Cnt_Acc(i) and Cnt_Shared_Hit(i) are each incremented by 1; if it hits the shadow tag, Cnt_Shadow_Hit(i) is incremented by 1. If the access misses the shared cache, then when a free cache block exists, Cnt_Acc(i) and Cnt_Size(i) are each incremented by 1; when no free cache block exists, Cnt_Acc(i) is incremented by 1 and the core A with the highest replacement priority is determined. If core A is not core i, the cache block to be replaced is selected from the blocks occupied by core A, a write-back operation is performed, and Cnt_Size(A) is decremented by 1; if core A is core i, Cnt_Size(A) remains unchanged. If the access misses the shadow tag, Cnt_Shadow_Hit(i) remains unchanged.
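The counter-update rules above can be sketched as follows. This is an illustrative model only: the class layout is an assumption, and the increment of the new owner's Cnt_Size on a takeover (marked in the comment) is an inference not stated explicitly in the text:

```python
class CoreCounters:
    """Per-core counters kept by the status register (cf. FIG. 4)."""
    def __init__(self, n_cores):
        self.acc = [0] * n_cores         # Cnt_Acc
        self.size = [0] * n_cores        # Cnt_Size
        self.shared_hit = [0] * n_cores  # Cnt_Shared_Hit
        self.shadow_hit = [0] * n_cores  # Cnt_Shadow_Hit

    def on_access(self, i, shared_hit, shadow_hit,
                  free_block, highest_priority_core=None):
        """Apply the update rules for one access by core i."""
        self.acc[i] += 1
        if shared_hit:
            self.shared_hit[i] += 1
        else:
            if free_block:
                self.size[i] += 1  # refill into a free block
            elif highest_priority_core is not None and highest_priority_core != i:
                # a block of core A is written back and handed over
                self.size[highest_priority_core] -= 1
                self.size[i] += 1  # assumption: core i now owns the block
            # if A == i, Cnt_Size(A) stays unchanged
        if shadow_hit:
            self.shadow_hit[i] += 1
```

A hit updates only the access and hit counters; a miss with no free block shifts one unit of occupancy from the highest-priority core A to the requester when A differs from the requester.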
The priority calculation unit 206 is configured to calculate a replacement priority for each of the N cores according to the first access information recorded by the status register 204; the replacement priority characterizes how preferentially the cache blocks occupied by the corresponding core are replaced.

The priority calculation unit 206 can compute each core's replacement probability from the first access information described above and a replacement probability calculation model, forming a probability distribution; the core with the highest replacement probability has the highest replacement priority.

The replacement probability calculation model is as follows.
Suppose that W access misses occur within a given interval, and that core i accounts for a fraction M_i of those misses; core i therefore experiences M_i·W misses within the interval. If none of the cache blocks occupied by core i at the start of the interval are replaced during the interval, then at the end of the interval core i's cache occupancy fraction becomes C_i + (M_i·W)/m, where m is the total number of cache blocks and C_i is the fraction of cache blocks occupied by core i at the start of the interval.

Suppose the probability that one of these W misses replaces a cache block belonging to core i is E_i; then E_i·W of core i's cache blocks are replaced within the interval. This reduces core i's occupancy fraction by (E_i·W)/m, so after the interval core i's occupancy fraction is τ_i = C_i + ((M_i − E_i)·W)/m. Since the minimum allocation unit is one cache block, the minimum change of τ_i relative to C_i is 1/m (here C_i and τ_i are fractions).

Let T_i denote the target occupancy that allows core i to reach a given performance goal, so that the desired outcome is T_i = τ_i = C_i + ((M_i − E_i)·W)/m. Solving for the replacement probability E_i gives:

E_i = 0,                         if (C_i − T_i)·m/W + M_i < 0
E_i = 1,                         if (C_i − T_i)·m/W + M_i > 1
E_i = (C_i − T_i)·m/W + M_i,     otherwise

The controller 208 is configured to, when an access misses the shared cache and a cache block refill operation is performed, obtain the replacement priorities of the N cores from the priority calculation unit, and determine the cache block to be replaced from among the shared cache blocks currently occupied by the core with the highest replacement priority among the N cores.
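The piecewise expression above simply clamps the raw value (C_i − T_i)·m/W + M_i into the interval [0, 1]. A minimal sketch (function and parameter names are chosen for the example):

```python
def replacement_probability(c_i, t_i, m_i, m, w):
    """E_i per the model: c_i and t_i are occupancy fractions, m_i the
    core's share of the W misses, m the total number of cache blocks,
    w the number of misses in the interval."""
    raw = (c_i - t_i) * m / w + m_i
    return max(0.0, min(1.0, raw))  # clamp to [0, 1]
```

A core holding more than its target occupancy (c_i > t_i) or absorbing a large share of the misses gets a higher replacement probability; the clamp handles the two boundary cases of the piecewise definition.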
The controller may determine the cache block to be replaced from the shared cache blocks currently occupied by the core with the highest replacement priority according to an LRU (Least Recently Used) replacement algorithm. Alternatively, the controller may determine the cache block to be replaced according to another replacement algorithm, which is not specifically limited in this embodiment.
The controller 208 is further configured to: allocate a target cache occupancy to each of the N cores according to a performance goal, the performance goal including at least one of overall hit rate maximization, fairness, or quality of service; obtain the actual cache occupancy of the core with the highest replacement priority; detect whether the actual cache occupancy is greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the replacement priorities of the N cores.

To further improve shared cache utilization, each core's replacement priority needs to be updated according to the cores' actual use of the shared cache. Specifically, the controller may first allocate a target cache occupancy to each core according to the performance goal (such as overall hit rate maximization, fairness, or quality of service). The controller also obtains the first access information from the cache unit and derives from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of shared cache blocks that core occupies). If the actual cache occupancy of that core is less than or equal to its target cache occupancy, the core with the highest replacement priority holds too few shared cache blocks, and continuing to select the block to be replaced from the blocks it currently occupies would leave the shared cache underutilized. In this case, the controller can send a control instruction to the priority calculation unit to make it recalculate the replacement priorities of the N cores.
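The controller's decision above reduces to a simple comparison; the following sketch assumes the actual occupancy is read from the per-core Cnt_Size counters and that the per-core targets are expressed in blocks (both representations are assumptions for the example):

```python
def needs_priority_recalculation(top_core, cnt_size, target_blocks):
    """Return True when the top-priority core's actual occupancy (its
    Cnt_Size value) does not exceed its target, meaning the core holds
    too few shared-cache blocks and the replacement priorities should
    be recomputed.

    target_blocks: per-core target occupancies derived from the
    performance goal (hit-rate maximization, fairness, QoS)."""
    return cnt_size[top_core] <= target_blocks[top_core]
```

Only when the top-priority core is at or below its target does the controller trigger recalculation; a core above its target can safely keep donating blocks.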
The status register 204 is also connected to the controller 208, and is further configured to record second access information corresponding to each cache block in the shared cache 2022, the second access information including the number of times the block has been occupied by each of the N cores.

The controller 208 is configured to, when an access misses the shared cache and a cache block refill operation is performed, obtain from the status register 204 the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
Specifically, the controller 208 is configured to: determine first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks that have been occupied by that core the fewest times; determine second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks that have been occupied by the N cores the fewest times in total; and determine the cache block to be replaced from the second-type cache blocks according to a replacement algorithm.

To further improve the utilization of the individual cache blocks in the shared cache, to spread usage evenly across blocks, and to respect the reuse locality of cache blocks, this embodiment also takes into account how many times each block has been occupied when determining the block to be replaced. Specifically, the status register records, as each block's second access information, the number of times each cache block in the shared cache has been occupied by each core. When a later access by a core misses the shared cache and a cache block refill is performed, the controller first determines, among the blocks currently occupied by the core with the highest replacement priority, the blocks that core has occupied the fewest times; then, among those, the blocks occupied the fewest times in total by all cores; and finally determines the block to be replaced among them according to a replacement algorithm such as LRU. This method tends to select the blocks that have been reused the fewest times, thereby balancing the reuse locality of cache blocks against the degree to which blocks are shared by all cores.
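The two-stage narrowing plus LRU tie-break described above can be sketched as follows; the block representation (owner field, per-core occupation counts, LRU timestamp) is an assumption made for the example:

```python
def select_victim(blocks, top_core):
    """blocks: list of dicts with
       "owner": core currently occupying the block,
       "occ":   list where occ[k] = times core k has occupied the block,
       "lru":   last-use timestamp (smaller = older)."""
    owned = [b for b in blocks if b["owner"] == top_core]
    # stage 1: fewest occupations by the top-priority core
    fewest_own = min(b["occ"][top_core] for b in owned)
    first_type = [b for b in owned if b["occ"][top_core] == fewest_own]
    # stage 2: fewest total occupations by all cores
    fewest_total = min(sum(b["occ"]) for b in first_type)
    second_type = [b for b in first_type if sum(b["occ"]) == fewest_total]
    # stage 3: LRU among the remaining candidates
    return min(second_type, key=lambda b: b["lru"])
```

Each stage narrows the candidate set without ever emptying it, so the final LRU choice always has at least one block to pick from (provided the top-priority core occupies at least one block).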
In summary, in the cache provided by this embodiment of the present invention, the status register records the first access information of each of the N cores with respect to the cache unit, the priority calculation unit calculates the replacement priority of each of the N cores from the recorded first access information, and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

In addition, the cache provided by this embodiment of the present invention compares the actual cache occupancy of the core with the highest replacement priority against that core's target cache occupancy and, when the actual occupancy is not greater than the target occupancy, controls the priority calculation unit to recalculate the cores' replacement priorities, further improving shared cache utilization.

Finally, the cache provided by this embodiment of the present invention determines first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks occupied by that core the fewest times; determines second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks occupied by the N cores the fewest times in total; and determines the block to be replaced from the second-type cache blocks according to a replacement algorithm. This balances the reuse locality of cache blocks against the degree to which blocks are shared by all cores, further improving the utilization of the individual blocks in the shared cache.

Please refer to FIG. 5, which is a method flowchart of a shared cache management method provided by an embodiment of the present invention. The method can control core accesses in a cache as shown in FIG. 1 or FIG. 2. The shared cache management method may include:
Step 302: when an access misses the shared cache and a cache block refill operation is performed, obtain the replacement priorities of the N cores of the processor from the priority calculation unit; the replacement priority characterizes how preferentially the cache blocks occupied by the corresponding core are replaced.

Step 304: determine the cache block to be replaced from among the shared cache blocks currently occupied by the core with the highest replacement priority among the N cores.
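The two steps can be sketched as a top-level miss handler; this is illustrative only — `choose_block` stands in for whatever replacement algorithm is used in step 304, and the data shapes are assumptions:

```python
def handle_refill_on_miss(priorities, blocks_by_core, choose_block):
    """priorities: {core_id: replacement_priority} obtained from the
    priority calculation unit; blocks_by_core: {core_id: [block_id, ...]}
    listing the shared-cache blocks each core currently occupies."""
    top_core = max(priorities, key=priorities.get)   # step 302
    victim = choose_block(blocks_by_core[top_core])  # step 304
    return top_core, victim
```

Any concrete policy (LRU, or the two-stage occupation-count narrowing of the later embodiment) can be plugged in as `choose_block`.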
In summary, the shared cache management method provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

Please refer to FIG. 6, which is a method flowchart of a shared cache management method provided by another embodiment of the present invention. The method can control core accesses in a cache as shown in FIG. 1 or FIG. 2. The shared cache management method may include:
Step 402: when an access misses the shared cache and a cache block refill operation is performed, obtain the replacement priorities of the N cores of the processor from the priority calculation unit.

The replacement priority characterizes how preferentially the cache blocks occupied by the corresponding core are replaced. It is calculated by the priority calculation unit according to the first access information recorded in the status register for each core; for the specific process by which the status register records the first access information and the priority calculation unit calculates each core's priority, refer to the description of the embodiment corresponding to FIG. 2, which is not repeated here.
Step 404: obtain from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.

The status register is also used to record the second access information corresponding to each cache block in the shared cache, the second access information including the number of times the block has been occupied by each of the N cores.
Step 406: determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.

The controller may determine first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks occupied by that core the fewest times; determine second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks occupied by the N cores the fewest times in total; and determine the cache block to be replaced from the second-type cache blocks according to a replacement algorithm.

To further improve the utilization of the individual cache blocks in the shared cache, to spread usage evenly across blocks, and to respect the reuse locality of cache blocks, this embodiment also takes into account how many times each block has been occupied when determining the block to be replaced. Specifically, the status register records, as each block's second access information, the number of times each cache block in the shared cache has been occupied by each core. When a later access by a core misses the shared cache and a cache block refill is performed, the method first determines, among the blocks currently occupied by the core with the highest replacement priority, the blocks that core has occupied the fewest times; then, among those, the blocks occupied the fewest times in total by all cores; and finally determines the block to be replaced among them according to a replacement algorithm such as LRU.
In addition, the controller also allocates a target cache occupancy to each of the N cores according to a performance goal, the performance goal including at least one of overall hit rate maximization, fairness, or quality of service; obtains the actual cache occupancy of the core with the highest replacement priority; detects whether the actual cache occupancy is greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, controls the priority calculation unit to recalculate the replacement priorities of the N cores.

To further improve shared cache utilization, each core's replacement priority needs to be updated according to the cores' actual use of the shared cache. Specifically, the controller may first allocate a target cache occupancy to each core according to the performance goal (such as overall hit rate maximization, fairness, or quality of service). The controller also obtains the first access information from the cache unit and derives from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of shared cache blocks that core occupies). If the actual cache occupancy of that core is less than or equal to its target cache occupancy, the core with the highest replacement priority holds too few shared cache blocks, and continuing to select the block to be replaced from the blocks it currently occupies would leave the shared cache underutilized. In this case, the controller can send a control instruction to the priority calculation unit to make it recalculate the replacement priorities of the N cores.
In summary, the shared cache management method provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

In addition, the shared cache management method provided by this embodiment of the present invention compares the actual cache occupancy of the core with the highest replacement priority against that core's target cache occupancy and, when the actual occupancy is not greater than the target occupancy, controls the priority calculation unit to recalculate the cores' replacement priorities, further improving shared cache utilization.

Finally, the shared cache management method provided by this embodiment of the present invention determines first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks occupied by that core the fewest times; determines second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks occupied by the N cores the fewest times in total; and determines the block to be replaced from the second-type cache blocks according to a replacement algorithm. This balances the reuse locality of cache blocks against the degree to which blocks are shared by all cores, further improving the utilization of the individual blocks in the shared cache.

Please refer to FIG. 7, which is a schematic structural diagram of a controller provided by an embodiment of the present invention. The controller can control core accesses in a cache as shown in FIG. 1 or FIG. 2. The controller may include:
第一获取模块 501 , 用于在发生访问所述共享緩存未命中并进行緩存块重 填操作时, 从所述优先级计算单元中获取处理器的 N 个内核各自的替换优先 级; 所述替换优先级用于表征对应的内核所占用的緩存块被替换的优先程度; 确定模块 502 , 用于从所述 N个内核中替换优先级最高的内核当前占用的 共享緩存的緩存块中确定待替换的緩存块。  a first obtaining module 501, configured to acquire, from the priority computing unit, a replacement priority of each of the N cores of the processor when the accessing the shared cache miss occurs and performing a cache block refilling operation; The priority is used to indicate the priority of the cache block occupied by the corresponding kernel is replaced; the determining module 502 is configured to determine, from the N cores, the cache block of the shared cache currently occupied by the highest priority kernel to be replaced. Cache block.
In summary, the controller provided by the embodiments of the present invention obtains the respective replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem of low shared cache utilization caused by each core being able to determine the cache block to be replaced only from a corresponding subset of the cache blocks, thereby improving shared cache utilization and system performance. Please refer to FIG. 8, which is a schematic structural diagram of a controller provided by another embodiment of the present invention. The controller may control core accesses in a cache such as that shown in FIG. 1 or FIG. 2. The controller may include:
a first obtaining module 601, configured to obtain, from the priority calculation unit, the respective replacement priorities of the N cores of the processor when a miss occurs on an access to the shared cache and a cache block refill operation is performed, where the replacement priority characterizes the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and a determining module 602, configured to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
The controller further includes:
an allocation module 603, configured to allocate respective target cache occupancies to the N cores according to a performance objective, where the performance objective includes at least one of maximizing the overall hit rate, fairness, or quality of service;
a second obtaining module 604, configured to obtain the actual cache occupancy of the core with the highest replacement priority; and a detection module 605, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy;
a control module 606, configured to control the priority calculation unit to recalculate the respective replacement priorities of the N cores if the detection result is that the actual cache occupancy is not greater than the target cache occupancy.
The determining module 602 includes:
an obtaining unit 6021, configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and
a determining unit 6022, configured to determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority;
where the status register is further configured to record the second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores.
The determining unit 6022 includes:
a first determining subunit 6022a, configured to determine first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
a second determining subunit 6022b, configured to determine second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and a third determining subunit 6022c, configured to determine the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
In summary, the controller provided by the embodiments of the present invention obtains the respective replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem of low shared cache utilization caused by each core being able to determine the cache block to be replaced only from a corresponding subset of the cache blocks, thereby improving shared cache utilization and system performance.
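The first step described above, selecting the core whose blocks become replacement candidates, can be sketched in a few lines of Python. This is a hypothetical model and not part of the claimed hardware: the function name and the dictionary layout standing in for the priority calculation unit's output are illustrative assumptions.

```python
def pick_victim_core(replacement_priority):
    """Return the core whose cache blocks should be replaced first.

    replacement_priority: dict mapping core id -> replacement priority,
    standing in for the values produced by the priority calculation unit.
    """
    # The core with the highest replacement priority is the one whose
    # currently occupied cache blocks become candidates for replacement.
    return max(replacement_priority, key=replacement_priority.get)
```

For example, with priorities {0: 1, 1: 5, 2: 3}, the victim core is core 1, and the cache block to be replaced is then sought only among core 1's currently occupied blocks.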
In addition, the controller provided by the embodiments of the present invention detects the relationship between the actual cache occupancy of the core with the highest replacement priority and the target cache occupancy of that core, and, when the actual cache occupancy is not greater than the target cache occupancy, controls the priority calculation unit to recalculate the replacement priority of each core, thereby further improving the utilization of the shared cache.
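The occupancy check performed by the detection and control modules can be modeled as follows; again a minimal sketch with illustrative names, where the `recalculate` callback stands in for triggering the priority calculation unit.

```python
def maybe_recalculate(actual_occupancy, target_occupancy, recalculate):
    """Trigger a recalculation of the replacement priorities when the
    victim core's actual cache occupancy does not exceed its target.

    Once that core is at or below its allocated target occupancy, its
    blocks should no longer be preferentially evicted, so the priorities
    are recomputed. Returns True if a recalculation was triggered.
    """
    if actual_occupancy <= target_occupancy:
        recalculate()
        return True
    return False
```

The design choice here mirrors the text: eviction pressure is applied to a core only while it is over its target occupancy, which is how the target allocations (hit-rate maximization, fairness, or quality of service) are enforced over time.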
Finally, the controller provided by the embodiments of the present invention determines first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by that core; determines second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and determines the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm. This takes into account both the reuse locality of cache blocks and the degree to which cache blocks are shared by all cores, further improving the utilization of each cache block in the shared cache. A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
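The three-stage victim selection described above can be sketched as follows. This is a software model of the hardware behavior, not the implementation: the data layout is an illustrative stand-in for the per-core occupancy counts kept in the status register, and LRU is assumed for the final replacement algorithm purely as an example, since the text leaves that algorithm open.

```python
def select_victim_block(blocks, victim_core, lru_order):
    """Three-stage selection among the victim core's occupied cache blocks.

    blocks: dict mapping block id -> {core id: occupancy count}, modeling
            the second access information recorded in the status register.
    lru_order: block ids from least to most recently used, standing in for
               whatever replacement algorithm breaks the final tie.
    """
    # Stage 1: first-type cache blocks, occupied the fewest times
    # by the victim core (reuse locality).
    fewest = min(counts.get(victim_core, 0) for counts in blocks.values())
    first_type = {b: c for b, c in blocks.items()
                  if c.get(victim_core, 0) == fewest}
    # Stage 2: second-type cache blocks, occupied the fewest total
    # times by all N cores (degree of sharing).
    fewest_total = min(sum(c.values()) for c in first_type.values())
    second_type = [b for b, c in first_type.items()
                   if sum(c.values()) == fewest_total]
    # Stage 3: apply the replacement algorithm (LRU assumed here)
    # to the remaining candidates.
    return min(second_type, key=lru_order.index)
```

For example, with blocks {'a': {0: 2, 1: 1}, 'b': {0: 1, 1: 3}, 'c': {0: 1, 1: 0}} and victim core 0, stage 1 keeps 'b' and 'c' (each occupied once by core 0), stage 2 keeps 'c' (lowest total occupancy), and 'c' is evicted.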

Claims

1. A cache, wherein the cache comprises: a cache unit, a status register, a priority calculation unit, and a controller;
the cache unit is connected to the status register and the controller respectively; the status register is connected to the cache unit and the priority calculation unit respectively; the priority calculation unit is connected to the status register and the controller respectively; the cache unit comprises a shared cache and N shadow tags, the N shadow tags respectively corresponding to N cores of a processor, where N > 2 and N is an integer;
the status register is configured to record first access information of each of the N cores with respect to the cache unit, the first access information comprising: the number of accesses to the shared cache, the number of cache blocks occupied in the shared cache, the number of accesses to the shared cache that hit, and the number of accesses to the shadow tags that hit;
the priority calculation unit is configured to calculate the respective replacement priorities of the N cores according to the first access information of each of the N cores with respect to the cache unit recorded in the status register, the replacement priority characterizing the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and
the controller is configured to: when a miss occurs on an access to the shared cache and a cache block refill operation is performed, obtain the respective replacement priorities of the N cores from the priority calculation unit, and determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
2. The cache according to claim 1, wherein the controller is further configured to: allocate respective target cache occupancies to the N cores according to a performance objective, the performance objective comprising at least one of maximizing the overall hit rate, fairness, or quality of service;
obtain the actual cache occupancy of the core with the highest replacement priority;
detect whether the actual cache occupancy is not greater than the target cache occupancy; and
if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the respective replacement priorities of the N cores.
3. The cache according to claim 1, wherein the status register is further connected to the controller; and the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information comprising the number of times the cache block has been occupied by each of the N cores;
wherein the controller is configured to: when a miss occurs on an access to the shared cache and a cache block refill operation is performed, obtain from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
4. The cache according to claim 3, wherein the controller is configured to: determine first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
determine second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and
determine the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
5. A shared cache management method, used in the cache according to any one of claims 1 to 4, wherein the method comprises:
when a miss occurs on an access to the shared cache and a cache block refill operation is performed, obtaining, from the priority calculation unit, the respective replacement priorities of the N cores of the processor, the replacement priority characterizing the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and
determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
6. The method according to claim 5, wherein the method comprises:
allocating respective target cache occupancies to the N cores according to a performance objective, the performance objective comprising at least one of maximizing the overall hit rate, fairness, or quality of service;
obtaining the actual cache occupancy of the core with the highest replacement priority;
detecting whether the actual cache occupancy is not greater than the target cache occupancy; and
if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, controlling the priority calculation unit to recalculate the respective replacement priorities of the N cores.
7. The method according to claim 5, wherein the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information comprising the number of times the cache block has been occupied by each of the N cores; and the determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores comprises:
obtaining, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and
determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
8. The method according to claim 7, wherein the determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority comprises:
determining first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
determining second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and
determining the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
9. A controller, used in the cache according to any one of claims 1 to 4, wherein the controller comprises:
a first obtaining module, configured to obtain, from the priority calculation unit, the respective replacement priorities of the N cores of the processor when a miss occurs on an access to the shared cache and a cache block refill operation is performed, the replacement priority characterizing the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and
a determining module, configured to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
10. The controller according to claim 9, wherein the controller comprises: an allocation module, configured to allocate respective target cache occupancies to the N cores according to a performance objective, the performance objective comprising at least one of maximizing the overall hit rate, fairness, or quality of service; a second obtaining module, configured to obtain the actual cache occupancy of the core with the highest replacement priority; a detection module, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy; and a control module, configured to control the priority calculation unit to recalculate the respective replacement priorities of the N cores if the detection result is that the actual cache occupancy is not greater than the target cache occupancy.
11. The controller according to claim 9, wherein the determining module comprises: an obtaining unit, configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and
a determining unit, configured to determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority;
wherein the status register is further configured to record the second access information corresponding to each cache block in the shared cache, the second access information comprising the number of times the cache block has been occupied by each of the N cores.
12. The controller according to claim 11, wherein the determining unit comprises: a first determining subunit, configured to determine first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
a second determining subunit, configured to determine second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and
a third determining subunit, configured to determine the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
PCT/CN2014/073052 2014-03-07 2014-03-07 Cache, shared cache management method and controller WO2015131395A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480000331.3A CN105359116B (en) 2014-03-07 2014-03-07 Buffer, shared cache management method and controller
PCT/CN2014/073052 WO2015131395A1 (en) 2014-03-07 2014-03-07 Cache, shared cache management method and controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/073052 WO2015131395A1 (en) 2014-03-07 2014-03-07 Cache, shared cache management method and controller

Publications (1)

Publication Number Publication Date
WO2015131395A1 true WO2015131395A1 (en) 2015-09-11

Family

ID=54054398

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/073052 WO2015131395A1 (en) 2014-03-07 2014-03-07 Cache, shared cache management method and controller

Country Status (2)

Country Link
CN (1) CN105359116B (en)
WO (1) WO2015131395A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210342461A1 (en) * 2017-09-12 2021-11-04 Sophos Limited Providing process data to a data recorder

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN108614782B (en) * 2018-04-28 2020-05-01 深圳市华阳国际工程造价咨询有限公司 Cache access method for data processing system
CN113505087B (en) * 2021-06-29 2023-08-22 中国科学院计算技术研究所 Cache dynamic dividing method and system considering service quality and utilization rate

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1804816A (en) * 2004-12-29 2006-07-19 英特尔公司 Method for programmer-controlled cache line eviction policy
CN101739299A (en) * 2009-12-18 2010-06-16 北京工业大学 Method for dynamically and fairly partitioning shared cache based on chip multiprocessor
CN101916230A (en) * 2010-08-11 2010-12-15 中国科学技术大学苏州研究院 Partitioning and thread-aware based performance optimization method of last level cache (LLC)
CN103150266A (en) * 2013-02-20 2013-06-12 北京工业大学 Improved multi-core shared cache replacing method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
GB0516474D0 (en) * 2005-08-10 2005-09-14 Symbian Software Ltd Pre-emptible context switching in a computing device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN1804816A (en) * 2004-12-29 2006-07-19 英特尔公司 Method for programmer-controlled cache line eviction policy
CN101739299A (en) * 2009-12-18 2010-06-16 北京工业大学 Method for dynamically and fairly partitioning shared cache based on chip multiprocessor
CN101916230A (en) * 2010-08-11 2010-12-15 中国科学技术大学苏州研究院 Partitioning and thread-aware based performance optimization method of last level cache (LLC)
CN103150266A (en) * 2013-02-20 2013-06-12 北京工业大学 Improved multi-core shared cache replacing method

Cited By (3)

Publication number Priority date Publication date Assignee Title
US20210342461A1 (en) * 2017-09-12 2021-11-04 Sophos Limited Providing process data to a data recorder
US11620396B2 (en) * 2017-09-12 2023-04-04 Sophos Limited Secure firewall configurations
US11966482B2 (en) 2017-09-12 2024-04-23 Sophos Limited Managing untyped network traffic flows

Also Published As

Publication number Publication date
CN105359116A (en) 2016-02-24
CN105359116B (en) 2018-10-19


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201480000331.3

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14884368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14884368

Country of ref document: EP

Kind code of ref document: A1