WO2015131395A1 - Cache, shared cache management method and controller - Google Patents


Info

Publication number
WO2015131395A1
WO2015131395A1 (PCT/CN2014/073052)
Authority
WO
WIPO (PCT)
Prior art keywords
cache
cache block
priority
cores
block
Prior art date
Application number
PCT/CN2014/073052
Other languages
French (fr)
Chinese (zh)
Inventor
Zheng Libing (郑礼炳)
Li Jingchao (李景超)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201480000331.3A priority Critical patent/CN105359116B/en
Priority to PCT/CN2014/073052 priority patent/WO2015131395A1/en
Publication of WO2015131395A1 publication Critical patent/WO2015131395A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/084: Multiuser, multiprocessor or multiprocessing cache systems with a shared cache

Definitions

  • The present invention relates to the field of computers, and in particular to a buffer, a shared cache management method, and a controller.

Background Art
  • In a multi-core system, a portion of the cache blocks in the shared cache is usually allocated to each core. When one of the cores misses on a cache access (a read or a write) and a cache block refill is required, the cache block to be replaced is determined from among the cache blocks allocated to that core, and the original data in the cache block to be replaced is replaced with the data to be read or written.
  • In the course of implementing the present invention, the inventors found that the prior art has at least the following problem: a core can only select the cache block to be replaced from its own allocated portion of the cache blocks. In practice it may happen that the cache blocks allocated to one core are frequently reused while the cache blocks allocated to other cores sit idle for long periods, resulting in low utilization of the shared cache and degraded system performance.

Summary of the Invention
  • To solve the prior-art problem that shared cache utilization is low because a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, the present invention provides a buffer, a shared cache management method, and a controller.
  • the technical solution is as follows:
  • In a first aspect, a buffer is provided, where the buffer includes:
  • a cache unit, a status register, a priority calculation unit, and a controller;
  • the cache unit is connected to the status register and the controller respectively; the status register is connected to the cache unit and the priority calculation unit respectively; the priority calculation unit is connected to the status register and the controller respectively; the cache unit includes a shared cache and N shadow tags, and the N shadow tags respectively correspond to N cores of the processor, where N ≥ 2 and N is an integer;
  • the status register is configured to record first access information of the N cores with respect to the cache unit, where the first access information includes: the number of times the shared cache is accessed, the number of cache blocks occupied in the shared cache, the number of times the shared cache is accessed and hit, and the number of times the shadow tags are accessed and hit;
  • the priority calculation unit is configured to calculate the respective replacement priorities of the N cores according to the first access information of the N cores recorded in the status register; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • the controller is configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores when a shared-cache access miss occurs and a cache block refill operation is performed, and to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • the controller is further configured to allocate a respective target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit rate maximization, fairness, or quality of service;
  • the status register is further connected to the controller
  • the status register is further configured to record second access information corresponding to each cache block in the shared cache, where the second access information includes the number of times the cache block is occupied by each of the N cores;
  • the controller is configured to: when a shared-cache access miss occurs and a cache block refill operation is performed, obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
  • With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the controller is configured to determine the cache block to be replaced from the second type of cache blocks according to a replacement algorithm.
  • A second aspect provides a shared cache management method for use in the buffer according to the first aspect or any possible implementation of the first aspect, where the method includes: when a shared-cache access miss occurs and a cache block refill operation is performed, acquiring the respective replacement priorities of the N cores of the processor from the priority calculation unit, the replacement priority being used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced; and
  • determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
  • the method includes:
  • Each of the N cores is assigned a respective target cache occupancy according to a performance goal, the performance target including at least one of overall hit rate maximization, fairness, or quality of service;
  • the status register is further configured to record second access information corresponding to each cache block in the shared cache, where the second access information includes the number of times the cache block is occupied by each of the N cores; determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority includes:
  • determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, which in turn includes:
  • determining the cache block to be replaced from the second type of cache blocks according to a replacement algorithm.
  • the third aspect provides a controller, which is used in the buffer according to the foregoing first aspect or any possible implementation manner of the first aspect, wherein the controller includes:
  • a first obtaining module configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores of the processor when a shared-cache access miss occurs and a cache block refill operation is performed; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • a determining module configured to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • the controller includes:
  • An allocation module configured to allocate a respective target cache occupancy to the N cores according to a performance target, where the performance target includes at least one of overall hit rate maximization, fairness, or quality of service;
  • a detecting module configured to detect whether the actual cache occupancy of the core with the highest replacement priority is greater than the target cache occupancy;
  • a control module configured to: if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the respective replacement priorities of the N cores.
  • the determining module includes:
  • an obtaining unit configured to acquire, from the status register, second access information corresponding to each cache block currently occupied by the core with the highest replacement priority
  • a determining unit configured to determine, according to the second access information corresponding to each cache block that is currently occupied by the kernel with the highest replacement priority, the cache block to be replaced;
  • the status register is further configured to record the second access information corresponding to each cache block in the shared cache, where the second access information includes the number of times the cache block is occupied by each of the N cores.
  • the determining unit includes:
  • a first determining subunit configured to determine a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, where the first type of cache block is the cache block occupied the fewest times by the core with the highest replacement priority;
  • a second determining subunit configured to determine a second type of cache block from the first type of cache block, where the second type of cache block is a cache block having the least total number of times occupied by the N cores;
  • a third determining subunit configured to determine, according to the replacement algorithm, the cache block to be replaced from the second type cache block.
  • In the embodiments of the present invention, the status register records the first access information of each of the N cores with respect to the cache unit, the priority calculation unit calculates the respective replacement priorities of the N cores according to the first access information recorded in the status register, and the controller determines the cache block to be replaced in the shared cache according to the respective replacement priorities of the N cores. This solves the prior-art problem that shared cache utilization is low because a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, thereby improving shared cache utilization and system performance.
  • FIG. 1 is a schematic structural diagram of a buffer provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a buffer according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a cache unit according to another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a status register according to another embodiment of the present invention.
  • FIG. 5 is a flowchart of a shared cache management method according to an embodiment of the present invention;
  • FIG. 6 is a flowchart of a shared cache management method according to another embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a controller according to an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of a controller according to another embodiment of the present invention.

Detailed Description
  • FIG. 1 is a schematic structural diagram of a buffer provided by an embodiment of the present invention.
  • The buffer may include: a cache unit 102, a status register 104, a priority calculation unit 106, and a controller 108;
  • the cache unit 102 is connected to the status register 104 and the controller 108 respectively; the status register 104 is connected to the cache unit 102 and the priority calculation unit 106 respectively; the priority calculation unit 106 is connected to the status register 104 and the controller 108 respectively; the cache unit 102 includes a shared cache 1022 and N shadow tags 1024, which respectively correspond to N cores of the processor, where N ≥ 2 and N is an integer;
  • the status register 104 is configured to record first access information of the N cores with respect to the cache unit 102, where the first access information includes: the number of times the shared cache 1022 is accessed, the number of cache blocks occupied in the shared cache 1022, the number of times the shared cache 1022 is accessed and hit, and the number of times the shadow tags 1024 are accessed and hit;
  • the priority calculation unit 106 is configured to calculate the respective replacement priorities of the N cores according to the first access information of the N cores recorded in the status register 104; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • the controller 108 is configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores when a shared-cache access miss occurs and a cache block refill operation is performed, and to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • In this way, the status register records each core's access information with respect to the shared cache, the priority calculation unit calculates each core's replacement priority according to that access information, and when the controller performs a cache block refill operation it determines the cache block to be replaced according to the replacement priorities of the cores. The cache block to be replaced can therefore be determined according to each core's actual access behavior toward the shared cache, which improves the utilization of the shared cache.
  • In summary, the buffer provided by this embodiment of the present invention records the first access information of each of the N cores with respect to the cache unit through the status register; the priority calculation unit calculates the respective replacement priorities of the N cores according to the first access information recorded in the status register; and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, which leads to low shared cache utilization, thereby improving shared cache utilization and system performance.
  • FIG. 2 is a schematic structural diagram of a buffer provided by another embodiment of the present invention.
  • The buffer can be applied to a multi-core system.
  • the buffer may include: a cache unit 202, a status register 204, a priority calculation unit 206, and a controller 208;
  • the cache unit 202 is connected to the status register 204 and the controller 208 respectively; the status register 204 is connected to the cache unit 202 and the priority calculation unit 206 respectively; the priority calculation unit 206 is connected to the status register 204 and the controller 208 respectively; the cache unit 202 includes a shared cache 2022 and N shadow tags 2024, which respectively correspond to N cores of the processor, where N ≥ 2 and N is an integer;
  • In this embodiment, the processor includes four cores, the cache unit includes a shared cache and four shadow tags, and the shared cache includes n cache blocks.
  • Each shadow tag likewise contains n storage units, and each cache block in the shared cache corresponds to one storage unit in each shadow tag.
  • Each cache block in the shared cache is divided into four fields: the ID (identification number) of the core currently occupying the cache block, a valid flag, tag information, and data. Each storage unit in a shadow tag includes two fields: a valid flag and tag information.
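The field layout described above can be sketched as plain data structures. This is an illustrative sketch only: the field names translate the text, and the Python types stand in for fixed-width hardware fields.

```python
from dataclasses import dataclass

@dataclass
class CacheBlock:
    core_id: int   # ID of the core currently occupying the block
    valid: bool    # valid flag
    tag: int       # tag information
    data: bytes    # the cached data itself

@dataclass
class ShadowTagEntry:
    valid: bool    # valid flag
    tag: int       # tag information
```

A shadow tag would then be a list of n `ShadowTagEntry` objects, one per cache block, mirroring the shared cache's tag array without holding any data.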
  • When a core accesses the cache unit, the corresponding cache block and the corresponding storage unit in each shadow tag are located by the low-order bits of the 64-bit access address, and the valid flag and tag information are read from the located entry. If the valid flag indicates a valid entry and the tag information matches the information contained in the high-order bits of the 64-bit address, the current access is determined to be a hit; otherwise, the current access is determined to be a miss.
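The hit/miss determination above can be sketched as follows. The bit widths (a 6-bit block offset and a 10-bit index) are illustrative assumptions, not values given in the patent.

```python
OFFSET_BITS = 6   # assumed 64-byte cache lines
INDEX_BITS = 10   # assumed n = 1024 cache blocks

def split_address(addr):
    """Split a 64-bit address: low-order bits select the entry, the rest is the tag."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag

def lookup(entries, addr):
    """entries[i] is a dict with 'valid' and 'tag'; return True on a hit."""
    index, tag = split_address(addr)
    entry = entries[index]
    # hit only if the entry is valid AND its tag matches the high-order bits
    return entry["valid"] and entry["tag"] == tag
```

The same lookup applies to both the shared cache's tag array and each shadow tag, since they are indexed identically.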
  • the status register 204 is configured to record first access information of the N cores with respect to the cache unit 202, where the first access information includes: the number of times the shared cache 2022 is accessed, the number of cache blocks occupied in the shared cache 2022, the number of times the shared cache 2022 is accessed and hit, and the number of times the shadow tags 2024 are accessed and hit. When a core accesses the shared cache, the status register can record the core's access hits to the shared cache and the shadow tags in real time by means of counters. In the status register configuration shown in FIG. 4, four counters are maintained for each core: an access counter (Cnt_Ac), a cache occupancy counter (Cnt_Size), a cache hit counter (Cnt_Shared_Hit), and a shadow tag hit counter (Cnt_Shadow_Hit). Cnt_Ac corresponds to the number of times a given core accesses the shared cache; Cnt_Size corresponds to the number of cache blocks that core occupies in the shared cache; Cnt_Shared_Hit corresponds to the number of times that core's accesses to the shared cache hit; and Cnt_Shadow_Hit corresponds to the number of times that core's accesses to its shadow tag hit. Take core i's accesses to the shared cache and shadow tag as an example.
  • The counters are updated as follows. When processor core i accesses the shared cache (a read operation or a write operation): if the shared cache is hit, Cnt_Ac(i) and Cnt_Shared_Hit(i) are each incremented by 1; if the shadow tag is hit, Cnt_Shadow_Hit(i) is incremented by 1. If the shared cache is missed and a free cache block exists, Cnt_Ac(i) and Cnt_Size(i) are each incremented by 1. If the shared cache is missed and no free cache block exists, Cnt_Ac(i) is incremented by 1, the cache block to be replaced is determined from among the cache blocks occupied by core A (the core with the highest replacement priority), a write-back operation is performed, and Cnt_Size(A) is decremented by 1 while Cnt_Size(i) is incremented by 1; if core A is core i itself, the value of Cnt_Size(A) remains unchanged. If the shadow tag is missed, Cnt_Shadow_Hit(i) remains unchanged.
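The counter rules above can be sketched as follows. The counter names follow the text (Cnt_Ac, Cnt_Size, Cnt_Shared_Hit, Cnt_Shadow_Hit); the `StatusRegister` class and its method names are illustrative assumptions, not the patent's implementation.

```python
class StatusRegister:
    def __init__(self, n_cores):
        self.cnt_ac = [0] * n_cores          # accesses to the shared cache
        self.cnt_size = [0] * n_cores        # cache blocks currently occupied
        self.cnt_shared_hit = [0] * n_cores  # shared-cache access hits
        self.cnt_shadow_hit = [0] * n_cores  # shadow-tag access hits

    def on_shared_hit(self, i):
        self.cnt_ac[i] += 1
        self.cnt_shared_hit[i] += 1

    def on_shadow_hit(self, i):
        self.cnt_shadow_hit[i] += 1

    def on_miss(self, i, free_block_available, victim_core=None):
        self.cnt_ac[i] += 1
        if free_block_available:
            self.cnt_size[i] += 1
        elif victim_core is not None and victim_core != i:
            # a block of core A (the highest-replacement-priority core) is
            # written back and handed over to core i
            self.cnt_size[victim_core] -= 1
            self.cnt_size[i] += 1
        # if the victim core is core i itself, occupancy counts are unchanged
```

A shadow-tag miss needs no method because, per the text, it leaves Cnt_Shadow_Hit unchanged.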
  • the priority calculation unit 206 is configured to calculate the respective replacement priorities of the N cores according to the first access information of the N cores recorded in the status register 204; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced;
  • In a specific implementation, the priority calculation unit 206 may calculate the replacement probability of each core according to each core's first access information and a replacement probability calculation model, forming a probability distribution in which the core with the highest replacement probability has the highest replacement priority.
  • the replacement probability calculation model is as follows:
  • Suppose the number of access misses in a given interval is W, and within these W access misses the proportion contributed by core i is Mi; then, within this interval, the number of access misses of core i is Mi × W. If none of the cache blocks occupied by core i at the beginning of the interval is replaced within the interval, then at the end of the interval the cache occupancy ratio of core i will become Ci + (Mi × W / m), where m is the total number of cache blocks and Ci is the proportion of cache blocks occupied by core i at the beginning of the interval.
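The occupancy projection above can be expressed directly. The `replacement_priorities` helper, which ranks cores by how far their projected occupancy exceeds an assumed target share, is a hypothetical completion: the patent states only that the core with the highest replacement probability receives the highest replacement priority, without giving the full probability formula.

```python
def projected_occupancy(c_i, m_i, w, m):
    """Occupancy ratio of core i at the end of the interval: C_i + (M_i * W) / m."""
    return c_i + (m_i * w) / m

def replacement_priorities(c, miss_share, w, m, targets):
    """Hypothetical ranking: cores sorted so the first entry has the highest
    replacement priority, scored by projected occupancy minus target share."""
    scores = {
        core: projected_occupancy(c[core], miss_share[core], w, m) - targets[core]
        for core in c
    }
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a core holding 25% of a 64-block cache (C_i = 0.25) that contributes half of 16 misses in an interval is projected to grow to 0.25 + 8/64 = 0.375 of the cache if none of its blocks are evicted.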
  • the controller 208 is configured to acquire, from the priority calculation unit, the respective replacement priorities of the N cores when a shared-cache access miss occurs and a cache block refill operation is performed, and to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • In a specific implementation, the controller may determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority according to the LRU (Least Recently Used) replacement algorithm.
  • The controller may also determine the cache block to be replaced according to other replacement algorithms, which is not specifically limited in this embodiment.
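A minimal sketch of the LRU selection step described above, assuming recency is tracked per cache block. The `OrderedDict`-based bookkeeping is an illustrative choice, not the patent's implementation.

```python
from collections import OrderedDict

class LRUTracker:
    def __init__(self):
        self._order = OrderedDict()   # block id -> None, oldest first

    def touch(self, block):
        """Record an access: the block becomes most recently used."""
        self._order.pop(block, None)
        self._order[block] = None     # re-insertion moves it to the end

    def pick_victim(self, candidates):
        """Return the least recently used block among the candidate set."""
        for block in self._order:     # iterates oldest -> newest
            if block in candidates:
                return block
        return next(iter(candidates)) # never-touched blocks: any candidate
```

Here `candidates` would be the set of blocks currently occupied by the core with the highest replacement priority; the tracker then picks the stalest of those.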
  • the controller 208 is further configured to: allocate a respective target cache occupancy to each of the N cores according to a performance target, where the performance target includes at least one of overall hit rate maximization, fairness, or quality of service; obtain the actual cache occupancy of the core with the highest replacement priority; detect whether the actual cache occupancy is greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the respective replacement priorities of the N cores.
  • In a specific implementation, the controller may first allocate a target cache occupancy to each core according to a performance target (such as overall hit rate maximization, fairness, or quality of service). The controller then acquires the first access information from the cache unit and obtains from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of cache blocks that core occupies in the shared cache). If the actual cache occupancy is not greater than the target cache occupancy, the controller may send a control instruction to the priority calculation unit to control it to recalculate the respective replacement priorities of the N cores.
  • the status register 204 is further connected to the controller 208.
  • the status register 204 is further configured to record second access information corresponding to each cache block in the shared cache 2022, where the second access information includes the number of times the cache block is occupied by each of the N cores;
  • the controller 208 is configured to: when a shared-cache access miss occurs and a cache block refill operation is performed, obtain, from the status register 204, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
  • the controller 208 is specifically configured to: determine a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, where the first type of cache block is the cache block occupied the fewest times by that core; determine a second type of cache block from the first type of cache blocks, where the second type of cache block is the cache block with the least total number of times occupied by the N cores; and determine the cache block to be replaced from the second type of cache blocks.
  • To make the cache blocks be used evenly while still respecting the reuse locality of each cache block, this embodiment also takes the number of times each cache block is occupied into account when determining the cache block to be replaced. Specifically, the status register records the number of times each cache block in the shared cache is occupied by each core as the second access information corresponding to that cache block. When a core subsequently misses on a shared-cache access and a cache block refill is performed, the cache blocks occupied the fewest times by the core with the highest replacement priority are first determined from among the cache blocks that core currently occupies; from those, the cache blocks with the least total number of times occupied by all cores are determined; and finally the cache block to be replaced is determined from the latter according to a replacement algorithm such as LRU. This approach tends to select the cache block with the fewest reuses, thereby taking into account both the reuse locality of each cache block and the extent to which it is shared by all cores.
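The two-stage narrowing described above can be sketched as follows. `per_core_count` and `total_count` are assumed stand-ins for the status register's second access information; the function names are illustrative.

```python
def filter_candidates(blocks, top_core, per_core_count, total_count):
    """Narrow the victim candidates in two stages before a replacement
    algorithm (e.g. LRU) makes the final choice among the survivors."""
    # Stage 1: the "first type" cache blocks, occupied the fewest times by
    # the core with the highest replacement priority.
    least = min(per_core_count[(b, top_core)] for b in blocks)
    first_type = [b for b in blocks if per_core_count[(b, top_core)] == least]
    # Stage 2: the "second type" cache blocks, those among the first type
    # with the least total occupancy over all cores.
    least_total = min(total_count[b] for b in first_type)
    return [b for b in first_type if total_count[b] == least_total]
```

The returned list would then be handed to the replacement algorithm (such as the LRU step sketched earlier) to pick the single block to evict.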
  • In summary, the buffer provided by this embodiment of the present invention records the first access information of each of the N cores with respect to the cache unit through the status register; the priority calculation unit calculates the respective replacement priorities of the N cores according to the first access information recorded in the status register; and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, which leads to low shared cache utilization, thereby improving shared cache utilization and system performance.
  • Furthermore, the buffer detects the relationship between the actual cache occupancy of the core with the highest replacement priority and that core's target cache occupancy, and when the actual cache occupancy is not greater than the target cache occupancy it controls the priority calculation unit to recalculate the replacement priority of each core, thereby further improving the utilization of the shared cache.
  • In addition, the buffer determines a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block being the cache block occupied the fewest times by that core; determines a second type of cache block from the first type of cache blocks, the second type of cache block being the cache block with the least total number of times occupied by the N cores; and determines the cache block to be replaced from the second type of cache blocks. This takes into account both the reuse locality of each cache block and the extent to which it is shared by all cores, further improving the utilization of each cache block in the shared cache.
  • FIG. 5 is a flowchart of a shared cache management method according to an embodiment of the present invention.
  • The method can be used in the buffer shown in FIG. 1 or FIG. 2.
  • the shared cache management method can include:
  • Step 302: Acquire the respective replacement priorities of the N cores of the processor from the priority calculation unit when a shared-cache access miss occurs and a cache block refill operation is performed; the replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced.
  • Step 304: Determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core of the N cores with the highest replacement priority.
  • In summary, the shared cache management method provided by this embodiment acquires the respective replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that a core can only select the cache block to be replaced from its own allocated portion of the cache blocks, which leads to low shared cache utilization, thereby improving shared cache utilization and system performance.
  • FIG. 6 is a flowchart of a shared cache management method according to another embodiment of the present invention. The method can be used in the buffer shown in FIG. 1 or FIG. 2.
  • the shared cache management method can include:
  • Step 402: Acquire the respective replacement priorities of the N cores of the processor from the priority calculation unit when a shared-cache access miss occurs and a cache block refill operation is performed.
  • The replacement priority is used to indicate the priority with which the cache blocks occupied by the corresponding core are replaced.
  • The replacement priority is calculated by the priority calculation unit according to the first access information corresponding to each core recorded in the status register. For the specific process by which the status register records the first access information and the priority calculation unit calculates each core's replacement priority, refer to the description of the embodiment corresponding to FIG. 2; details are not repeated here.
  • Step 404: Obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
  • The status register is also used to record the second access information corresponding to each cache block in the shared cache, and the second access information includes the number of times the cache block is occupied by each of the N cores.
  • Step 406: Determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
  • Specifically, the controller may determine a first type of cache block from among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block being the cache block occupied the fewest times by that core; determine a second type of cache block from the first type of cache blocks, the second type of cache block being the cache block with the least total number of times occupied by the N cores; and determine the cache block to be replaced from the second type of cache blocks according to a replacement algorithm.
  • each cache block is used on average to avoid the reused locality of the cache block.
  • This embodiment thus also takes the number of times each cache block has been occupied into account when determining the cache block to be replaced. Specifically, the status register records, as the second access information of each cache block, the number of times each cache block in the shared cache has been occupied by each core. When a subsequent kernel access to the shared cache misses and a cache block refill is performed, the controller first determines, among the cache blocks currently occupied by the core with the highest replacement priority, the blocks occupied the fewest times by that core; among those, it then determines the blocks occupied the fewest total times by all cores; finally, the cache block to be replaced is chosen from the latter blocks according to a replacement algorithm such as LRU.
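  The two-stage selection just described can be sketched as follows. This is an illustrative software model rather than the claimed hardware; the structures `occupied_by` (per-block, per-core occupation counts) and `last_used` (LRU timestamps) are names introduced only for the example.

```python
def select_victim(blocks, hot_core, occupied_by, last_used):
    """Choose the cache block to replace, following the two-stage filter.

    blocks      -- ids of the blocks currently occupied by the core with
                   the highest replacement priority (hot_core)
    occupied_by -- occupied_by[b][i]: times block b was occupied by core i
    last_used   -- last_used[b]: last-access timestamp, for the LRU tie-break
    """
    # First type: blocks occupied the fewest times by the hot core.
    fewest = min(occupied_by[b][hot_core] for b in blocks)
    first_type = [b for b in blocks if occupied_by[b][hot_core] == fewest]

    # Second type: among those, blocks with the smallest total occupation
    # count across all N cores.
    least_total = min(sum(occupied_by[b]) for b in first_type)
    second_type = [b for b in first_type if sum(occupied_by[b]) == least_total]

    # Final choice by a conventional replacement algorithm such as LRU.
    return min(second_type, key=lambda b: last_used[b])
```

  The filter first discards blocks the hot core reuses often, then prefers the block least shared by all cores, and only then falls back to LRU.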
  • Optionally, the controller allocates a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; obtains the actual cache occupancy of the core with the highest replacement priority; and detects whether the actual cache occupancy is greater than the target cache occupancy. If the detection result is that the actual cache occupancy is not greater than the target cache occupancy, the controller controls the priority calculation unit to recalculate the replacement priorities of the N cores. In practice, the controller may first allocate the target cache occupancies according to the performance target, then acquire the first access information from the cache unit and derive from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of cache blocks in the shared cache occupied by that core).
  • In that case, the controller may send a control instruction to the priority calculation unit to control it to recalculate the replacement priority of each of the N cores.
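  As a sketch, the occupancy check that triggers recalculation might look like the following; the callback name `recalculate` is an assumption standing in for the control instruction sent to the priority calculation unit.

```python
def maybe_recalculate(actual_occupancy, target_occupancy, recalculate):
    """If the actual occupancy of the core with the highest replacement
    priority does not exceed its target, ask the priority calculation unit
    to recompute all replacement priorities; otherwise keep evicting this
    core's blocks."""
    if actual_occupancy <= target_occupancy:
        recalculate()  # stands in for the controller's control instruction
        return True
    return False
```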
  • In summary, the shared cache management method provided by this embodiment obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those priorities. This solves the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks, which leads to low shared-cache utilization, and achieves the effect of improving shared-cache utilization and system performance.
  • In addition, the method detects the relationship between the actual cache occupancy of the core with the highest replacement priority and that core's target cache occupancy; when the actual cache occupancy is not greater than the target cache occupancy, the priority calculation unit is controlled to recalculate the replacement priority of each core, further improving the utilization of the shared cache.
  • The method further determines, among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block, i.e. the cache block occupied the fewest times by that core; determines, from the first type of cache block, the second type of cache block, i.e. the cache block occupied the fewest total times by the N cores; and determines the cache block to be replaced from the second type of cache block according to a replacement algorithm. This takes into account both the reuse locality of each cache block and the extent to which it is shared by all cores, further improving the utilization of every cache block in the shared cache.
  • FIG. 7 is a schematic structural diagram of a controller according to an embodiment of the present invention. The controller may be used in a buffer as shown in FIG. 1 or FIG. 2 to control kernel accesses. The controller may include:
  • a first obtaining module 501, configured to obtain, from the priority calculation unit, the replacement priority of each of the N cores of the processor when an access to the shared cache misses and a cache block refill operation is performed, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and
  • a determining module 502, configured to determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
  • In summary, the controller provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those priorities. This solves the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks, which leads to low shared-cache utilization, and achieves the effect of improving shared-cache utilization and system performance.
  • FIG. 8 is a schematic structural diagram of a controller according to another embodiment of the present invention. The controller may be used in a buffer as shown in FIG. 1 or FIG. 2 to control kernel accesses. The controller may include:
  • a first obtaining module 601, configured to obtain, from the priority calculation unit, the replacement priority of each of the N cores of the processor when an access to the shared cache misses and a cache block refill operation is performed, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and
  • a determining module 602, configured to determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
  • Optionally, the controller further includes:
  • the allocating module 603 is configured to allocate a respective target cache occupancy to the N cores according to a performance target, where the performance target includes at least one of an overall hit rate maximization, fairness, or quality of service;
  • a second obtaining module 604, configured to obtain the actual cache occupancy of the core with the highest replacement priority; a detecting module 605, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy; and
  • the control module 606 is configured to, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate respective replacement priorities of the N cores.
  • the determining module 602 includes:
  • the obtaining unit 6021 is configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority;
  • the determining unit 6022 is configured to determine, according to the second access information corresponding to each cache block currently occupied by the kernel with the highest replacement priority, the cache block to be replaced;
  • the status register is further configured to record the second access information corresponding to each cache block in the shared cache, where the second access information includes a number of times occupied by the N cores.
  • the determining unit 6022 includes:
  • a first determining subunit 6022a, configured to determine, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by the core with the highest replacement priority;
  • a second determining sub-unit 6022b configured to determine, from the first type of cache block, a second type of cache block, where the second type of cache block is a cache block having the least total number of times occupied by the N cores;
  • a third determining subunit 6022c, configured to determine, according to a replacement algorithm, the cache block to be replaced from the second type of cache block.
  • In summary, the controller provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those priorities, solving the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks (which leads to low shared-cache utilization) and thereby improving shared-cache utilization and system performance.
  • Moreover, the controller detects the relationship between the actual cache occupancy of the core with the highest replacement priority and that core's target cache occupancy; when the actual cache occupancy is not greater than the target cache occupancy, the priority calculation unit is controlled to recalculate the replacement priority of each core, further improving the utilization of the shared cache.
  • Finally, the controller determines, among the cache blocks currently occupied by the core with the highest replacement priority, the first type of cache block, i.e. the cache block occupied the fewest times by that core; determines, from the first type of cache block, the second type of cache block, i.e. the cache block occupied the fewest total times by the N cores; and determines the cache block to be replaced from the second type of cache block according to a replacement algorithm, thereby taking into account both the reuse locality of each cache block and the extent to which it is shared by all cores, and further improving the utilization of every cache block in the shared cache.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Provided are a cache, a shared cache management method and a controller, which relate to the field of computers. The cache comprises a cache unit, a state register, a priority calculation unit and a controller, wherein the state register is used for recording first access information about N kernels for respectively accessing the cache unit; the priority calculation unit is used for calculating the respective replacement priorities of the N kernels according to the first access information; and the controller is used for determining a cache block to be replaced in the cache blocks in a shared cache currently occupied by the kernel which has the highest replacement priority. The respective replacement priorities of the N kernels are calculated according to the first access information recorded by the state register, and the cache block to be replaced in the shared cache is determined according to the replacement priorities, thereby solving the problem in the prior art that a kernel can only determine a cache block to be replaced in a part of the corresponding cache blocks, so that the effect of improving the utilization rate and system performance of the shared cache is achieved.

Description

Buffer, shared cache management method and controller

Technical Field

The present invention relates to the field of computers, and in particular to a buffer, a shared cache management method, and a controller.

Background
With the continuous development of the computer field, multi-core processors are more and more widely applied, and whether the shared cache can be managed effectively has an important impact on system performance.
In existing shared cache management schemes, each core in a multi-core system is usually allocated a portion of the cache blocks in the shared cache. When a core's cache access (read or write) misses and a cache block refill operation is performed, the cache block to be replaced is determined among the cache blocks corresponding to that core, and the original data in the block to be replaced is overwritten with the data to be read or written.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem: since a core can only determine the cache block to be replaced from its own subset of cache blocks, in practice some cores' cache blocks may be frequently reused while other cores' cache blocks stay idle for a long time, resulting in low shared-cache utilization and degraded system performance.

Summary of the Invention
To solve the prior-art problem of low cache utilization caused by a core only being able to determine the cache block to be replaced from its own subset of cache blocks, embodiments of the present invention provide a buffer, a shared cache management method, and a controller. The technical solutions are as follows:
In a first aspect, a buffer is provided, the buffer including:

a cache unit, a status register, a priority calculation unit, and a controller;

where the cache unit is connected to the status register and the controller; the status register is connected to the cache unit and the priority calculation unit; the priority calculation unit is connected to the status register and the controller; the cache unit includes a shared cache and N shadow tags, the N shadow tags corresponding to the N cores of the processor respectively; N > 2, and N is an integer;

the status register is configured to record first access information of each of the N cores to the cache unit, the first access information including: the number of accesses to the shared cache, the number of cache blocks occupied in the shared cache, the number of accesses to the shared cache that hit, and the number of accesses to the shadow tag that hit;

the priority calculation unit is configured to calculate the replacement priority of each of the N cores according to the first access information of each of the N cores to the cache unit recorded by the status register, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and

the controller is configured to, when an access to the shared cache misses and a cache block refill operation is performed, obtain the replacement priorities of the N cores from the priority calculation unit and determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In a first possible implementation of the first aspect, the controller is further configured to: allocate a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; obtain the actual cache occupancy of the core with the highest replacement priority; detect whether the actual cache occupancy is not greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the replacement priorities of the N cores.
In a second possible implementation of the first aspect, the status register is further connected to the controller; the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores; and the controller is configured to, when an access to the shared cache misses and a cache block refill operation is performed, obtain from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the controller is configured to: determine, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by that core; determine, from the first type of cache block, a second type of cache block, the second type of cache block being the cache block occupied the fewest total times by the N cores; and determine the cache block to be replaced from the second type of cache block according to a replacement algorithm.
In a second aspect, a shared cache management method is provided, for use in the buffer of the first aspect or any possible implementation of the first aspect, the method including: when an access to the shared cache misses and a cache block refill operation is performed, obtaining from the priority calculation unit the replacement priority of each of the N cores of the processor, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and determining the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In a first possible implementation of the second aspect, the method includes: allocating a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; obtaining the actual cache occupancy of the core with the highest replacement priority; detecting whether the actual cache occupancy is not greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, controlling the priority calculation unit to recalculate the replacement priorities of the N cores.
In a second possible implementation of the second aspect, the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores; and determining the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority includes: obtaining from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and determining the cache block to be replaced according to that second access information.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority includes: determining, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by that core; determining, from the first type of cache block, a second type of cache block, the second type of cache block being the cache block occupied the fewest total times by the N cores; and determining the cache block to be replaced from the second type of cache block according to a replacement algorithm.
In a third aspect, a controller is provided, for use in the buffer of the first aspect or any possible implementation of the first aspect, the controller including: a first obtaining module, configured to obtain, from the priority calculation unit, the replacement priority of each of the N cores of the processor when an access to the shared cache misses and a cache block refill operation is performed, the replacement priority indicating how preferentially the cache blocks occupied by the corresponding core are replaced; and a determining module, configured to determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In a first possible implementation of the third aspect, the controller includes: an allocating module, configured to allocate a target cache occupancy to each of the N cores according to a performance target, the performance target including at least one of overall hit-rate maximization, fairness, and quality of service; a second obtaining module, configured to obtain the actual cache occupancy of the core with the highest replacement priority; a detecting module, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy; and a control module, configured to, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the replacement priorities of the N cores.
In a second possible implementation of the third aspect, the determining module includes: an obtaining unit, configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and a determining unit, configured to determine the cache block to be replaced according to that second access information; where the status register is further configured to record the second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores.
With reference to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the determining unit includes: a first determining subunit, configured to determine, from the cache blocks currently occupied by the core with the highest replacement priority, a first type of cache block, the first type of cache block being the cache block occupied the fewest times by that core; a second determining subunit, configured to determine, from the first type of cache block, a second type of cache block, the second type of cache block being the cache block occupied the fewest total times by the N cores; and a third determining subunit, configured to determine, according to a replacement algorithm, the cache block to be replaced from the second type of cache block.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:

The status register records the first access information of each of the N cores to the cache unit; the priority calculation unit calculates the replacement priority of each of the N cores according to the first access information recorded by the status register; and the controller determines the cache block to be replaced in the shared cache according to the replacement priorities of the N cores. This solves the prior-art problem that a core can only determine the cache block to be replaced from its own subset of cache blocks, which leads to low shared-cache utilization, and achieves the effect of improving shared-cache utilization and system performance.

Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a buffer according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a buffer according to another embodiment of the present invention;

FIG. 3 is a schematic diagram of the composition of a cache unit according to another embodiment of the present invention;

FIG. 4 is a schematic diagram of the composition of a status register according to another embodiment of the present invention;

FIG. 5 is a method flowchart of a shared cache management method according to an embodiment of the present invention;

FIG. 6 is a method flowchart of a shared cache management method according to another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a controller according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a controller according to another embodiment of the present invention.

Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Please refer to FIG. 1, which is a schematic structural diagram of a buffer according to an embodiment of the present invention. The buffer can be applied in a multi-core system. The buffer may include: a cache unit 102, a status register 104, a priority calculation unit 106, and a controller 108. The cache unit 102 is connected to the status register 104 and the controller 108; the status register 104 is connected to the cache unit 102 and the priority calculation unit 106; the priority calculation unit 106 is connected to the status register 104 and the controller 108. The cache unit 102 includes a shared cache 1022 and N shadow tags 1024, the N shadow tags 1024 corresponding to the N cores of the processor respectively; N > 2, and N is an integer.
The status register 104 is configured to record first access information of each of the N cores to the cache unit 102, the first access information including: the number of accesses to the shared cache 1022, the number of cache blocks occupied in the shared cache 1022, the number of accesses to the shared cache 1022 that hit, and the number of accesses to the shadow tag 1024 that hit.
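A software model of the first access information kept by the status register 104 might look like the following. The class and field names are assumptions introduced for illustration, and the block accounting is deliberately simplified: a refill increments the requesting core's block count, and an eviction decrements the victim owner's.

```python
from dataclasses import dataclass

@dataclass
class CoreAccessInfo:
    """First access information recorded per core (assumed field names)."""
    accesses: int = 0      # accesses to the shared cache
    blocks: int = 0        # cache blocks currently occupied
    cache_hits: int = 0    # accesses to the shared cache that hit
    shadow_hits: int = 0   # accesses to this core's shadow tag that hit

class StatusRegister:
    def __init__(self, n_cores):
        self.info = [CoreAccessInfo() for _ in range(n_cores)]

    def on_access(self, core, cache_hit, shadow_hit):
        rec = self.info[core]
        rec.accesses += 1
        rec.cache_hits += int(cache_hit)
        rec.shadow_hits += int(shadow_hit)
        if not cache_hit:          # miss triggers a refill: gain one block
            rec.blocks += 1

    def on_evict(self, owner):
        self.info[owner].blocks -= 1   # the victim's owner loses one block
```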
The priority calculation unit 106 is configured to calculate the replacement priority of each of the N cores according to the first access information of each of the N cores to the cache unit 102 recorded by the status register 104; the replacement priority indicates how preferentially the cache blocks occupied by the corresponding core are replaced.
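The patent specifies the inputs to the priority calculation but not a formula. One plausible heuristic, given purely as an assumption and not as the claimed method: a core whose shadow tag (which estimates the hits the core would get with more cache) predicts few extra hits gains little from keeping its blocks, so it should be a preferred eviction target, and occupying many blocks raises its priority further.

```python
def replacement_priority(info):
    """Heuristic sketch only; 'info' holds one core's first access
    information as a dict: accesses, blocks, cache_hits, shadow_hits."""
    extra_hits = max(info["shadow_hits"] - info["cache_hits"], 0)
    benefit = extra_hits / max(info["accesses"], 1)  # marginal utility of cache
    return info["blocks"] * (1.0 - benefit)

def highest_priority_core(cores_info):
    # Index of the core whose cache blocks should be replaced first.
    return max(range(len(cores_info)),
               key=lambda i: replacement_priority(cores_info[i]))
```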
The controller 108 is configured to, when an access to the shared cache misses and a cache block refill operation is performed, obtain the replacement priorities of the N cores from the priority calculation unit and determine the cache block to be replaced among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority.
In this embodiment, the status register records each core's access information for the shared cache, the priority calculation unit calculates each core's replacement priority from that access information, and the controller, when performing a cache block refill operation, determines the cache block to be replaced according to the cores' replacement priorities. The cache block to be replaced can thus be selected according to each core's actual use of the shared cache, which improves shared cache utilization.
In summary, in the cache provided by this embodiment of the present invention, the status register records the first access information of each of the N cores with respect to the cache unit, the priority calculation unit calculates the replacement priority of each of the N cores from the recorded first access information, and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

Please refer to FIG. 2, which is a schematic structural diagram of a cache provided by another embodiment of the present invention. The cache can be applied in a multi-core system and may include: a cache unit 202, a status register 204, a priority calculation unit 206, and a controller 208.
The cache unit 202 is connected to the status register 204 and the controller 208; the status register 204 is connected to the cache unit 202 and the priority calculation unit 206; the priority calculation unit 206 is connected to the status register 204 and the controller 208. The cache unit 202 includes a shared cache 2022 and N shadow tags 2024, the N shadow tags 2024 corresponding one-to-one to the N cores of the processor, where N ≥ 2 and N is an integer.
Taking a processor with 4 cores as an example, refer to the schematic diagram of the cache unit shown in FIG. 3. The cache unit includes one shared cache and 4 shadow tags; the shared cache contains n cache blocks, each shadow tag likewise contains n storage units, and each cache block in the shared cache corresponds to one storage unit in each shadow tag. Each cache block in the shared cache is divided into four parts: the core ID (identity number) of the core currently occupying the block, a valid flag, tag information, and data. Each storage unit in a shadow tag includes two parts: a valid flag and tag information. When a core of the processor accesses the shared cache and the shadow tags in the cache unit, the low-order bits of the 64-bit address determine the addresses of the corresponding cache block and of the storage units in the shadow tags, and the valid flag and tag information are read from the determined addresses. If the valid flag read out is valid and the tag information matches the information contained in the high-order bits of the 64-bit address, the access is determined to be a hit; otherwise, it is determined to be a miss.
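As an illustrative sketch (not part of the patent text), the hit/miss determination described above can be modeled as follows. The 6-bit block offset, the dictionary layout of blocks and shadow-tag entries, and the function names are assumptions chosen for the example:

```python
def split_address(address, index_bits, offset_bits=6):
    """Split a 64-bit address: low-order bits select the block, the rest is the tag."""
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return index, tag

def lookup(address, shared_cache, shadow_tags, index_bits):
    """Return (shared_hit, per-core shadow-tag hits) for one access.

    shared_cache: list of blocks, each {"valid", "tag", "core_id", "data"}.
    shadow_tags:  one list per core, entries {"valid", "tag"}.
    """
    index, tag = split_address(address, index_bits)
    block = shared_cache[index]
    # a hit requires a valid entry whose stored tag matches the high-order bits
    shared_hit = block["valid"] and block["tag"] == tag
    shadow_hits = [st[index]["valid"] and st[index]["tag"] == tag
                   for st in shadow_tags]
    return shared_hit, shadow_hits
```

With 4 blocks (index_bits=2), an access whose tag matches the valid entry at its index reports a hit in the shared cache and in the matching core's shadow tag, and a miss everywhere else.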
The status register 204 is configured to record first access information of each of the N cores with respect to the cache unit 202, the first access information including: the number of accesses to the shared cache 2022, the number of cache blocks occupied in the shared cache 2022, the number of accesses to the shared cache 2022 that hit, and the number of accesses to the shadow tags 2024 that hit.

When a core accesses the shared cache, the status register can use counters to record, in real time, the core's hit situation for the shared cache and the shadow tags. Taking the schematic diagram of the status register shown in FIG. 4 as an example, the status register can maintain four counters for each core: an access counter (Cnt_Acc), a cache occupancy counter (Cnt_Size), a cache hit counter (Cnt_Shared_Hit), and a tag hit counter (Cnt_Shadow_Hit). Cnt_Acc records how many times a core has accessed the shared cache, Cnt_Size records how many cache blocks in the shared cache the core currently occupies, Cnt_Shared_Hit records how many of the core's accesses to the shared cache have hit, and Cnt_Shadow_Hit records how many of the core's accesses to its shadow tag have hit.

Taking core i accessing the shared cache and the shadow tags as an example, the counters are updated as follows. When processor core i performs an access operation (a read or a write) on the shared cache: if the access hits the shared cache, Cnt_Acc(i) and Cnt_Shared_Hit(i) are each incremented by 1; if it hits the shadow tag, Cnt_Shadow_Hit(i) is incremented by 1. If the access misses the shared cache, then when a free cache block exists, Cnt_Acc(i) and Cnt_Size(i) are each incremented by 1; when no free cache block exists, Cnt_Acc(i) is incremented by 1 and the core A with the highest replacement priority is determined. If core A is not core i, the cache block to be replaced is selected from the blocks occupied by core A, a write-back operation is performed, and Cnt_Size(A) is decremented by 1; if core A is core i, Cnt_Size(A) remains unchanged. If the access misses the shadow tag, Cnt_Shadow_Hit(i) remains unchanged.
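The counter-update rules above can be sketched as follows. This is an illustrative model only: the class layout is an assumption, and the increment of the new owner's Cnt_Size on a takeover (marked in the comment) is an inference not stated explicitly in the text:

```python
class CoreCounters:
    """Per-core counters kept by the status register (cf. FIG. 4)."""
    def __init__(self, n_cores):
        self.acc = [0] * n_cores         # Cnt_Acc
        self.size = [0] * n_cores        # Cnt_Size
        self.shared_hit = [0] * n_cores  # Cnt_Shared_Hit
        self.shadow_hit = [0] * n_cores  # Cnt_Shadow_Hit

    def on_access(self, i, shared_hit, shadow_hit,
                  free_block, highest_priority_core=None):
        """Apply the update rules for one access by core i."""
        self.acc[i] += 1
        if shared_hit:
            self.shared_hit[i] += 1
        else:
            if free_block:
                self.size[i] += 1  # refill into a free block
            elif highest_priority_core is not None and highest_priority_core != i:
                # a block of core A is written back and handed over
                self.size[highest_priority_core] -= 1
                self.size[i] += 1  # assumption: core i now owns the block
            # if A == i, Cnt_Size(A) stays unchanged
        if shadow_hit:
            self.shadow_hit[i] += 1
```

A hit updates only the access and hit counters; a miss with no free block shifts one unit of occupancy from the highest-priority core A to the requester when A differs from the requester.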
The priority calculation unit 206 is configured to calculate a replacement priority for each of the N cores according to the first access information recorded by the status register 204; the replacement priority characterizes how preferentially the cache blocks occupied by the corresponding core are replaced.

The priority calculation unit 206 can compute each core's replacement probability from the first access information described above and a replacement probability calculation model, forming a probability distribution; the core with the highest replacement probability has the highest replacement priority.

The replacement probability calculation model is as follows.
Suppose that W access misses occur within a given interval, and that core i accounts for a fraction M_i of those misses; core i therefore experiences M_i·W misses within the interval. If none of the cache blocks occupied by core i at the start of the interval are replaced during the interval, then at the end of the interval core i's cache occupancy fraction becomes C_i + (M_i·W)/m, where m is the total number of cache blocks and C_i is the fraction of cache blocks occupied by core i at the start of the interval.

Suppose the probability that one of these W misses replaces a cache block belonging to core i is E_i; then E_i·W of core i's cache blocks are replaced within the interval. This reduces core i's occupancy fraction by (E_i·W)/m, so after the interval core i's occupancy fraction is τ_i = C_i + ((M_i − E_i)·W)/m. Since the minimum allocation unit is one cache block, the minimum change of τ_i relative to C_i is 1/m (here C_i and τ_i are fractions).

Let T_i denote the target occupancy that allows core i to reach a given performance goal, so that the desired outcome is T_i = τ_i = C_i + ((M_i − E_i)·W)/m. Solving for the replacement probability E_i gives:

E_i = 0,                         if (C_i − T_i)·m/W + M_i < 0
E_i = 1,                         if (C_i − T_i)·m/W + M_i > 1
E_i = (C_i − T_i)·m/W + M_i,     otherwise

The controller 208 is configured to, when an access misses the shared cache and a cache block refill operation is performed, obtain the replacement priorities of the N cores from the priority calculation unit, and determine the cache block to be replaced from among the shared cache blocks currently occupied by the core with the highest replacement priority among the N cores.
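The piecewise expression above simply clamps the raw value (C_i − T_i)·m/W + M_i into the interval [0, 1]. A minimal sketch (function and parameter names are chosen for the example):

```python
def replacement_probability(c_i, t_i, m_i, m, w):
    """E_i per the model: c_i and t_i are occupancy fractions, m_i the
    core's share of the W misses, m the total number of cache blocks,
    w the number of misses in the interval."""
    raw = (c_i - t_i) * m / w + m_i
    return max(0.0, min(1.0, raw))  # clamp to [0, 1]
```

A core holding more than its target occupancy (c_i > t_i) or absorbing a large share of the misses gets a higher replacement probability; the clamp handles the two boundary cases of the piecewise definition.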
The controller may determine the cache block to be replaced from the shared cache blocks currently occupied by the core with the highest replacement priority according to an LRU (Least Recently Used) replacement algorithm. Alternatively, the controller may determine the cache block to be replaced according to another replacement algorithm, which is not specifically limited in this embodiment.
The controller 208 is further configured to: allocate a target cache occupancy to each of the N cores according to a performance goal, the performance goal including at least one of overall hit rate maximization, fairness, or quality of service; obtain the actual cache occupancy of the core with the highest replacement priority; detect whether the actual cache occupancy is greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the replacement priorities of the N cores.

To further improve shared cache utilization, each core's replacement priority needs to be updated according to the cores' actual use of the shared cache. Specifically, the controller may first allocate a target cache occupancy to each core according to the performance goal (such as overall hit rate maximization, fairness, or quality of service). The controller also obtains the first access information from the cache unit and derives from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of shared cache blocks that core occupies). If the actual cache occupancy of that core is less than or equal to its target cache occupancy, the core with the highest replacement priority holds too few shared cache blocks, and continuing to select the block to be replaced from the blocks it currently occupies would leave the shared cache underutilized. In this case, the controller can send a control instruction to the priority calculation unit to make it recalculate the replacement priorities of the N cores.
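The controller's decision above reduces to a simple comparison; the following sketch assumes the actual occupancy is read from the per-core Cnt_Size counters and that the per-core targets are expressed in blocks (both representations are assumptions for the example):

```python
def needs_priority_recalculation(top_core, cnt_size, target_blocks):
    """Return True when the top-priority core's actual occupancy (its
    Cnt_Size value) does not exceed its target, meaning the core holds
    too few shared-cache blocks and the replacement priorities should
    be recomputed.

    target_blocks: per-core target occupancies derived from the
    performance goal (hit-rate maximization, fairness, QoS)."""
    return cnt_size[top_core] <= target_blocks[top_core]
```

Only when the top-priority core is at or below its target does the controller trigger recalculation; a core above its target can safely keep donating blocks.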
The status register 204 is also connected to the controller 208, and is further configured to record second access information corresponding to each cache block in the shared cache 2022, the second access information including the number of times the block has been occupied by each of the N cores.

The controller 208 is configured to, when an access misses the shared cache and a cache block refill operation is performed, obtain from the status register 204 the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to that second access information.
Specifically, the controller 208 is configured to: determine first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks that have been occupied by that core the fewest times; determine second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks that have been occupied by the N cores the fewest times in total; and determine the cache block to be replaced from the second-type cache blocks according to a replacement algorithm.

To further improve the utilization of the individual cache blocks in the shared cache, to spread usage evenly across blocks, and to respect the reuse locality of cache blocks, this embodiment also takes into account how many times each block has been occupied when determining the block to be replaced. Specifically, the status register records, as each block's second access information, the number of times each cache block in the shared cache has been occupied by each core. When a later access by a core misses the shared cache and a cache block refill is performed, the controller first determines, among the blocks currently occupied by the core with the highest replacement priority, the blocks that core has occupied the fewest times; then, among those, the blocks occupied the fewest times in total by all cores; and finally determines the block to be replaced among them according to a replacement algorithm such as LRU. This method tends to select the blocks that have been reused the fewest times, thereby balancing the reuse locality of cache blocks against the degree to which blocks are shared by all cores.
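The two-stage narrowing plus LRU tie-break described above can be sketched as follows; the block representation (owner field, per-core occupation counts, LRU timestamp) is an assumption made for the example:

```python
def select_victim(blocks, top_core):
    """blocks: list of dicts with
       "owner": core currently occupying the block,
       "occ":   list where occ[k] = times core k has occupied the block,
       "lru":   last-use timestamp (smaller = older)."""
    owned = [b for b in blocks if b["owner"] == top_core]
    # stage 1: fewest occupations by the top-priority core
    fewest_own = min(b["occ"][top_core] for b in owned)
    first_type = [b for b in owned if b["occ"][top_core] == fewest_own]
    # stage 2: fewest total occupations by all cores
    fewest_total = min(sum(b["occ"]) for b in first_type)
    second_type = [b for b in first_type if sum(b["occ"]) == fewest_total]
    # stage 3: LRU among the remaining candidates
    return min(second_type, key=lambda b: b["lru"])
```

Each stage narrows the candidate set without ever emptying it, so the final LRU choice always has at least one block to pick from (provided the top-priority core occupies at least one block).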
In summary, in the cache provided by this embodiment of the present invention, the status register records the first access information of each of the N cores with respect to the cache unit, the priority calculation unit calculates the replacement priority of each of the N cores from the recorded first access information, and the controller determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

In addition, the cache provided by this embodiment of the present invention compares the actual cache occupancy of the core with the highest replacement priority against that core's target cache occupancy and, when the actual occupancy is not greater than the target occupancy, controls the priority calculation unit to recalculate the cores' replacement priorities, further improving shared cache utilization.

Finally, the cache provided by this embodiment of the present invention determines first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks occupied by that core the fewest times; determines second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks occupied by the N cores the fewest times in total; and determines the block to be replaced from the second-type cache blocks according to a replacement algorithm. This balances the reuse locality of cache blocks against the degree to which blocks are shared by all cores, further improving the utilization of the individual blocks in the shared cache.

Please refer to FIG. 5, which is a method flowchart of a shared cache management method provided by an embodiment of the present invention. The method can control core accesses in a cache as shown in FIG. 1 or FIG. 2. The shared cache management method may include:
Step 302: when an access misses the shared cache and a cache block refill operation is performed, obtain the replacement priorities of the N cores of the processor from the priority calculation unit; the replacement priority characterizes how preferentially the cache blocks occupied by the corresponding core are replaced.

Step 304: determine the cache block to be replaced from among the shared cache blocks currently occupied by the core with the highest replacement priority among the N cores.
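The two steps can be sketched as a top-level miss handler; this is illustrative only — `choose_block` stands in for whatever replacement algorithm is used in step 304, and the data shapes are assumptions:

```python
def handle_refill_on_miss(priorities, blocks_by_core, choose_block):
    """priorities: {core_id: replacement_priority} obtained from the
    priority calculation unit; blocks_by_core: {core_id: [block_id, ...]}
    listing the shared-cache blocks each core currently occupies."""
    top_core = max(priorities, key=priorities.get)   # step 302
    victim = choose_block(blocks_by_core[top_core])  # step 304
    return top_core, victim
```

Any concrete policy (LRU, or the two-stage occupation-count narrowing of the later embodiment) can be plugged in as `choose_block`.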
In summary, the shared cache management method provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

Please refer to FIG. 6, which is a method flowchart of a shared cache management method provided by another embodiment of the present invention. The method can control core accesses in a cache as shown in FIG. 1 or FIG. 2. The shared cache management method may include:
Step 402: when an access misses the shared cache and a cache block refill operation is performed, obtain the replacement priorities of the N cores of the processor from the priority calculation unit.

The replacement priority characterizes how preferentially the cache blocks occupied by the corresponding core are replaced. It is calculated by the priority calculation unit according to the first access information recorded in the status register for each core; for the specific process by which the status register records the first access information and the priority calculation unit calculates each core's priority, refer to the description of the embodiment corresponding to FIG. 2, which is not repeated here.
Step 404: obtain from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.

The status register is also used to record the second access information corresponding to each cache block in the shared cache, the second access information including the number of times the block has been occupied by each of the N cores.
Step 406: determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.

The controller may determine first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks occupied by that core the fewest times; determine second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks occupied by the N cores the fewest times in total; and determine the cache block to be replaced from the second-type cache blocks according to a replacement algorithm.

To further improve the utilization of the individual cache blocks in the shared cache, to spread usage evenly across blocks, and to respect the reuse locality of cache blocks, this embodiment also takes into account how many times each block has been occupied when determining the block to be replaced. Specifically, the status register records, as each block's second access information, the number of times each cache block in the shared cache has been occupied by each core. When a later access by a core misses the shared cache and a cache block refill is performed, the method first determines, among the blocks currently occupied by the core with the highest replacement priority, the blocks that core has occupied the fewest times; then, among those, the blocks occupied the fewest times in total by all cores; and finally determines the block to be replaced among them according to a replacement algorithm such as LRU.
In addition, the controller also allocates a target cache occupancy to each of the N cores according to a performance goal, the performance goal including at least one of overall hit rate maximization, fairness, or quality of service; obtains the actual cache occupancy of the core with the highest replacement priority; detects whether the actual cache occupancy is greater than the target cache occupancy; and, if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, controls the priority calculation unit to recalculate the replacement priorities of the N cores.

To further improve shared cache utilization, each core's replacement priority needs to be updated according to the cores' actual use of the shared cache. Specifically, the controller may first allocate a target cache occupancy to each core according to the performance goal (such as overall hit rate maximization, fairness, or quality of service). The controller also obtains the first access information from the cache unit and derives from it the actual cache occupancy of the core with the highest replacement priority (which can be determined from the number of shared cache blocks that core occupies). If the actual cache occupancy of that core is less than or equal to its target cache occupancy, the core with the highest replacement priority holds too few shared cache blocks, and continuing to select the block to be replaced from the blocks it currently occupies would leave the shared cache underutilized. In this case, the controller can send a control instruction to the priority calculation unit to make it recalculate the replacement priorities of the N cores.
In summary, the shared cache management method provided by this embodiment of the present invention obtains the replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem that each core can only select the block to be replaced from a fixed subset of cache blocks, which leaves the shared cache underutilized, and thereby improves shared cache utilization and system performance.

In addition, the shared cache management method provided by this embodiment of the present invention compares the actual cache occupancy of the core with the highest replacement priority against that core's target cache occupancy and, when the actual occupancy is not greater than the target occupancy, controls the priority calculation unit to recalculate the cores' replacement priorities, further improving shared cache utilization.

Finally, the shared cache management method provided by this embodiment of the present invention determines first-type cache blocks from the blocks currently occupied by the core with the highest replacement priority, the first-type cache blocks being the blocks occupied by that core the fewest times; determines second-type cache blocks from the first-type cache blocks, the second-type cache blocks being the blocks occupied by the N cores the fewest times in total; and determines the block to be replaced from the second-type cache blocks according to a replacement algorithm. This balances the reuse locality of cache blocks against the degree to which blocks are shared by all cores, further improving the utilization of the individual blocks in the shared cache.

Please refer to FIG. 7, which is a schematic structural diagram of a controller provided by an embodiment of the present invention. The controller can control core accesses in a cache as shown in FIG. 1 or FIG. 2. The controller may include:
第一获取模块 501 , 用于在发生访问所述共享緩存未命中并进行緩存块重 填操作时, 从所述优先级计算单元中获取处理器的 N 个内核各自的替换优先 级; 所述替换优先级用于表征对应的内核所占用的緩存块被替换的优先程度; 确定模块 502 , 用于从所述 N个内核中替换优先级最高的内核当前占用的 共享緩存的緩存块中确定待替换的緩存块。  a first obtaining module 501, configured to acquire, from the priority computing unit, a replacement priority of each of the N cores of the processor when the accessing the shared cache miss occurs and performing a cache block refilling operation; The priority is used to indicate the priority of the cache block occupied by the corresponding kernel is replaced; the determining module 502 is configured to determine, from the N cores, the cache block of the shared cache currently occupied by the highest priority kernel to be replaced. Cache block.
In summary, the controller provided by the embodiments of the present invention obtains the respective replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem of low shared cache utilization caused by each core being able to determine the cache block to be replaced only from a corresponding subset of the cache blocks, thereby improving shared cache utilization and system performance. Please refer to FIG. 8, which is a schematic structural diagram of a controller provided by another embodiment of the present invention. The controller may control core accesses in a cache such as that shown in FIG. 1 or FIG. 2. The controller may include:
a first obtaining module 601, configured to obtain, from the priority calculation unit, the respective replacement priorities of the N cores of the processor when a miss occurs on an access to the shared cache and a cache block refill operation is performed, where the replacement priority characterizes the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and a determining module 602, configured to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
The controller further includes:
an allocation module 603, configured to allocate respective target cache occupancies to the N cores according to a performance objective, where the performance objective includes at least one of maximizing the overall hit rate, fairness, or quality of service;
a second obtaining module 604, configured to obtain the actual cache occupancy of the core with the highest replacement priority; and a detection module 605, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy;
a control module 606, configured to control the priority calculation unit to recalculate the respective replacement priorities of the N cores if the detection result is that the actual cache occupancy is not greater than the target cache occupancy.
The determining module 602 includes:
an obtaining unit 6021, configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and
a determining unit 6022, configured to determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority;
where the status register is further configured to record the second access information corresponding to each cache block in the shared cache, the second access information including the number of times the cache block has been occupied by each of the N cores.
The determining unit 6022 includes:
a first determining subunit 6022a, configured to determine first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
a second determining subunit 6022b, configured to determine second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and a third determining subunit 6022c, configured to determine the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
In summary, the controller provided by the embodiments of the present invention obtains the respective replacement priorities of the N cores from the priority calculation unit and determines the cache block to be replaced in the shared cache according to those replacement priorities. This solves the prior-art problem of low shared cache utilization caused by each core being able to determine the cache block to be replaced only from a corresponding subset of the cache blocks, thereby improving shared cache utilization and system performance.
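The first step described above, selecting the core whose blocks become replacement candidates, can be sketched in a few lines of Python. This is a hypothetical model and not part of the claimed hardware: the function name and the dictionary layout standing in for the priority calculation unit's output are illustrative assumptions.

```python
def pick_victim_core(replacement_priority):
    """Return the core whose cache blocks should be replaced first.

    replacement_priority: dict mapping core id -> replacement priority,
    standing in for the values produced by the priority calculation unit.
    """
    # The core with the highest replacement priority is the one whose
    # currently occupied cache blocks become candidates for replacement.
    return max(replacement_priority, key=replacement_priority.get)
```

For example, with priorities {0: 1, 1: 5, 2: 3}, the victim core is core 1, and the cache block to be replaced is then sought only among core 1's currently occupied blocks.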
In addition, the controller provided by the embodiments of the present invention detects the relationship between the actual cache occupancy of the core with the highest replacement priority and the target cache occupancy of that core, and, when the actual cache occupancy is not greater than the target cache occupancy, controls the priority calculation unit to recalculate the replacement priority of each core, thereby further improving the utilization of the shared cache.
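The occupancy check performed by the detection and control modules can be modeled as follows; again a minimal sketch with illustrative names, where the `recalculate` callback stands in for triggering the priority calculation unit.

```python
def maybe_recalculate(actual_occupancy, target_occupancy, recalculate):
    """Trigger a recalculation of the replacement priorities when the
    victim core's actual cache occupancy does not exceed its target.

    Once that core is at or below its allocated target occupancy, its
    blocks should no longer be preferentially evicted, so the priorities
    are recomputed. Returns True if a recalculation was triggered.
    """
    if actual_occupancy <= target_occupancy:
        recalculate()
        return True
    return False
```

The design choice here mirrors the text: eviction pressure is applied to a core only while it is over its target occupancy, which is how the target allocations (hit-rate maximization, fairness, or quality of service) are enforced over time.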
Finally, the controller provided by the embodiments of the present invention determines first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by that core; determines second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and determines the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm. This takes into account both the reuse locality of cache blocks and the degree to which cache blocks are shared by all cores, further improving the utilization of each cache block in the shared cache. A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
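The three-stage victim selection described above can be sketched as follows. This is a software model of the hardware behavior, not the implementation: the data layout is an illustrative stand-in for the per-core occupancy counts kept in the status register, and LRU is assumed for the final replacement algorithm purely as an example, since the text leaves that algorithm open.

```python
def select_victim_block(blocks, victim_core, lru_order):
    """Three-stage selection among the victim core's occupied cache blocks.

    blocks: dict mapping block id -> {core id: occupancy count}, modeling
            the second access information recorded in the status register.
    lru_order: block ids from least to most recently used, standing in for
               whatever replacement algorithm breaks the final tie.
    """
    # Stage 1: first-type cache blocks, occupied the fewest times
    # by the victim core (reuse locality).
    fewest = min(counts.get(victim_core, 0) for counts in blocks.values())
    first_type = {b: c for b, c in blocks.items()
                  if c.get(victim_core, 0) == fewest}
    # Stage 2: second-type cache blocks, occupied the fewest total
    # times by all N cores (degree of sharing).
    fewest_total = min(sum(c.values()) for c in first_type.values())
    second_type = [b for b, c in first_type.items()
                   if sum(c.values()) == fewest_total]
    # Stage 3: apply the replacement algorithm (LRU assumed here)
    # to the remaining candidates.
    return min(second_type, key=lru_order.index)
```

For example, with blocks {'a': {0: 2, 1: 1}, 'b': {0: 1, 1: 3}, 'c': {0: 1, 1: 0}} and victim core 0, stage 1 keeps 'b' and 'c' (each occupied once by core 0), stage 2 keeps 'c' (lowest total occupancy), and 'c' is evicted.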

Claims

1. A cache, wherein the cache comprises: a cache unit, a status register, a priority calculation unit, and a controller;
the cache unit is connected to the status register and the controller respectively; the status register is connected to the cache unit and the priority calculation unit respectively; the priority calculation unit is connected to the status register and the controller respectively; the cache unit comprises a shared cache and N shadow tags, the N shadow tags respectively corresponding to N cores of a processor, where N > 2 and N is an integer;
the status register is configured to record first access information of each of the N cores with respect to the cache unit, the first access information comprising: the number of accesses to the shared cache, the number of cache blocks occupied in the shared cache, the number of accesses to the shared cache that hit, and the number of accesses to the shadow tags that hit;
the priority calculation unit is configured to calculate the respective replacement priorities of the N cores according to the first access information of each of the N cores with respect to the cache unit recorded in the status register, the replacement priority characterizing the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and
the controller is configured to: when a miss occurs on an access to the shared cache and a cache block refill operation is performed, obtain the respective replacement priorities of the N cores from the priority calculation unit, and determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
2. The cache according to claim 1, wherein the controller is further configured to: allocate respective target cache occupancies to the N cores according to a performance objective, the performance objective comprising at least one of maximizing the overall hit rate, fairness, or quality of service;
obtain the actual cache occupancy of the core with the highest replacement priority;
detect whether the actual cache occupancy is not greater than the target cache occupancy; and
if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, control the priority calculation unit to recalculate the respective replacement priorities of the N cores.
3. The cache according to claim 1, wherein the status register is further connected to the controller; and the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information comprising the number of times the cache block has been occupied by each of the N cores;
wherein the controller is configured to: when a miss occurs on an access to the shared cache and a cache block refill operation is performed, obtain from the status register the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority, and determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
4. The cache according to claim 3, wherein the controller is configured to: determine first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
determine second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and
determine the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
5. A shared cache management method, used in the cache according to any one of claims 1 to 4, wherein the method comprises:
when a miss occurs on an access to the shared cache and a cache block refill operation is performed, obtaining, from the priority calculation unit, the respective replacement priorities of the N cores of the processor, the replacement priority characterizing the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and
determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
6. The method according to claim 5, wherein the method comprises:
allocating respective target cache occupancies to the N cores according to a performance objective, the performance objective comprising at least one of maximizing the overall hit rate, fairness, or quality of service;
obtaining the actual cache occupancy of the core with the highest replacement priority;
detecting whether the actual cache occupancy is not greater than the target cache occupancy; and
if the detection result is that the actual cache occupancy is not greater than the target cache occupancy, controlling the priority calculation unit to recalculate the respective replacement priorities of the N cores.
7. The method according to claim 5, wherein the status register is further configured to record second access information corresponding to each cache block in the shared cache, the second access information comprising the number of times the cache block has been occupied by each of the N cores; and the determining the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores comprises:
obtaining, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and
determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority.
8. The method according to claim 7, wherein the determining the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority comprises:
determining first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
determining second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and
determining the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
9. A controller, used in the cache according to any one of claims 1 to 4, wherein the controller comprises:
a first obtaining module, configured to obtain, from the priority calculation unit, the respective replacement priorities of the N cores of the processor when a miss occurs on an access to the shared cache and a cache block refill operation is performed, the replacement priority characterizing the degree of priority with which the cache blocks occupied by the corresponding core are replaced; and
a determining module, configured to determine the cache block to be replaced from among the cache blocks of the shared cache currently occupied by the core with the highest replacement priority among the N cores.
10. The controller according to claim 9, wherein the controller comprises: an allocation module, configured to allocate respective target cache occupancies to the N cores according to a performance objective, the performance objective comprising at least one of maximizing the overall hit rate, fairness, or quality of service; a second obtaining module, configured to obtain the actual cache occupancy of the core with the highest replacement priority; a detection module, configured to detect whether the actual cache occupancy is not greater than the target cache occupancy; and a control module, configured to control the priority calculation unit to recalculate the respective replacement priorities of the N cores if the detection result is that the actual cache occupancy is not greater than the target cache occupancy.
11. The controller according to claim 9, wherein the determining module comprises: an obtaining unit, configured to obtain, from the status register, the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority; and
a determining unit, configured to determine the cache block to be replaced according to the second access information corresponding to each cache block currently occupied by the core with the highest replacement priority;
wherein the status register is further configured to record the second access information corresponding to each cache block in the shared cache, the second access information comprising the number of times the cache block has been occupied by each of the N cores.
12. The controller according to claim 11, wherein the determining unit comprises: a first determining subunit, configured to determine first-type cache blocks from among the cache blocks currently occupied by the core with the highest replacement priority, where a first-type cache block is a cache block occupied the fewest times by the core with the highest replacement priority;
a second determining subunit, configured to determine second-type cache blocks from among the first-type cache blocks, where a second-type cache block is a cache block occupied the fewest total times by the N cores; and
a third determining subunit, configured to determine the cache block to be replaced from among the second-type cache blocks according to a replacement algorithm.
PCT/CN2014/073052 2014-03-07 2014-03-07 Cache, shared cache management method and controller WO2015131395A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480000331.3A CN105359116B (en) 2014-03-07 2014-03-07 Buffer, shared cache management method and controller
PCT/CN2014/073052 WO2015131395A1 (en) 2014-03-07 2014-03-07 Cache, shared cache management method and controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/073052 WO2015131395A1 (en) 2014-03-07 2014-03-07 Cache, shared cache management method and controller

Publications (1)

Publication Number Publication Date
WO2015131395A1 true WO2015131395A1 (en) 2015-09-11

Family

ID=54054398

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/073052 WO2015131395A1 (en) 2014-03-07 2014-03-07 Cache, shared cache management method and controller

Country Status (2)

Country Link
CN (1) CN105359116B (en)
WO (1) WO2015131395A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210342461A1 (en) * 2017-09-12 2021-11-04 Sophos Limited Providing process data to a data recorder

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN108614782B (en) * 2018-04-28 2020-05-01 深圳市华阳国际工程造价咨询有限公司 Cache access method for data processing system
CN113505087B (en) * 2021-06-29 2023-08-22 中国科学院计算技术研究所 Cache dynamic dividing method and system considering service quality and utilization rate

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1804816A (en) * 2004-12-29 2006-07-19 英特尔公司 Method for programmer-controlled cache line eviction policy
CN101739299A (en) * 2009-12-18 2010-06-16 北京工业大学 Method for dynamically and fairly partitioning shared cache based on chip multiprocessor
CN101916230A (en) * 2010-08-11 2010-12-15 中国科学技术大学苏州研究院 Partitioning and thread-aware based performance optimization method of last level cache (LLC)
CN103150266A (en) * 2013-02-20 2013-06-12 北京工业大学 Improved multi-core shared cache replacing method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
GB0516474D0 (en) * 2005-08-10 2005-09-14 Symbian Software Ltd Pre-emptible context switching in a computing device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN1804816A (en) * 2004-12-29 2006-07-19 英特尔公司 Method for programmer-controlled cache line eviction policy
CN101739299A (en) * 2009-12-18 2010-06-16 北京工业大学 Method for dynamically and fairly partitioning shared cache based on chip multiprocessor
CN101916230A (en) * 2010-08-11 2010-12-15 中国科学技术大学苏州研究院 Partitioning and thread-aware based performance optimization method of last level cache (LLC)
CN103150266A (en) * 2013-02-20 2013-06-12 北京工业大学 Improved multi-core shared cache replacing method

Cited By (3)

Publication number Priority date Publication date Assignee Title
US20210342461A1 (en) * 2017-09-12 2021-11-04 Sophos Limited Providing process data to a data recorder
US11620396B2 (en) * 2017-09-12 2023-04-04 Sophos Limited Secure firewall configurations
US11966482B2 (en) 2017-09-12 2024-04-23 Sophos Limited Managing untyped network traffic flows

Also Published As

Publication number Publication date
CN105359116A (en) 2016-02-24
CN105359116B (en) 2018-10-19


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201480000331.3

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14884368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14884368

Country of ref document: EP

Kind code of ref document: A1