WO2024066195A1

WO2024066195A1 - Cache management method and apparatus, cache apparatus, electronic apparatus, and medium

Info

Publication number: WO2024066195A1
Application number: PCT/CN2023/078664
Authority: WO
Inventors: 贾琳黎; 林江
Original assignee: 海光信息技术股份有限公司
Priority date: 2022-09-27
Filing date: 2023-02-28
Publication date: 2024-04-04
Also published as: CN115617709A

Abstract

A cache management method, a cache apparatus, a cache management apparatus, an electronic apparatus, and a computer-readable storage medium. The cache management method is used for a shared cache shared by a plurality of processor cores, the cache management method comprising: allocating a near memory bank and a far memory bank to each processor core; and for a memory access request of each processor core, first accessing the corresponding near memory bank, and then accessing the corresponding far memory bank. In the method, the shared cache is divided into the near memory banks and the far memory banks, thereby reducing the delay caused by the increased physical delay due to the size of the shared cache, and improving performance.

Description

Cache management method and device, cache device, electronic device and medium

This application claims priority to Chinese Patent Application No. 202211183443.X filed on September 27, 2022. The contents of the above-mentioned Chinese patent application disclosure are hereby cited in their entirety as a part of this application.

Technical Field

Embodiments of the present disclosure relate to a cache management method, a cache device, a cache management device, an electronic device, and a computer-readable storage medium.

Background technique

In the design of multi-core processors, memory access operations are a major factor affecting performance. To improve processor performance, cache technology is currently used to reduce latency and improve performance. However, due to the limitation of chip size, the capacity of the cache in the processor core is limited and can only meet the needs of some memory access operations. Therefore, people have proposed adding a larger cache outside the core as a storage unit shared between multiple processing cores, that is, a shared cache, to reduce memory access latency and improve performance.

Summary of the invention

At least one embodiment of the present disclosure provides a cache management method for a shared cache shared by multiple processor cores, the cache management method comprising: allocating a respective near memory bank and far memory bank to each processor core; for each processor core's memory access request, preferentially accessing the corresponding near memory bank, and then accessing the corresponding far memory bank.

At least one embodiment of the present disclosure also provides a cache device, including: a shared cache for sharing by multiple processor cores, the shared cache including multiple storage bodies, and a cache management unit configured to allocate a respective near storage body and a far storage body to each processor core, and to allow a memory access request to each processor core to preferentially access the corresponding near storage body, and then access the corresponding far storage body.

At least one embodiment of the present disclosure further provides a cache management device, including: a processor; and a memory storing computer executable instructions, which implement the cache management method provided by at least one embodiment of the present disclosure when executed by the processor.

At least one embodiment of the present disclosure further provides an electronic device, including a cache, a cache device provided by at least one embodiment of the present disclosure, and a plurality of processor cores.

At least one embodiment of the present disclosure further provides a computer-readable storage medium for non-transiently storing computer-executable instructions, which implement the cache management method provided by at least one embodiment of the present disclosure when executed by a processor.

The cache management method provided by the embodiment of the present disclosure divides the shared cache into a near storage body and a far storage body, which can reduce the delay caused by the increased physical delay due to the size of the shared cache and improve the performance.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below. Obviously, the drawings in the following description only relate to some embodiments of the present disclosure, rather than limiting the present disclosure.

FIG1 shows a schematic diagram of the structure of a multi-core processor system;

FIG2 is a schematic diagram showing the mapping relationship between memory and cache in direct associative, fully associative and set associative;

FIG3 is a schematic diagram showing a set-associative organization and addressing mode of a cache;

FIG4 shows a schematic flow chart of a cache management method provided by at least one embodiment of the present disclosure;

FIG5 is a schematic diagram showing a mapping relationship between a near memory bank and a far memory bank in a private cache and a shared cache according to an embodiment;

FIG. 6 is a schematic flow chart showing an example of step S402 in FIG. 4 ;

FIG. 7 is a schematic flow chart showing another example of step S402 in FIG. 4 ;

FIG8A shows a schematic flow chart of a cache management method for a read request;

FIG8B shows a schematic flow chart of a cache management method for write-back requests;

FIG8C is a schematic block diagram showing an example of cache line migration;

FIG9A shows a schematic block diagram of a cache device provided by at least one embodiment of the present disclosure;

FIG9B shows a schematic structural diagram of a cache device provided by at least one embodiment of the present disclosure;

FIG10 shows a schematic diagram of a cache management device according to an embodiment of the present disclosure; and

FIG. 11 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure.

Detailed ways

In order to make the purpose, technical solution and advantages of the embodiments of the present disclosure clearer, the technical solution of the embodiments of the present disclosure will be clearly and completely described below in conjunction with the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are part of the embodiments of the present disclosure, not all of the embodiments. Based on the described embodiments of the present disclosure, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present disclosure.

Unless otherwise defined, the technical terms or scientific terms used in the present disclosure should be understood by people with ordinary skills in the field to which the present disclosure belongs. The "first", "second" and similar words used in the present disclosure do not indicate any order, quantity or importance, but are only used to distinguish different components. Similarly, similar words such as "one", "one" or "the" do not indicate quantity restrictions, but indicate that there is at least one. Similar words such as "include" or "comprise" mean that the elements or objects appearing before the word cover the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Similar words such as "connect" or "connected" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. "Up", "down", "left", "right" and the like are only used to indicate relative positional relationships. When the absolute position of the described object changes, the relative positional relationship may also change accordingly.

FIG1 shows a multi-core processor system, which is a centralized shared memory system, where processing cores core0, core1, core2 and core3 have their own dedicated caches (private caches), one or more levels of shared caches (usually the last level cache (LLC)), and share the same main memory and input/output (I/O). The dedicated cache of each processing core may include a first level cache (L1 cache) or a second level cache (L2 cache), etc.

For example, the capacity of the cache is usually very small, the content stored in the cache is only a subset of the content of the main memory, and the data exchange between the cache and the main memory is in blocks. In order to cache the data in the main memory into the cache, a certain function must be applied to locate the main memory address to the cache, which is called address mapping. After the data in the main memory is cached into the cache according to this mapping relationship, when the central processing unit (CPU) executes the program, it will transform the main memory address in the program into the cache address. The address mapping methods of the cache usually include direct mapping, fully associative mapping, and set associative mapping.

Although the cache capacity is smaller than the main memory, its speed is much faster than the main memory. Therefore, the main function of the cache is to store data that the processor may need to access frequently in the near future. In this way, the processor can directly read data from the cache without frequently accessing the slower main memory, thereby improving the processor's access speed to the main memory. The basic unit of cache is a cache block or cache line. Similar to the division of cache into multiple cache lines, the data stored in the main memory is also similarly divided. The divided data blocks in the main memory are called main memory blocks. Generally, the size of a main memory block can be 4KB, and the size of a cache line can also be 4KB. It is understandable that in actual applications, the size of the main memory block and the cache line can also be set to other values, as long as the size of the main memory block is the same as the size of the cache line.

There is a certain mapping relationship between the main memory and the cache, which can be direct associative, fully associative, and set associative. In direct associative, fully associative, and set associative, the mapping relationship between the main memory and the cache is shown in Figure 2. Divide the main memory and the cache into blocks of the same size. Assume that the main memory has 32 items and the cache has 8 items. In the direct associative method, each main memory block can only be placed in a cache line in the cache. Assume that the 12th block of the main memory is to be placed in the cache. Since the cache has only 8 items, it can only be placed in the (12mod 8＝4)th item and cannot be placed anywhere else; it can be seen that the 4th, 12th, 20th, and 28th main memory blocks all correspond to the 4th item of the cache. If there is a conflict, they can only be replaced. The hardware required for the direct associative method is simple but inefficient, as shown in Figure 2 (a). In the fully associative method, each main memory block can be placed in any position of the cache, so that the 4th, 12th, 20th, and 28th main memory blocks can be placed in the cache at the same time. The hardware required for the fully associative method is complex but efficient, as shown in Figure 2(b). Set associativity is a compromise between direct associativity and full associativity. Taking two-way set associativity as an example, positions 0, 2, 4, and 6 in the cache are one way (here called way 0), and positions 1, 3, 5, and 7 are another way (here called way 1), with 4 blocks in each way. For block No. 12 in the main memory, because the remainder of 12 divided by 4 is 0, block No. 12 can be placed in position No. 0 of way 0 of the cache (i.e. position No. 0 of the cache), or in position No. 0 of way 1 (i.e. position No. 1 of the cache), as shown in Figure 2(c).

The set-associative organization and addressing mode of the cache in (c) of FIG2 can be further illustrated by the example of FIG3. As shown in FIG3, the cache is organized in the form of a cache line array. A column of cache lines constitutes the same way, and multiple cache lines at the same position in multiple columns of cache lines constitute a set, so the cache lines of the same set are in different ways, that is, they are ranged by different ways. The location of the data or instruction in the cache is obtained by the physical address of the data or instruction to be read, and each physical address (for example, it can include multiple bits, such as 32 bits, according to the specifications of the system) is divided into three parts:

● Index, used to select a group in the cache. All cache lines in the same group are indexed. Induce choice;

● Tag, used to select a specific cache line in a group. The tag of the physical address is compared with the tag of each cache line. If they match, it is a cache hit, thus selecting this cache line. Otherwise, it is a cache miss.

●Offset, which is used to select the corresponding address in the cache line. It indicates the first byte of the physical address in the cache line, and the corresponding data or instruction is read from the position of this byte.

The working principle of the cache requires it to store the latest or most frequently used data as much as possible. When a cache line is transferred from the main memory to the cache and the available positions in the cache are already occupied, the cache data replacement problem will arise. Solving this problem involves the data replacement mechanism of the cache system. In short, the data replacement mechanism of the cache system includes two steps:

First, filter out the data in the cache that is “unimportant” to application access;

Second, delete the data from the cache to make room for new data. For data with dirty attributes, it also needs to be written back to the main memory.

Existing replacement algorithms may include LRU (Least Recently Used), LFU (Least Frequently Used), MRU (Most Recently Used), NRU (Not Recent Used), SRRIP (Static RRIP), etc.

The cache includes a large number of storage cells, each of which is used to store a data bit. These storage cells are physically arranged in an array, and each storage cell is accessed through word lines and bit lines. All storage cells in each cache are divided and organized into multiple sub-arrays for easy access, and each sub-array is called a bank. For example, input buffers and output buffers can be provided for each bank to facilitate access (reading, writing, etc.); for example, different banks can also be accessed in parallel at the same time. For example, for the above-mentioned set-associative situation, multiple cache lines in the same way can be physically located in different banks.

For multi-core processors, when only a single core is working, all resources of the private cache and shared cache can be used. When the single core's memory access operation accesses the shared cache, if the address is mapped to the remote shared cache, the additional physical delay will reduce the performance improvement brought by the large capacity and reduce the performance of the single core. Similarly, when multiple cores are working, private cache and part of the shared cache resources can be used. If each core's memory access operation to the shared cache is mapped to a remote physical area (such as a storage body), these additional physical delays will also reduce the performance of the multiple cores. As the demand for high performance increases, the capacity of the cache outside the core is required to be larger, which also means a larger area for the shared cache. When the size When the size increases to a certain extent, the additional physical delay introduced will offset part of the performance improvement brought by the cache.

The cache management method provided by the above-mentioned embodiment of the present disclosure divides the shared cache into a near storage body and a far storage body, reduces the delay caused by the increased physical delay due to the size of the shared cache, and improves the performance.

At least one embodiment of the present disclosure also provides a cache device, a cache management device, an electronic device, and a computer-readable storage medium corresponding to the above-mentioned cache management method.

The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments.

Fig. 4 shows a schematic flow chart of a cache management method provided by at least one embodiment of the present disclosure. The cache management method is used for a shared cache shared by multiple processor cores, and the shared cache includes multiple storage bodies.

As shown in FIG. 4 , the cache management method includes the following steps S401 - S402 .

Step S401: Allocate a respective near memory bank and far memory bank to each processor core.

Step S402: For each processor core's memory access request, the corresponding near memory bank is accessed first, and then the corresponding far memory bank is accessed.

For example, the access latency of a processor core to a corresponding near memory bank is less than the access latency to a corresponding far memory bank. It should be noted that the "near" and "far" here refer to each processor core. Therefore, a memory bank in the cache that is a near memory bank (or far memory bank) for one processor core may not be a near memory bank (or far memory bank) for another processor core.

FIG. 5 is a schematic diagram showing a mapping relationship between a near memory bank and a far memory bank in a private cache and a shared cache according to an embodiment.

As shown in FIG5 , the shared cache includes multiple memory banks (near memory bank 0, far memory bank 0, far memory bank 1, etc.). Both the private cache and the shared cache have a “way-set” structure, and the cache lines in the private cache and the cache lines in the memory banks in the shared cache have the same size. For example, a cache line in a certain way in the same set in the private cache of a processor core may correspond to a shared cache. The shared cache may be a cache line in a certain way in different groups in different banks. For example, the first cache line in the same group and the same way in the private cache may correspond to a cache line in the near bank 0 and a cache line in the far bank 0. For another example, the second cache line in the same group and the same way in the private cache may correspond to a cache line in the near bank 0 and a cache line in the far bank 1. The embodiments of the present disclosure are not limited to the above exemplary correspondence.

For example, for each processor core's memory access request, the near memory bank is accessed first, and then the far memory bank is accessed. When a single processor core is working, both the near memory bank and the far memory bank can be accessed. When multiple processor cores are working, each processor core mainly accesses its own near memory bank.

For memory access requests of a single processor core, the latency of memory access is minimal and does not depend on the address mapping relationship. It is limited to ensure that all memory accesses of the processor core are concentrated in the near memory bank. For memory accesses of multiple processor cores, they are also concentrated in the near memory bank with the closest physical location. Whether it is a single processor core or multiple processor cores, latency can be reduced and performance can be improved.

FIG. 6 shows a schematic flow chart of an example of step S402 in FIG. 4 .

As shown in FIG. 6 , for a read request that is a memory access request, an example of step S402 may include the following steps S601 - S602 .

Step S601: operating the corresponding near memory bank according to the physical address of the read request.

For example, in some embodiments of the present disclosure, step S601 may include: when the read request hits in the corresponding near memory bank, returning the data in the near memory bank to the processor core.

For example, in some embodiments of the present disclosure, step S601 may further include: when the read request hits the corresponding near storage body, invalidating the copy in the corresponding near storage body, and updating the directory of the shared cache.

For example, the processor core sends a read request to the shared cache. According to the index information of the physical address of the read request, it corresponds to a near storage body. It accesses the near storage body and performs a search operation through the tag information of the physical address. The tag of the physical address is compared with the tag of each cache line in the near storage body. If they match, the cache hits, so this cache line is selected, and the data in this cache line is returned to the processor core. At the same time, the copy in the near storage body is invalidated, and then the directory in the shared cache is updated. The advantage of this is that since the resources of the near storage body are limited, the storage location in the near storage body can be given up to store other data.

Step S602: when the read request does not hit in the near memory bank, the read request is routed to the corresponding far memory bank according to the physical address of the read request, and the corresponding far memory bank is operated.

For example, if the tag of the physical address does not match the tags of all cache lines in the near memory bank, a cache miss occurs. At this time, the index information of the physical address corresponds to a far memory bank, the far memory bank is accessed, and a search operation is performed through the tag information of the physical address, and the access_farBank_flag (access far memory bank flag) is marked.

For example, in some embodiments of the present disclosure, step S602 may include: when the read request hits the corresponding far memory bank, returning the data in the corresponding far memory bank to the processor core.

For example, in some embodiments of the present disclosure, step S602 may also include: retaining a copy in a far storage body, adding aging information of a cache line corresponding to the data stored in a corresponding far storage body, and updating a directory of a shared cache.

For example, when a read request misses in the near memory and accesses the far memory, it is also necessary to perform a search operation through the tag information of the physical address. The tag of the physical address is compared with the tag of each cache line in the far memory. If they match, it is a cache hit, so this cache line is selected, the data in this cache line is returned to the processor core, and the copy in the far memory is retained, the aging information of this cache line stored in the far memory is increased, and the shared cache directory is updated. The reason for retaining the copy here is that this far memory may also be used as the near memory of other processor cores, and other processor cores have priority to use this far memory. When accessing this far memory, its storage information is not interfered with, and the storage information of this far memory as the near memory of a certain processor core is maintained.

For example, the cache management method provided by the embodiment of the present disclosure may further include step S603: when the read request does not hit in the corresponding near storage body and the corresponding far storage body, operating on other processor cores by checking the directory of the shared cache.

For example, in some embodiments of the present disclosure, step S603 may include: if the data to be found exists in other processor cores, returning the data in the other processor cores to the processor core that issued the read request, and updating the directory in the shared cache; if the data to be found does not exist in other processor cores, sending a read request to the memory to obtain the data to be found.

For example, when a read request does not hit the corresponding near storage body and the corresponding far storage body, it is possible to check the directory in the shared cache to determine whether other processor cores have the data requested by the read request. If the requested data exists in other processor cores, the data can be returned to the processor core that issued the read request through a core to core transfer method to update the directory in the shared cache. If the requested data is not available in other processor cores, the read request needs to be sent to the memory to obtain the requested data.

FIG. 7 is a schematic flowchart showing another example of step S402 in FIG. 4 .

As shown in FIG. 7 , for a write-back request in a memory access request, another example of step S402 may include the following steps S701 - S702 .

Step S701: operating the corresponding near memory bank according to the physical address of the write-back request.

For example, in some embodiments of the present disclosure, step S701 may include: in a case where the write-back request hits in the corresponding near storage body, updating the state stored in the corresponding near storage body; in a case where the write-back request does not hit in the corresponding near storage body, setting the first victim cache line in the corresponding near storage body through a replacement algorithm.

For example, the processor core issues a write-back request to the shared cache. According to the index information of the physical address of the write-back request, it corresponds to a near memory bank. The near memory bank is accessed and a search operation is performed through the tag information of the physical address. The tag of the physical address is compared with the tag of each cache line in the near memory bank. If a match occurs, the cache hits. This cache line is called the first victim cache line. The decision of which cache line is sacrificed is controlled by the replacement algorithm. In the case where the write-back request hits in the corresponding near memory bank, it means that the first victim cache line already exists in the near memory bank, so only the state stored in the near memory bank needs to be updated. In the case of a miss, the first victim cache line needs to be set in the corresponding near memory bank through the replacement algorithm.

For example, in some embodiments of the present disclosure, storing the first sacrifice cache line in the corresponding near storage body through a replacement algorithm may include: in a case where there is a free cache line in the corresponding near storage body to store the first sacrifice cache line, storing the first sacrifice cache line in the corresponding near storage body, and updating the directory of the shared cache; in a case where there is no free cache line in the corresponding near storage body to store the first sacrifice cache line, generating a second sacrifice cache line in the corresponding near storage body, performing a migration operation on the second sacrifice cache line, marking a flag indicating write back to the far storage body, sending the second sacrifice cache line and the aging information corresponding to the second sacrifice cache line to the corresponding far storage body, and updating the directory of the shared cache.

For example, when there is an idle cache line in the corresponding near storage body to store the first victim cache line, the first victim cache line is directly set in the near storage body, and the shared cache directory is updated. When there is no idle cache line in the corresponding near storage body to store the first victim cache line, a second victim cache line is generated in the near storage body, the first victim cache line is set at the second victim cache line, and the second victim cache line is migrated within the shared cache. The victim_far_flag signal (write back to far storage body flag) needs to be marked, and the second victim cache line and its corresponding aging information are sent together. To the remote storage body, and at the same time update the directory update caused by the write-back operation generated by the previous processor core. It should be noted that the parameter of the replacement algorithm is aging information for the least recently used algorithm (LRU) and is usage frequency information for the least frequently used algorithm (LFU). In the embodiment of the present disclosure, aging information is taken as an example, and the present disclosure does not limit this.

Step S702: when a cache line migration operation is generated in the corresponding near memory bank, an operation is performed on the corresponding far memory bank according to address information of a first victim cache line of the corresponding near memory bank.

For example, in some embodiments of the present disclosure, step S702 may include: in a case where a write-back request hits in a corresponding far storage body, updating the state of a cache line in the corresponding far storage body; in a case where a write-back request does not hit in the corresponding far storage body, determining whether to select a third victim cache line in the corresponding far storage body to be written back to the memory according to a replacement algorithm.

For example, according to the index information of the physical address of the write-back request of the near storage body, a far storage body is corresponding, the far storage body is accessed and a search operation is performed through the tag information of the physical address, and the tag of the physical address is compared with the tag of each cache line in the far storage body. If a match is made, the cache hits, indicating that the second victim cache line already exists in the far storage body, so only the state stored in the far storage body needs to be updated. In the case of a miss, a replacement algorithm needs to be used to determine whether a third victim cache line needs to be generated in the corresponding far storage body and written back to the memory.

For example, in some embodiments of the present disclosure, determining whether to select a third sacrifice cache line in the corresponding far storage body and write it back to the memory according to the replacement algorithm may include: if the replacement algorithm shows that there are free cache lines available in the corresponding far storage body, storing the second sacrifice cache line in the corresponding far storage body; if the replacement algorithm shows that there are no free cache lines available in the corresponding far storage body, writing the second sacrifice cache line or the third sacrifice cache line back to the memory.

For example, when there is an idle cache line in the corresponding far memory bank to store the second sacrifice cache line, the second sacrifice cache line is directly set in the far memory bank. When there is no idle cache line in the corresponding far memory bank to store the second sacrifice cache line, a third sacrifice cache line is generated in the far memory bank, and it is determined whether to write the second sacrifice cache line or the third sacrifice cache line into the memory. The following three methods can be used to determine which sacrifice cache line is written into the memory, and the embodiments of the present disclosure are not limited to the following three methods.

Method 1: For the LRU replacement algorithm, the aging value of the second victim cache line is compared with the aging value of the third victim cache line, and the second victim cache line and the third victim cache line with the smaller aging value are preferentially selected. The large cache line is written back to the memory, and if the aging value of the second victim cache line is equal to the aging value of the third victim cache line, the second victim cache line is written back to the memory.

Method 2: Prioritize writing back the second victim cache line or the third victim cache line through register configuration.

Method three: Check the directory of the shared cache, and if the second victim cache line exists in the near memory bank corresponding to the other processor core, write the second victim cache line in the near memory bank corresponding to the other processor core back to the memory.

FIG8A is a schematic flow chart showing a cache management method for a read request.

As shown in FIG8A , first, the processor core issues a read request, and preferentially accesses the corresponding near storage body according to the physical address of the read request, and determines whether the read request hits in the corresponding near storage body. If it hits, the copy in the corresponding near storage body is invalidated, and the data in the near storage body is returned to the processor core, and then the directory of the shared cache is updated. If it does not hit, access_farBank_flag is set to 1, and the read request is sent to the corresponding far storage body. Then it is determined whether the read request hits in the corresponding far storage body. If it hits, the copy in the far storage body is retained, the aging (age) information of the cache line stored in the far storage body is increased, and the data in the corresponding far storage body is returned to the processor core, and then the directory of the shared cache is updated. If it does not hit, it is determined whether the read request hits in other processor cores by checking the directory of the shared cache. If it hits, the data is returned to the processor core that issued the read request through the transfer mode from processor core to processor core, and then the directory in the shared cache is updated. If it does not hit, the read request is sent to the memory (an example of the system memory) to obtain the requested data.

FIG8B shows a schematic flow chart of a cache management method for write-back requests.

As shown in FIG8B , first, the processor core sends a write-back request to the corresponding near storage body to determine whether the write-back request hits in the corresponding near storage body. If it hits, the directory of the shared cache is updated. If it misses, the first sacrifice cache line is set in the corresponding near storage body through the replacement algorithm, and it is determined whether to generate a second sacrifice cache line. If there is an idle cache line in the corresponding near storage body to store the first sacrifice cache line, the second sacrifice cache line is not generated, and the directory of the shared cache is updated. If there is no idle cache line in the corresponding near storage body to store the first sacrifice cache line, a second sacrifice cache line is generated, and then the second sacrifice cache line is migrated to the corresponding far storage body, and the write-back request is routed to the corresponding far storage body. Then it is determined whether the second sacrifice cache line hits in the corresponding far storage body. If it hits, the write-back request is completed. If it misses, it is determined whether a third sacrifice cache line will be generated in the corresponding far storage body according to the replacement algorithm. If in the corresponding far storage body If there is an idle cache line available for use, the third victim cache line is not generated, and it is determined whether the second victim cache line is written back to the memory. If yes, the second victim cache line is written back to the memory, and if no, the second victim cache line is set in the corresponding far memory bank. If there is no idle cache line available for use in the corresponding far memory bank, the third victim cache line is generated, the second victim cache line is set in the corresponding far memory bank, and the third victim cache line is written back to the memory.

FIG. 8C is a schematic block diagram showing an example of cache line migration.

As shown in FIG8C , in the initial state, there are 4 processor cores (Core0, Core1, Core2, Core3), Core3 has a copy D, and the shared cache includes 4 memory banks (memory bank 0, memory bank 1, memory bank 2, memory bank 3), memory bank 0 has a copy A, memory bank 1 has a copy B, memory bank 2 has a copy C, and memory bank 3 has a copy F. Memory bank 0 is the near memory bank of Core0 and the far memory bank of Core1/2/3; memory bank 1 is the near memory bank of Core1 and the far memory bank of Core0/2/3; memory bank 2 is the near memory bank of Core2 and the far memory bank of Core0/1/3; memory bank 3 is the near memory bank of Core3 and the far memory bank of Core0/1/2.

First, Core0 reads data from the corresponding near memory bank (memory bank 0), Core1 reads data from the corresponding far memory bank (memory bank 2), Core2 reads data from other processor cores (Core3), and Core3 reads data from the memory. After the above read operation, copy A in memory bank 0 is migrated to Core0, copy C in memory bank 2 is migrated to Core1 and memory bank 2 retains copy C, copy D in Core3 is migrated to Core2 and Core3 retains copy D, and copy E in memory is migrated to Core3. Then, Core0, Core1, and Core3 generate write-back requests respectively. After the write-back operation, the victim cache line (copy A) in Core0 is written back to memory bank 0. The victim cache line (copy C) in Core1 is written back to memory bank 1, the victim cache line (copy B) in memory bank 1 is migrated to the far memory bank (memory bank 2) corresponding to Core1, and the victim cache line (copy C) in memory bank 2 is written back to the memory. The victim cache line F in the near memory bank (memory bank 3) corresponding to Core3 is migrated to the corresponding far memory bank (memory bank 0), and the victim cache line F is written back to the memory from memory bank 0.

FIG. 9A shows a schematic block diagram of a cache device 900 provided by at least one embodiment of the present disclosure. The cache device can be used to execute the cache management method shown in FIG. 4 .

As shown in FIG9A , the cache device 900 includes a shared cache 901 for multiple processor cores to share and a cache management unit 902. The cache management unit 902 includes a near storage receiving component 903, a far storage receiving component 904, a near storage pipeline control component 905, a far storage pipeline control component 906, The near bank returns the result component 907 and the far bank returns the result component 908. Here, the shared cache 901 includes a plurality of banks.

The cache management unit 902 is configured to allocate a respective near memory bank and far memory bank to each processor core, and to make the memory access request to each processor core access the corresponding near memory bank first, and then access the corresponding far memory bank.

For example, the access latency of the processor core to the corresponding near memory bank is shorter than the access latency to the corresponding far memory bank.

The near storage receiving component 903 is configured to receive a memory access request sent to a corresponding near storage.

The far memory bank receiving component 904 is configured to receive a memory access request sent to a corresponding far memory bank.

The near bank pipeline control component 905 is configured to determine the processing mode of the memory access request received by the corresponding near bank and whether it hits in the corresponding near bank, and execute a replacement algorithm for the corresponding near bank.

The far memory bank pipeline control component 906 is configured to determine the processing mode of the memory access request received by the corresponding far memory bank and whether it hits in the corresponding far memory bank, and execute a replacement algorithm for the corresponding far memory bank.

The near storage return result component 907 is configured to return the result required by the processor core to the processor core.

The far storage return result component 908 is configured to return the result required by the processor core to the processor core.

The cache device 900 has the same technical effects as the cache management method shown in FIG. 4 , which will not be described in detail herein.

FIG. 9B shows a schematic structural diagram of a cache device 910 provided by at least one embodiment of the present disclosure.

As shown in FIG9B , the near storage receiving component 911 receives the memory access request sent by the processor to the near storage, and the near storage receiving component 911 sends the memory access request to the near storage pipeline control component 912. The near storage pipeline control component 912 is connected to the near storage storage component 913, the near storage return result component 914, and the far storage receiving component 915. The near storage result return component 914 is responsible for returning the result to the processor. The far storage receiving component 915 receives the memory access request from the near storage pipeline control component 912, and sends the memory access request to the far storage pipeline control component 916. The far storage pipeline control component 916 is connected to the far storage storage component 917, the far storage return result component 918, and the memory 919 connection. The far storage return result component 918 can read data from the memory 919 and is responsible for returning the result to the processor.

It should be noted that the near memory receiving component and the far memory receiving component can be implemented in hardware such as a queue or a FIFO (First In First Out) queue, and the present disclosure does not limit this. The near memory storage component and the far memory storage component are used to store cache line information, which can be in the form of a static random access memory (SRAM), a dynamic random access memory (DRAM), etc., and the present disclosure does not limit this. The memory can be on-chip storage or off-chip storage, and the present disclosure does not limit this.

For example, the cache device may be implemented using hardware, software, firmware, or any feasible combination thereof, and the present disclosure is not limited thereto.

At least one embodiment of the present disclosure further provides a cache management device, comprising: a memory for non-temporarily storing computer executable instructions; and a processor for executing the computer executable instructions, wherein the computer executable instructions, when executed by the processor, execute the cache management method provided by at least one embodiment of the present disclosure.

FIG10 shows a schematic diagram of a cache management device 1000 according to an embodiment of the present disclosure. As shown in FIG10 , the cache management device 1000 according to an embodiment of the present disclosure may include a processing device 1001 and a memory 1002 , which may be interconnected via a bus 1003 .

The processing device 1001 can perform various actions and processes according to the program or code stored in the memory 1002. Specifically, the processing device 1001 can be an integrated circuit chip with signal processing capabilities. For example, the above-mentioned processing device can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components. The various methods, steps, processes and logic block diagrams disclosed in the embodiments of the present disclosure can be implemented or executed. The general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc., which can be an X86 architecture or an ARM architecture, etc.

The memory 1002 stores computer executable instructions, wherein the computer executable instructions implement the cache management method provided by at least one embodiment of the present disclosure when executed by the processing device 1001. The memory 1002 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEROM), or an electrically erasable programmable read-only memory (EEROM). The volatile memory may be a random access memory (RAM) that is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous linked dynamic random access memory (SLDRAM), and direct main memory bus random access memory (DRRAM). It should be noted that the memory of the methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.

At least one embodiment of the present disclosure further provides an electronic device, including a cache and a cache device provided by at least one embodiment of the present disclosure and multiple processor cores. In one embodiment, the electronic device is, for example, a central processing unit, and the processor is, for example, a single-core or multi-core processor. In one embodiment, the electronic device is a computer system, and the computer system includes one or more processors,

Fig. 11 shows a schematic diagram of an electronic device 1100 according to an embodiment of the present disclosure. As shown in Fig. 11 , the electronic device 1100 according to an embodiment of the present disclosure may include a cache device 900 , a cache 1101 , and a plurality of Cores 1102 .

At least one embodiment of the present disclosure provides a computer-readable storage medium for non-transitory storage of computer-executable instructions, which implement the cache management method provided by at least one embodiment of the present disclosure when executed by a processor.

Similarly, the computer-readable storage medium in the embodiments of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. It should be noted that the memory of the methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.

The embodiment of the present disclosure also provides a computer program product or a computer program, which includes a computer instruction, and the computer instruction is stored in a computer-readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the cache management method according to the embodiment of the present disclosure.

The technical effects of the above-mentioned cache device, cache management device, electronic device and storage medium are the same as the technical effects of the cache management method shown in Figure 4, and will not be repeated here.

There are a few points to note:

(1) The drawings of the embodiments of the present disclosure only relate to the structures of the embodiments of the present disclosure. Other structures Please refer to the usual design.

(2) In the absence of conflict, the embodiments of the present disclosure and the features therein may be combined with each other to obtain new embodiments.

The above description is only a specific implementation of the present disclosure, but the protection scope of the present disclosure is not limited thereto. The protection scope of the present disclosure shall be based on the protection scope of the claims.

Claims

A cache management method is used for a shared cache shared by multiple processor cores, the cache management method comprising:

Allocating a respective near memory bank and a far memory bank to each of the processor cores;

For each memory access request of the processor core, the corresponding near memory bank is accessed first, and then the corresponding far memory bank is accessed.
The cache management method according to claim 1, wherein for each of the processor cores, preferentially accessing the corresponding near memory bank and then accessing the corresponding far memory bank comprises:

Performing an operation on the corresponding near memory bank according to the physical address of the read request in the memory access request;

In the case that the read request does not hit the near memory bank, the read request is routed to the corresponding far memory bank according to the physical address of the read request, and the corresponding far memory bank is operated.
The cache management method according to claim 2, further comprising:

In the case that the read request does not hit in both the corresponding near memory bank and the corresponding far memory bank, operations are performed on other processor cores by checking the directory of the shared cache.
The cache management method according to claim 2, wherein the operation on the corresponding near memory bank according to the physical address of the read request in the memory access request comprises:

When the read request hits the corresponding near memory bank, the data in the near memory bank is returned to the processor core.
The cache management method according to claim 2, wherein the operating the corresponding near memory bank according to the physical address of the read request in the memory access request comprises:

In the case where the read request hits the corresponding near storage bank, the copy in the corresponding near storage bank is invalidated, and the directory of the shared cache is updated.
The cache management method according to any one of claims 2, 4, and 5, wherein the operation on the corresponding far memory bank comprises:

When the read request hits the corresponding far memory bank, the data in the corresponding far memory bank is returned to the processor core.
The cache management method according to claim 6, wherein the corresponding far memory bank Operations also include:

The copy in the far storage body is retained, aging information of the cache line corresponding to the data stored in the corresponding far storage body is added, and the directory of the shared cache is updated.
The cache management method according to claim 3, wherein operating other processor cores by viewing the directory of the shared cache comprises:

In the case that the data to be searched exists in the other processor cores, the data in the other processor cores are returned to the processor core that issued the read request, and the directory in the shared cache is updated;

In the case that the data to be found does not exist in the other processor cores, the read request is sent to the memory to obtain the data to be found.
The cache management method according to claim 1, wherein for each of the processor cores, preferentially accessing the corresponding near memory bank and then accessing the corresponding far memory bank comprises:

Performing an operation on the corresponding near memory bank according to the physical address of the write-back request in the memory access request;

In the case where the corresponding near memory bank generates a cache line migration operation, the corresponding far memory bank is operated according to the address information of the first victim cache line of the corresponding near memory bank.
The cache management method according to claim 9, wherein the step of operating the corresponding near memory bank according to the physical address of the write-back request in the memory access request comprises:

When the write-back request hits the corresponding near memory bank, updating the state stored in the corresponding near memory bank;

In the case where the write-back request does not hit in the corresponding near memory bank, the first victim cache line is set in the corresponding near memory bank through a replacement algorithm.
The cache management method according to claim 10, wherein storing the first victim cache line in the corresponding near memory bank by a replacement algorithm comprises:

In the case where there is an idle cache line in the corresponding near storage bank for storing the first victim cache line, storing the first victim cache line in the corresponding near storage bank, and updating the directory of the shared cache;

When there is no free cache line in the corresponding near storage body to store the first victim cache line, the corresponding near storage body generates a second victim cache line, performs a migration operation on the second victim cache line, marks a flag indicating writing back to the far storage body, and stores the second victim cache line in the corresponding near storage body. The line and the aging information corresponding to the second victim cache line are sent to the corresponding far memory bank to update the directory of the shared cache.
The cache management method according to claim 11, wherein the operation on the corresponding far memory bank according to the address information of the first victim cache line of the corresponding near memory bank comprises:

When the write-back request hits in the corresponding far memory bank, updating the state of the cache line in the corresponding far memory bank;

In the case that the write-back request does not hit in the corresponding far memory bank, it is determined according to a replacement algorithm whether to select a third victim cache line in the corresponding far memory bank and write it back to the memory.
The cache management method according to claim 12, wherein determining whether to select a third victim cache line in the corresponding far memory bank and write it back to the memory according to the replacement algorithm comprises:

When the replacement algorithm shows that there is an idle cache line in the corresponding far memory bank, storing the second victim cache line in the corresponding far memory bank;

When the replacement algorithm shows that there is no free cache line available in the corresponding far memory bank, the second victim cache line or the third victim cache line is written back to the memory.
The cache management method according to claim 13, wherein writing the second victim cache line or the third victim cache line back to the memory comprises:

comparing the aging value of the second victim cache line with the aging value of the third victim cache line;

writing the cache line with a larger aging value among the second victim cache line and the third victim cache line back to the memory; or

When the aging value of the second victim cache line is equal to the aging value of the third victim cache line, the second victim cache line is written back to the memory.
The cache management method according to claim 13, wherein writing the second victim cache line or the third victim cache line back to the memory comprises:

The second victim cache line is preferentially written back or the third victim cache line is preferentially written back through register configuration.
The cache management method according to claim 13, wherein writing the second victim cache line or the third victim cache line back to the memory comprises:

Check the directory of the shared cache, and if the second victim cache line exists in the near storage bank corresponding to the other processor core, write the second victim cache line in the near storage bank corresponding to the other processor core back to the memory.
A cache device, comprising:

A shared cache for multiple processor cores to share, the shared cache comprising multiple memory banks,

The cache management unit is configured to allocate a respective near memory bank and far memory bank to each processor core, and to make the memory access request to each processor core access the corresponding near memory bank first, and then access the corresponding far memory bank.
The cache device according to claim 17, wherein the cache management unit comprises:

A near storage receiving component configured to receive a memory access request sent to the corresponding near storage;

A remote storage receiving component configured to receive a memory access request sent to the corresponding remote storage;

A near memory bank pipeline control component configured to determine a processing mode of a memory access request received by the corresponding near memory bank and whether the memory access request is hit in the corresponding near memory bank, and to execute a replacement algorithm for the corresponding near memory bank;

A far memory bank pipeline control component configured to determine a processing mode of a memory access request received by the corresponding far memory bank and whether the memory access request is hit in the corresponding far memory bank, and to execute a replacement algorithm for the corresponding far memory bank;

A near storage body return result component returns the result required by the processor core to the processor core;

The far storage returns the result component, and returns the result required by the processor core to the processor core.
A cache management device, comprising:

Processor; and

a memory storing computer executable instructions,

Wherein, when the computer executable instructions are executed by the processor, the cache management method according to any one of claims 1-16 is implemented.
An electronic device comprises a cache, the cache device as claimed in claim 17, and a plurality of processor cores.
A computer-readable storage medium for non-transitory storage of computer-executable instructions,

Wherein, the computer executable instructions, when executed by the processor, implement the following claims 1-16 The cache management method described in any one of the above.