CN116010109B - Cache resource allocation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116010109B
Authority
CN
China
Prior art keywords
shared cache
identification information
data request
cache
request
Prior art date
Legal status
Active
Application number
CN202310153348.3A
Other languages
Chinese (zh)
Other versions
CN116010109A
Inventor
Name withheld at the inventor's request
Current Assignee
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202311076545.6A (published as CN117093371A)
Priority to CN202310153348.3A (granted as CN116010109B)
Publication of CN116010109A
Application granted
Publication of CN116010109B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources to service a request, the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/544 Buffers; Shared memory; Pipes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure relates to the technical field of electric digital data processing, and in particular to a cache resource allocation method and apparatus, an electronic device, and a storage medium. A processor system includes at least two levels of cache, the highest level of which is a shared cache comprising a plurality of shared cache groups. The method includes: in response to a first data request from any application, acquiring first identification information carried by the first data request; and in response to the first data request being the first received data request corresponding to the first identification information, allocating a preset number of shared cache groups to the first identification information. Because the number of groups is greater than the number of ways, allocating shared cache resources by group makes reasonable allocation easier to achieve and improves the utilization of shared cache resources.

Description

Cache resource allocation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of electric digital data processing, and in particular to a cache resource allocation method and apparatus, an electronic device, and a storage medium.
Background
The cache is an on-chip memory located between the CPU (Central Processing Unit) or GPU (Graphics Processing Unit) and main memory, providing fast, small-capacity data reads and writes. The data in the cache is a subset of main memory, stored in the cache through a mapping relation and located by comparing tag information. There are three main types of mapping. The first is direct mapping: a block address in memory can be mapped only to one fixed location in the cache. The second is fully associative mapping: a block address in memory may be mapped to any location in the cache. The third is set-associative mapping, a compromise between direct mapping and fully associative mapping: the cache is divided into sets (groups), each having multiple ways. A block address in memory can be mapped only to a fixed set, but may occupy any way within that set. Because of the complexity of the physical implementation, the number of ways per set in a set-associative cache typically does not exceed 16 or 32. Table 1 shows an exemplary set-associative layout, in which the cache is divided into M+1 sets, each including N+1 ways.
TABLE 1

             Way #0   Way #1   ...   Way #N
Group #0     Data     Data     ...   Data
...          Data     Data     ...   Data
Group #M     Data     Data     ...   Data
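The set-associative mapping shown in Table 1 can be illustrated with a short sketch of how an address splits into tag, set index, and line offset. The parameters (64-byte lines, 256 sets, 16 ways) are assumptions chosen for the example, not values from the disclosure.

```python
# Illustrative set-associative address split (assumed parameters).
LINE_BYTES = 64        # bytes per cache line -> 6 offset bits
NUM_SETS = 256         # M + 1 sets -> 8 index bits
NUM_WAYS = 16          # N + 1 ways per set

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6
INDEX_BITS = NUM_SETS.bit_length() - 1      # 8

def split_address(addr: int):
    """Return (tag, set_index, line_offset) for a memory address."""
    line_offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, line_offset

# A block address maps to exactly one set, but may occupy any of the
# NUM_WAYS ways inside that set.
tag, set_index, off = split_address(0x1234ABCD)
```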
Caches exploit the principle of program locality, which comes in two forms: temporal locality and spatial locality. Temporal locality means that an address accessed once is likely to be accessed again within a short period. Spatial locality means that when an address is accessed, nearby addresses are likely to be accessed soon. Because different programs exhibit different locality, their cache utilization also differs.
How to improve the utilization of cache resources is a technical problem that urgently needs to be solved.
Disclosure of Invention
The present disclosure provides a cache resource allocation technical scheme.
According to an aspect of the present disclosure, there is provided a cache resource allocation method for a processor system including at least two levels of cache, the highest level of the at least two levels of cache being a shared cache, the shared cache including a plurality of shared cache groups, the method including:
in response to a first data request from any application, acquiring first identification information carried by the first data request;
and in response to the first data request being the first received data request corresponding to the first identification information, allocating a preset number of shared cache groups to the first identification information.
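The two steps above can be sketched as follows; the bump allocator and data structures are assumptions made for illustration, not the disclosed implementation.

```python
# Sketch: allocate a preset number of shared cache groups the first time
# a given identification is seen; later requests reuse the allocation.
PRESET_NUM_GROUPS = 8
TOTAL_GROUPS = 256

allocated = {}          # identification -> list of shared cache group indices
next_free_group = 0     # naive bump allocator over group indices (assumed)

def on_data_request(ident: str):
    """Return the shared cache groups serving requests carrying `ident`."""
    global next_free_group
    if ident not in allocated:                 # first request for this id
        groups = list(range(next_free_group,
                            next_free_group + PRESET_NUM_GROUPS))
        next_free_group += PRESET_NUM_GROUPS
        allocated[ident] = groups
    return allocated[ident]
```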
In one possible implementation, the shared cache includes a plurality of shared cache channels;
the responding to the first data request is a data request corresponding to the first identification information received for the first time, allocates a preset number of shared cache groups to the first identification information, and includes:
and responding to the first data request as the data request corresponding to the first identification information received for the first time, and respectively distributing the preset number of shared cache groups in the plurality of shared cache channels to the first identification information.
In one possible implementation, the preset number includes a first preset number and a second preset number, and the first preset number is smaller than the second preset number;
the responding to the first data request is a data request corresponding to the first identification information received for the first time, and respectively distributing a preset number of shared cache groups in the plurality of shared cache channels to the first identification information, including:
responding to the first data request as the first received data request corresponding to the first identification information, and determining a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels, wherein the first reference shared cache channel represents the reference shared cache channel corresponding to the first identification information;
And allocating the first preset number of shared cache groups in a first common shared cache channel to the first identification information, and allocating the second preset number of shared cache groups in the first reference shared cache channel to the first identification information, wherein the first common shared cache channel represents a shared cache channel except the first reference shared cache channel in the plurality of shared cache channels.
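A minimal sketch of the per-channel split described above, assuming round-robin reference channel selection and example values for the first and second preset numbers (none of which are fixed by the disclosure):

```python
# Sketch: a larger allocation (second preset number) in the identification's
# reference channel, a smaller one (first preset number) in every other channel.
FIRST_PRESET = 4    # groups per common channel (assumed)
SECOND_PRESET = 16  # groups in the reference channel (assumed)
NUM_CHANNELS = 4

def allocate_per_channel(ident_index: int):
    """Return {channel: number_of_groups} for a newly seen identification."""
    ref_channel = ident_index % NUM_CHANNELS   # assumed round-robin choice
    return {ch: (SECOND_PRESET if ch == ref_channel else FIRST_PRESET)
            for ch in range(NUM_CHANNELS)}
```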
In one possible implementation, the determining, in response to the first data request being the first received data request corresponding to the first identification information, a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels includes:
in response to the first data request being the first received data request corresponding to the first identification information, and shared cache channels that have not been determined as reference shared cache channels existing among the plurality of shared cache channels, determining the first reference shared cache channel corresponding to the first identification information from the shared cache channels that have not been determined as reference shared cache channels.
In one possible implementation, the method further includes:
acquiring a first hit rate of the first reference shared cache channel for the first identification information and a second hit rate of the first common shared cache channel for the first identification information;
and adjusting the number of the shared cache groups allocated to the first identification information according to the first hit rate and the second hit rate.
In one possible implementation, the adjusting the number of shared cache groups allocated to the first identification information according to the first hit rate and the second hit rate includes:
determining a ratio of the first hit rate to the second hit rate;
in response to the ratio being greater than or equal to a first preset threshold, increasing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; or, in response to the ratio being less than or equal to a second preset threshold, reducing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; wherein the first preset threshold is greater than the second preset threshold, and both thresholds are greater than 1.
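The hit-rate-based adjustment can be sketched as below. The threshold values and step size are assumptions made for illustration; the disclosure only requires that both thresholds exceed 1 and the first exceed the second.

```python
# Sketch: grow the allocation when the (larger) reference channel allocation
# clearly outperforms the common channels; shrink it when the two are close.
FIRST_THRESHOLD = 1.5   # assumed: extra groups are clearly paying off
SECOND_THRESHOLD = 1.1  # assumed: extra groups barely help

def adjust(num_groups: int, ref_hit_rate: float, common_hit_rate: float,
           step: int = 2) -> int:
    """Return the adjusted number of shared cache groups for an identification."""
    ratio = ref_hit_rate / common_hit_rate
    if ratio >= FIRST_THRESHOLD:
        return num_groups + step                 # allocation too small
    if ratio <= SECOND_THRESHOLD:
        return max(step, num_groups - step)      # give some groups back
    return num_groups
```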
In one possible implementation, the method further includes:
acquiring a first request address corresponding to the first data request;
in response to determining, according to the first request address, that a cache miss occurs in the local cache, acquiring a group mask, a group offset, and a flag bit offset corresponding to the first identification information;
determining new group bits and new tag information corresponding to the first data request according to a second request address corresponding to the first data request, the group mask, the group offset, and the flag bit offset, wherein the second request address is determined according to the first request address;
and searching for target data according to the channel information and the in-row offset address in the second request address, together with the new group bits and the new tag information.
In one possible implementation, the method further includes:
and remapping the first request address to obtain the second request address.
In one possible implementation, the determining new group bits and new tag information corresponding to the first data request according to the second request address corresponding to the first data request, the group mask, the group offset, and the flag bit offset includes:
acquiring original group bits and original tag information from the second request address;
performing an AND operation on the original group bits and the group mask to obtain the relative position of the target data requested by the first data request within the plurality of shared cache groups corresponding to the first identification information;
determining the new group bits corresponding to the first data request according to the group offset and the relative position;
and determining the new tag information corresponding to the first data request according to the original tag information, the flag bit offset, and designated bits in the original group bits.
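A hedged sketch of this recomputation follows. The bit widths, the choice of which original group bits count as "designated bits", and the way they are folded into the tag via the flag bit offset are all assumptions for illustration; the disclosure does not fix these details.

```python
# Sketch: AND the original group bits with the per-id group mask to get the
# relative position inside the id's allocated groups, add the group offset to
# get the new group index, and preserve the displaced upper group bits in the
# tag so distinct addresses remain distinguishable.
def remap(orig_group: int, orig_tag: int,
          group_mask: int, group_offset: int, flag_bit_offset: int):
    relative = orig_group & group_mask        # position within the id's groups
    new_group = group_offset + relative       # actual shared cache group index
    # Bits masked away from the original group index ("designated bits",
    # assumed here to be the bits above the mask) are folded into the tag.
    designated = orig_group >> group_mask.bit_length()
    new_tag = (orig_tag << flag_bit_offset) | designated
    return new_group, new_tag
```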
In one possible implementation, the searching for the target data according to the channel information and the in-row offset address in the second request address, together with the new group bits and the new tag information, includes:
in response to failing to find the target data according to the channel information, the new group bits, the new tag information, and the in-row offset address, acquiring the target data from memory or external memory, writing the target data into the shared cache group corresponding to the new group bits, and returning the target data for the first data request.
In one possible implementation, the first identification information includes any one of the following:
identification information determined according to a module called by the application, identification information determined according to context identification information, or identification information determined according to the address interval, in memory, of the target data requested by the first data request.
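For illustration, the third option (identification derived from the address interval of the requested data in memory) might look like the sketch below; the interval layout and names are invented for the example and are not from the disclosure.

```python
# Sketch: map a request address to identification information by the
# memory interval it falls in (hypothetical intervals and names).
ADDRESS_INTERVALS = [
    (0x0000_0000, 0x4000_0000, "id_texture"),
    (0x4000_0000, 0x8000_0000, "id_vertex"),
    (0x8000_0000, 0xC000_0000, "id_compute"),
]

def ident_from_address(addr: int) -> str:
    """Return the identification for the interval containing `addr`."""
    for start, end, ident in ADDRESS_INTERVALS:
        if start <= addr < end:
            return ident
    return "id_default"
```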
According to an aspect of the present disclosure, there is provided a cache resource allocation apparatus, a processor system including at least two levels of caches, a highest level of the at least two levels of caches being a shared cache, the shared cache including a plurality of shared cache groups, the apparatus comprising: the first acquisition module is used for responding to a first data request from any application and acquiring first identification information carried by the first data request; the allocation module is used for allocating a preset number of shared cache groups to the first identification information in response to the first data request being the data request corresponding to the first identification information received for the first time.
In one possible implementation, the shared cache includes a plurality of shared cache channels, and the allocation module is configured to: in response to the first data request being the first received data request corresponding to the first identification information, allocate the preset number of shared cache groups in each of the plurality of shared cache channels to the first identification information.
In one possible implementation, the preset number includes a first preset number and a second preset number, the first preset number being smaller than the second preset number, and the allocation module is configured to: in response to the first data request being the first received data request corresponding to the first identification information, determine a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels, wherein the first reference shared cache channel represents the reference shared cache channel corresponding to the first identification information; and allocate the first preset number of shared cache groups in each first common shared cache channel to the first identification information, and allocate the second preset number of shared cache groups in the first reference shared cache channel to the first identification information, wherein the first common shared cache channels represent the shared cache channels, among the plurality of shared cache channels, other than the first reference shared cache channel.
In one possible implementation, the allocation module is configured to: in response to the first data request being the first received data request corresponding to the first identification information, and shared cache channels that have not been determined as reference shared cache channels existing among the plurality of shared cache channels, determine the first reference shared cache channel corresponding to the first identification information from the shared cache channels that have not been determined as reference shared cache channels.
In one possible implementation, the apparatus further includes: the second acquisition module is used for acquiring a first hit rate of the first reference shared cache channel aiming at the first identification information and a second hit rate of the first common shared cache channel aiming at the first identification information; and the adjusting module is used for adjusting the number of the shared cache groups allocated to the first identification information according to the first hit rate and the second hit rate.
In one possible implementation, the adjusting module is configured to: determining a ratio of the first hit rate to the second hit rate; responsive to the ratio being greater than or equal to a first preset threshold, increasing a number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; or, in response to the ratio being less than or equal to a second preset threshold, reducing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; the first preset threshold is greater than the second preset threshold, and the first preset threshold and the second preset threshold are both greater than 1.
In one possible implementation, the apparatus further includes: the third acquisition module is used for acquiring a first request address corresponding to the first data request; a fourth obtaining module, configured to obtain a group mask, a group offset, and a flag bit offset corresponding to the first identification information in response to determining that a cache miss occurs in the local cache according to the first request address; a determining module, configured to determine new group bits and new tag information corresponding to the first data request according to a second request address corresponding to the first data request, the group mask, the group offset, and the flag bit offset, where the second request address is determined according to the first request address; and the searching module is used for searching the target data according to the channel information and the in-row offset address in the second request address, the new group bit and the new tag information.
In one possible implementation, the apparatus further includes: and the remapping module is used for remapping the first request address to obtain the second request address.
In one possible implementation, the determining module is configured to: acquiring original group bits and original tag information from the second request address; performing AND operation on the original group bit and the group mask to obtain the relative positions of the target data requested by the first data request in a plurality of shared cache groups corresponding to the first identification information; determining a new group bit corresponding to the first data request according to the group offset and the relative position; and determining new tag information corresponding to the first data request according to the original tag information, the flag bit offset and the designated bit in the original group bit.
In one possible implementation, the search module is configured to: in response to failing to find the target data according to the channel information, the new group bits, the new tag information, and the in-row offset address, acquire the target data from memory or external memory, write the target data into the shared cache group corresponding to the new group bits, and return the target data for the first data request.
In one possible implementation, the first identification information includes any one of the following: identification information determined according to a module called by the application, identification information determined according to context identification information, or identification information determined according to the address interval, in memory, of the target data requested by the first data request.
According to an aspect of the present disclosure, there is provided an electronic apparatus including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the executable instructions stored by the memory to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
According to an aspect of the present disclosure, there is provided a computer program product including computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, wherein when the code runs in an electronic device, a processor in the electronic device performs the above method.
In the embodiment of the disclosure, the processor system includes at least two levels of cache, the highest level of which is a shared cache including a plurality of shared cache groups. First identification information carried by a first data request from any application is acquired in response to that request, and in response to the first data request being the first received data request corresponding to the first identification information, a preset number of shared cache groups are allocated to the first identification information, so that shared cache resources are allocated on a per-group basis. Since the number of groups is greater than the number of ways (for example, the shared cache may include a plurality of shared cache channels, each containing 256 groups, while the number of ways in a group typically does not exceed 16 or 32), allocating shared cache resources by group makes reasonable allocation easier to achieve and improves the utilization of shared cache resources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
FIG. 1 shows a schematic diagram of the cache structure of a GPU.
Fig. 2 shows a flowchart of a method for allocating cache resources according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating allocation of identification information for a data request by groups in a cache resource allocation method according to an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a cache structure of a GPU according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a cache lookup method according to an embodiment of the disclosure.
Fig. 6 shows a block diagram of a cache resource allocation apparatus provided by an embodiment of the present disclosure.
Fig. 7 illustrates a block diagram of an electronic device 1900 provided by an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Fig. 1 shows a schematic diagram of the cache structure of a GPU (Graphics Processing Unit). At least two levels of cache are typically present in a GPU system. In the example shown in fig. 1, the GPU system includes two levels of cache.
There are multiple arithmetic unit clusters (GPU clusters) on the GPU, with multiple arithmetic units inside each cluster. The Local Cache may be considered the first level cache, and is accessed only by its corresponding arithmetic unit cluster. Fig. 1 includes K+1 arithmetic unit clusters (arithmetic unit cluster 0 to arithmetic unit cluster K) and K+1 local caches (local cache 0 to local cache K) in one-to-one correspondence with them. The local cache may also be referred to as a level 1 cache, an L1 cache, etc., which is not limited herein.
A communication module (interface) may be used to pass the request to the corresponding external memory block.
The Shared Cache may be considered the second level cache, and may be accessed by all of the arithmetic unit clusters. The shared cache may also be referred to as a level 2 cache, an L2 cache, a global cache, etc., which is not limited herein. Fig. 1 includes L+1 shared cache channels, namely shared cache channel 0 to shared cache channel L.
In addition, fig. 1 includes L+1 DRAM (Dynamic Random Access Memory) banks, namely DRAM bank 0 to DRAM bank L.
Wherein, the relationship between the shared cache channel and the DRAM memory bank can be one-to-one or many-to-one.
Because the GPU can process many operations or applications in parallel, and the data locality of these operations and applications is not necessarily the same, their requirements on the cache differ; there may even be no data sharing between applications at all. Placing them all in the shared cache indiscriminately therefore affects operating efficiency.
Because the shared cache may be used jointly by multiple threads, multiple cores, or multiple different applications, each with different cache size requirements and data characteristics, it is preferable to partition the shared cache among them, avoiding conflicts and allocating cache resources reasonably. The related art adopts way partitioning, allocating different ways to different applications. Table 2 shows another exemplary set-associative layout, in which the cache is divided into M+1 groups, each including 4 ways.
TABLE 2

             Way #0   Way #1   Way #2   Way #3
Group #0     Data     Data     Data     Data
...          Data     Data     Data     Data
Group #M     Data     Data     Data     Data
In one example of partitioning by way, way 0 and way 1 store only application A's data, way 2 stores only application B's data, and way 3 stores only application C's data, so that the different applications do not affect each other.
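The way-partition example above can be sketched with per-application way masks; the bitmask encoding is an assumption for illustration. With only 4 ways, at most 4 applications can be isolated, which is the limitation discussed next.

```python
# Sketch: restrict each application to a subset of ways via a bitmask,
# matching the example above (assumed 4-way cache).
WAY_MASKS = {
    "A": 0b0011,   # application A may use way 0 and way 1
    "B": 0b0100,   # application B may use way 2
    "C": 0b1000,   # application C may use way 3
}

def allowed_ways(app: str):
    """Return the list of way indices the application may occupy."""
    mask = WAY_MASKS[app]
    return [w for w in range(4) if mask & (1 << w)]
```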
In practical physical implementations of this way-partitioning approach, the number of ways in each group is limited (typically no more than 16 or 32). When the number of requesters exceeds the number of ways, as is common in GPU applications, way-based allocation becomes restrictive and leads to unreasonable resource allocation.
The embodiment of the disclosure provides a cache resource allocation method for a processor system including at least two levels of cache, the highest level of which is a shared cache including a plurality of shared cache groups. First identification information carried by a first data request from any application is acquired in response to that request, and in response to the first data request being the first received data request corresponding to the first identification information, a preset number of shared cache groups are allocated to the first identification information. Since the number of groups is greater than the number of ways (for example, the shared cache may include a plurality of shared cache channels, each containing 256 groups, while the number of ways in a group typically does not exceed 16 or 32), allocating shared cache resources by group makes reasonable allocation easier to achieve and improves the utilization of shared cache resources.
The cache resource allocation method provided by the embodiment of the present disclosure is described in detail below with reference to the accompanying drawings.
Fig. 2 shows a flowchart of a cache resource allocation method according to an embodiment of the present disclosure. The cache resource allocation method is used for allocating resources of a cache. In one possible implementation, the execution subject of the cache resource allocation method may be a cache resource allocation apparatus; for example, the cache resource allocation method may be executed by a terminal device, a server, or other electronic device. The terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the cache resource allocation method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in fig. 2, the cache resource allocation method includes steps S21 to S22.
In step S21, in response to a first data request from any application, first identification information carried by the first data request is acquired.
In step S22, a preset number of shared cache groups are allocated to the first identification information in response to the first data request being a data request corresponding to the first identification information received for the first time.
In an embodiment of the disclosure, a processor system includes at least two levels of caches, a highest level of the at least two levels of caches being a shared cache, the shared cache including a plurality of shared cache sets.
The processor system may be a GPU (Graphics Processing Unit) system or a CPU (Central Processing Unit) system, which is not limited herein. Hereinafter, the processor system is exemplified as a GPU system.
In one possible implementation, the processor system may include a two-level cache. The first-level cache may be a local cache, and the second-level cache may be a shared cache.
In another possible implementation, the processor system may include a three-level cache. The first-level cache and the second-level cache may be local caches, and the third-level cache may be a shared cache.
In the disclosed embodiments, the shared cache may include at least one channel, i.e., the shared cache may include at least one shared cache channel. Wherein the shared cache channel represents a channel of the shared cache. Any shared cache channel can be accessed by different clusters of arithmetic units.
In one possible implementation, the shared cache may include a plurality of shared cache channels. For example, the number of shared cache channels may be 16, 24, 32, 48, etc., which is not limited herein. In this implementation, each shared cache channel may include a plurality of groups (sets), i.e., each shared cache channel may include a plurality of shared cache groups. For example, each shared cache channel may include 256 shared cache groups.
In the disclosed embodiments, each shared cache set may include multiple ways, respectively. For example, the number of ways in each shared cache set may be 4, 8, 16, 32, etc.
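As a rough illustration of why the disclosure allocates by group rather than by way, the organization described above can be sketched as follows. The specific numbers (16 channels, 256 groups per channel, 16 ways per group) are taken from the examples in this disclosure and are illustrative, not limiting:

```python
# Illustrative model of the shared cache organization described above
# (not the patented hardware itself): channels -> groups (sets) -> ways.
NUM_CHANNELS = 16
GROUPS_PER_CHANNEL = 256
WAYS_PER_GROUP = 16

# Way-based partitioning can distinguish at most WAYS_PER_GROUP requesters,
# while group-based partitioning can distinguish up to GROUPS_PER_CHANNEL,
# which is why allocating by group scales to many more request IDs.
assert GROUPS_PER_CHANNEL > WAYS_PER_GROUP

total_cache_lines = NUM_CHANNELS * GROUPS_PER_CHANNEL * WAYS_PER_GROUP
print(total_cache_lines)  # 65536 cache lines in this configuration
```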
In the embodiment of the disclosure, the first data request may be any data request issued by any application. The first identification information may represent the identification information carried by the first data request.
Any one application may issue a large number of data requests. The identification information carried by different data requests sent by the same application can be different or the same. The identification information carried by different data requests sent by different applications can be different or the same.
In one possible implementation, the first identification information includes any one of the following: identification information determined according to the module called by the application, identification information determined according to the identification information of the context, and identification information determined according to the address interval, in the memory, of the target data requested by the first data request.
As one example of this implementation, the identification information of the data request may be determined from a module invoked by the application. Taking GPU as an example, the GPU module for application call may include a special purpose unit for processing coordinate transformation in GPU, a special purpose unit for performing texture compression in GPU, and the like, which is not limited herein. In this example, when the same application invokes the same module to issue two data requests, the identification information of the two data requests is the same; when the same application calls two different modules to send two data requests, the identification information of the two data requests is different; when two applications call the same module to respectively send out data requests, the identification information of the two data requests is the same; when two applications call two different modules to respectively send data requests, the identification information of the two data requests is different. For example, the first data request sent by the application A1 calling module M1 is identical to the identification information of the second data request sent by the application A1 calling module M1, the first data request sent by the application A1 calling module M1 is different from the identification information of the third data request sent by the application A1 calling module M2, the first data request sent by the application A1 calling module M1 is identical to the identification information of the fourth data request sent by the application A2 calling module M1, and the first data request sent by the application A1 calling module M1 is different from the identification information of the fifth data request sent by the application A2 calling module M2.
As another example of this implementation, the identification information of the data request may be determined from the identification information of the context. In this example, the identification information of the context may refer to the identification information of the application. In this example, the identification information of different data requests issued by the same application is the same, and the identification information of the data requests issued by different applications is different.
As another example of this implementation, the identification information of the data request may be determined according to an address interval of the target data in the memory, which is requested by the data request. Different applications may access the same segment of address in memory, and thus, the identification information of data requests issued by different applications may be the same. The same application may access different addresses in memory, and thus the identification information of different data requests issued by the same application may be different.
In the implementation manner, the first identification information is determined according to the module called by the application, or the first identification information is determined according to the identification information of the context, or the first identification information is determined according to the address interval of the target data requested by the first data request in the memory, so that the identification information of the data request can be reasonably determined, and more reasonable allocation of the shared cache resources is facilitated.
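The three ID-derivation strategies above can be contrasted with a minimal sketch. The function names and the region-size parameter are hypothetical, introduced only to illustrate the distinctions drawn in the examples (same module → same ID regardless of application; same context → same ID regardless of module; same address interval → same ID regardless of issuer):

```python
def request_id_from_module(app_id, module_id):
    # Module-based ID: only the invoked module (e.g. a coordinate-transform
    # unit or a texture-compression unit) matters; the application does not.
    return module_id

def request_id_from_context(app_id):
    # Context-based ID: all requests from the same application share an ID.
    return app_id

def request_id_from_address(addr, region_size):
    # Address-interval-based ID: requests targeting the same memory region
    # share an ID, even across applications.
    return addr // region_size

# Application A1 and A2 both calling module M1 get the same module-based ID.
assert request_id_from_module(1, 10) == request_id_from_module(2, 10)
# The same application calling two modules gets two different IDs.
assert request_id_from_module(1, 10) != request_id_from_module(1, 20)
```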
Although the above implementations describe the manner in which the identification information of a data request is determined as above, those skilled in the art will appreciate that the present disclosure should not be limited thereto. The determination mode of the identification information of the data request can be flexibly determined by a person skilled in the art according to the actual application scene requirement and/or personal preference.
In one possible implementation, the shared cache includes a plurality of shared cache channels; the responding to the first data request is a data request corresponding to the first identification information received for the first time, allocates a preset number of shared cache groups to the first identification information, and includes: and responding to the first data request as the data request corresponding to the first identification information received for the first time, and respectively distributing the preset number of shared cache groups in the plurality of shared cache channels to the first identification information.
For example, the shared cache includes 16 shared cache channels, namely, shared cache channel 0 to shared cache channel 15, and then a preset number of shared cache groups in the 16 shared cache channels may be respectively allocated to the first identification information in response to the first data request being a data request corresponding to the first identification information received for the first time.
In this implementation manner, the first identification information is respectively allocated to a preset number of shared cache groups in the plurality of shared cache channels by responding to the first data request as the first received data request corresponding to the first identification information, so that it is beneficial to balance the requests obtained by each shared cache channel.
As one example of this implementation, the preset number includes a first preset number and a second preset number, and the first preset number is smaller than the second preset number; the responding to the first data request is a data request corresponding to the first identification information received for the first time, and respectively distributing a preset number of shared cache groups in the plurality of shared cache channels to the first identification information, including: responding to the first data request as the first received data request corresponding to the first identification information, and determining a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels, wherein the first reference shared cache channel represents a reference shared cache channel (reference cache) corresponding to the first identification information; and allocating the first preset number of shared cache groups in a first common shared cache channel to the first identification information, and allocating the second preset number of shared cache groups in the first reference shared cache channel to the first identification information, wherein the first common shared cache channel represents a shared cache channel except the first reference shared cache channel in the plurality of shared cache channels.
In this example, the second preset number may be 2 times, 1.5 times, 3 times, etc. the first preset number, which is not limited herein. The first preset number may represent the preset number corresponding to a common shared cache channel, and the second preset number may represent the preset number corresponding to a reference shared cache channel. The first reference shared cache channel may represent the reference shared cache channel corresponding to the first identification information. The reference shared cache channels corresponding to different identification information may be different. The number of reference shared cache channels corresponding to any one piece of identification information may be one, or may be two or more. For example, the number of reference shared cache channels corresponding to any one piece of identification information may be one. For any one piece of identification information, the common shared cache channels may represent the shared cache channels other than the reference shared cache channel corresponding to that identification information. For example, if the number of shared cache channels is 16 and the number of reference shared cache channels is 1, the number of common shared cache channels is 15.
In one example, the first preset number is 16 and the second preset number is 32. In this example, each first common shared cache channel may allocate 16 shared cache groups to the first identification information, and the first reference shared cache channel may allocate 32 shared cache groups to the first identification information.
In this example, the first reference shared cache channel corresponding to the first identification information is determined from the plurality of shared cache channels in response to the first data request being the first received data request corresponding to the first identification information, the first preset number of shared cache groups in the first common shared cache channel are allocated to the first identification information, and the second preset number of shared cache groups in the first reference shared cache channel are allocated to the first identification information, so that the size condition of shared cache resources required by the data request corresponding to the first identification information can be determined based on the reference shared cache channel, thereby being beneficial to improving the utilization rate of shared caches.
In one example, the determining, in response to the first data request being a first received data request corresponding to the first identification information, a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels includes: and responding to the first data request as the first received data request corresponding to the first identification information, wherein the shared cache channels which are not determined as the reference shared cache channels exist in the plurality of shared cache channels, and determining the first reference shared cache channel corresponding to the first identification information from the shared cache channels which are not determined as the reference shared cache channels.
In this example, the reference shared cache channel should be chosen as evenly as possible for different identification information. For example, suppose there are a total of 4 shared cache channels and identification information for 3 data requests. The 4 shared cache channels are shared cache channel 0, shared cache channel 1, shared cache channel 2, and shared cache channel 3, and the identification information of the 3 data requests is first identification information (ID 0), second identification information (ID 1), and third identification information (ID 2). For example, after shared cache channel 2 is selected as the reference shared cache channel for the first identification information, shared cache channel 2 may be avoided when selecting the reference shared cache channel for the second identification information, e.g., shared cache channel 3 may be selected as its reference shared cache channel. When selecting the reference shared cache channel for the third identification information, shared cache channel 2 and shared cache channel 3 may be avoided, e.g., shared cache channel 0 or shared cache channel 1 is selected as the reference shared cache channel.
In this example, the first reference shared cache channel corresponding to the first identification information is determined from the shared cache channels which are not determined as reference shared cache channels by responding to the first data request as the first received data request corresponding to the first identification information and the shared cache channels which are not determined as reference shared cache channels exist in the plurality of shared cache channels, so that the utilization rate of shared cache resources can be improved.
In another example, the determining, in response to the first data request being a first received data request corresponding to the first identification information, a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels includes: and responding to the first data request as the first received data request corresponding to the first identification information, and determining the shared cache channel with the largest residual capacity in the plurality of shared cache channels as a first reference shared cache channel corresponding to the first identification information.
In another example, the determining, in response to the first data request being a first received data request corresponding to the first identification information, a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels includes: and responding to the first data request as the first received data request corresponding to the first identification information, and randomly selecting a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels.
In another example, the reference shared cache channel may be selected sequentially for different identification information. For example, shared cache channel 0 is selected as the reference shared cache channel the first time, shared cache channel 1 the second time, shared cache channel 2 the third time, and so on.
In one example, the method further comprises: acquiring a first hit rate of the first reference shared cache channel for the first identification information and a second hit rate of the first common shared cache channel for the first identification information; and adjusting the number of the shared cache groups allocated to the first identification information according to the first hit rate and the second hit rate.
In this example, the first hit rate and the second hit rate may be counted at a preset frequency, so that the shared cache resources allocated to the respective identification information may be adjusted at the preset frequency.
In this example, in the case where there are a plurality of common shared cache channels, the average or median of the hit rates of the respective common shared cache channels for the first identification information may be determined as the second hit rate.
In this example, by acquiring the first hit rate of the first reference shared cache channel for the first identification information and the second hit rate of the first common shared cache channel for the first identification information, and adjusting the number of shared cache groups allocated to the first identification information according to the first hit rate and the second hit rate, the shared cache resources allocated to the first identification information are dynamically adjusted based on the performance difference between the first reference shared cache channel and the first common shared cache channel, so that the utilization rate of shared cache resources can be further improved, and the running efficiency of different applications can be improved.
In one example, the adjusting the number of shared cache sets allocated to the first identification information according to the first hit rate and the second hit rate includes: determining a ratio of the first hit rate to the second hit rate; responsive to the ratio being greater than or equal to a first preset threshold, increasing a number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; or, in response to the ratio being less than or equal to a second preset threshold, reducing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; the first preset threshold is greater than the second preset threshold, and the first preset threshold and the second preset threshold are both greater than 1.
The first preset threshold value and the second preset threshold value can be configured through a register. The second preset threshold may be slightly greater than 1.
Because the number of the shared cache groups obtained by allocating the first identification information in the first reference shared cache channel is greater than the number of the shared cache groups obtained by allocating the first identification information in the first common shared cache channel, if more shared cache resources bring about a significantly higher hit rate (for example, the ratio is greater than or equal to a first preset threshold), the increase of the cache resources can be considered to have a significant benefit for improving the hit rate, and further the increase of the cache space for the first identification information can be considered. If more shared cache resources do not result in a significant increase in hit rate (e.g., the ratio is less than or equal to the second preset threshold), then the increase in cache resources may be considered to be detrimental to increasing hit rate, and thus the shared cache resources allocated to the first identification information may be reduced.
In the above example, in the case where the ratio is smaller than the first preset threshold and larger than the second preset threshold, the number of shared cache sets corresponding to the first identification information may not be changed.
In addition, in order to maintain data consistency when the allocation of cache resources is changed, additional cache maintenance operations may be employed, such as a flush operation, an invalidate operation, and the like.
In the above example, the number of the shared cache groups allocated to the first identification information in the plurality of shared cache channels is increased by determining a ratio of the first hit rate to the second hit rate, in response to the ratio being greater than or equal to a first preset threshold, or the number of the shared cache groups allocated to the first identification information in the plurality of shared cache channels is decreased in response to the ratio being less than or equal to a second preset threshold, wherein the first preset threshold is greater than the second preset threshold, and both the first preset threshold and the second preset threshold are greater than 1, whereby the utilization rate of the shared cache resources can be further improved.
In one possible implementation, the method further includes: acquiring a first request address corresponding to the first data request; responding to the first request address to determine that cache miss occurs in a local cache, and acquiring a group mask, a group offset and a flag bit offset corresponding to the first identification information; determining new group bits and new tag information corresponding to the first data request according to a second request address corresponding to the first data request, the group mask, the group offset and the flag bit offset, wherein the second request address is determined according to the first request address; and searching target data according to the channel information and the in-row offset address in the second request address, the new group bit and the new tag information.
In this implementation, the first request address may represent a request address carried by the first data request. The first request address may be a virtual address or a physical address.
As an example of this implementation, a set mask (set mask), a set offset (set offset), and a tag bit offset (tag shift) corresponding to the first identification information may be acquired from the ID-cache set mapping table. Wherein the group mask may be used to determine a number of shared cache groups allocated to the first identification information. For example, the set mask=0x0f may indicate that the number of shared cache sets allocated to the first identification information is 16. The group offset may represent a starting position of the shared cache group allocated to the first identification information. For example, the set offset=0x10 may indicate that the start position of the shared cache set of the first identification information is the 17 th set in the shared cache channel. The flag bit offset may be used to determine the shift amount of the tag.
In this implementation, the new set of bits represents the set of bits used to store the target data requested by the first data request. The new tag information may represent new tag information corresponding to the target data. The new tag information can be stored in the shared cache group corresponding to the target data for subsequent cache searching and hit judgment.
In this implementation, the sizes of the group masks corresponding to different identification information may be the same or different. For example, the group mask corresponding to each identification information is 0x0f. As another example, a data request may carry a request size of a shared cache group, and the number of shared cache groups allocated to identification information of the data request may be determined based on the request size.
In this implementation manner, by acquiring the first request address corresponding to the first data request, in response to determining that a cache miss occurs in the local cache according to the first request address, acquiring a set mask, a set offset and a flag bit offset corresponding to the first identification information, and determining a new set bit and new tag information corresponding to the first data request according to the second request address, the set mask, the set offset and the flag bit offset corresponding to the first data request, where the second request address is determined according to the first request address, and according to channel information and an intra-line offset address in the second request address, and the new set bit and the new tag information, target data is searched, so that allocation of a shared cache set based on identification information of the data request can be achieved.
As an example of this implementation, the method further comprises: and remapping the first request address to obtain the second request address.
In one example, addresses issued by the GPU (i.e., request addresses carried by data requests) may be scrambled to be equally distributed to different shared cache channels by address interleaving and hashing.
In this example, the second request address is obtained by remapping the first request address, so that the requests obtained by each shared cache channel can be balanced, and the utilization rate of the shared cache resource is improved.
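The interleaving-and-hash scrambling mentioned above can be illustrated with a simple XOR-fold over the cache-line address. This is a generic sketch of the technique, not the hash used in the disclosure; the line size and fold widths are assumptions:

```python
def channel_hash(addr, num_channels=16, line_bytes=256):
    """Map a request address to a shared cache channel.

    XOR-folding the cache-line index mixes higher address bits into the
    channel selection, so sequential and strided access patterns spread
    across channels instead of concentrating on one of them.
    """
    line = addr // line_bytes
    h = line
    h ^= h >> 4
    h ^= h >> 8
    return h % num_channels

# 16 consecutive cache lines land on 16 distinct channels.
channels = {channel_hash(a) for a in range(0, 16 * 256, 256)}
print(len(channels))  # 16
```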
As another example of this implementation, the first request address is a virtual address; the method further comprises the steps of: and converting the virtual address to the physical address through the memory management unit to obtain a second request address. In this example, the second request address is a physical address.
As another example of this implementation, the first request address is a physical address, and the first request address may be directly taken as the second request address.
As an example of this implementation, the determining new set of bits and new tag information corresponding to the first data request according to the second request address, the set mask, the set offset, and the flag bit offset corresponding to the first data request includes: acquiring original group bits and original tag information from the second request address; performing AND operation on the original group bit and the group mask to obtain the relative positions of the target data requested by the first data request in a plurality of shared cache groups corresponding to the first identification information; determining a new group bit corresponding to the first data request according to the group offset and the relative position; and determining new tag information corresponding to the first data request according to the original tag information, the flag bit offset and the designated bit in the original group bit.
For example, the second request address includes 32 bits, represented in hexadecimal as 0x12345678. The upper 4 bits (0x1) are the channel information; bits 16-27 (0x234) are the original tag information; bits 8-15 (0x56) are the original group bits; the lower 8 bits (0x78) are the in-line offset address, used to determine that the request accesses the data at byte 0x78 of the cache line.
For example, the group mask mask=0x0f, the group offset offset=0x10, and the flag bit offset tag shift=0x4.
An AND operation is performed on the original group bits 0x56 and the group mask 0x0f to obtain the relative position 0x06 of the target data requested by the first data request within the plurality of shared cache groups corresponding to the first identification information. Converting the original group bits and the group mask into binary yields 01010110 and 00001111, respectively; 01010110 & 00001111 = 00000110, thereby determining that the relative position of the target data requested by the first data request in the plurality of shared cache groups corresponding to the first identification information is 0x06.
Adding the group offset to the relative position may result in a new group bit 0x16 corresponding to the first data request. That is, according to the group mask, the 8 th to 11 th bits in the second request address are selected as part of the new group bits, and an offset of 0x10 is added.
From the original tag information 0x234, the flag bit offset 0x4, and 0x5 in the original set of bits, new tag information 0x2345 can be obtained.
Thus, a new request address of 0x123451678 can be obtained. In this example, the 8 th to 11 th bits of the new request address are determined according to the second request address, and 16 total shared cache groups are allocated to the first identification information.
In this example, the original group bit and the original tag information are obtained from the second request address, and the original group bit and the group mask are subjected to an and operation to obtain the relative positions of the target data requested by the first data request in the plurality of shared cache groups corresponding to the first identification information, the new group bit corresponding to the first data request is determined according to the group offset and the relative positions, and the new tag information corresponding to the first data request is determined according to the original tag information, the flag bit offset and the designated bit in the original group bit, so that the allocation of the shared cache groups based on the identification information of the data request can be realized, and the designated bit in the original group bit is reserved when the new tag information is determined, so that the integrity of the original address can be maintained.
Fig. 3 is a schematic diagram illustrating allocation of identification information for a data request by groups in a cache resource allocation method according to an embodiment of the present disclosure. In fig. 3, the request ID indicates identification information carried by the data request, for example, first identification information carried by the first data request. The group mask, the group offset, and the flag bit offset corresponding to the first identification information may be obtained from the ID-cache group mapping table. The ID-cache set mapping table shown in fig. 3 includes mapping relations between IDs 0 to IDN and set mask, set offset, and flag bit offset. The original tag information, the original set of bits, and the intra-row offset address may be obtained from a request address (e.g., a second request address). The original group bits and the group mask may be bitwise and operated to obtain the relative positions of the target data requested by the first data request in the plurality of shared cache groups corresponding to the first identification information. The group offset may be added to the relative position to obtain a new group bit. The original tag information, the flag bit offset and the designated bit in the original group bit can be processed through a shifter to obtain new tag information. And carrying out cache searching according to the channel information, the in-line offset address, the new group bit and the new tag information.
It should be noted that the above definitions of the group mask and the group offset are only examples, not the sole possible definitions. For example, the same group allocation function may be implemented as new group bits = (original group bits + group offset) & group mask, among other variants.
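The remapping described above can be sketched in a few lines of Python. The bit widths, table contents, and names (`remap`, `id_group_table`, `GROUP_BITS`) are illustrative assumptions for demonstration, not the embodiment's exact hardware encoding:

```python
# A minimal sketch of the group remapping step: mask the original group
# bits, add the per-ID group offset, and fold the masked-off (designated)
# high bits into the tag so the original address stays recoverable.

GROUP_BITS = 8        # assume 256 shared cache groups per channel

# ID -> (group mask, group offset, flag bit offset) table (cf. Fig. 3).
# Here ID 0 is assumed to own 16 consecutive groups starting at group 0x40.
id_group_table = {
    0: (0x0F, 0x40, 4),   # keep the low 4 group bits; fold 4 bits into the tag
}

def remap(request_id, orig_tag, orig_group):
    group_mask, group_offset, flag_off = id_group_table[request_id]
    # Relative position within the groups allocated to this ID.
    rel = orig_group & group_mask
    new_group = group_offset + rel
    # Designated (masked-off) high group bits are shifted into the tag,
    # preserving the integrity of the original address.
    designated = orig_group >> (GROUP_BITS - flag_off)
    new_tag = (orig_tag << flag_off) | designated
    return new_group, new_tag
```

For example, with original tag 0xAB and original group bits 0x73, `remap(0, 0xAB, 0x73)` yields new group bits 0x43 and new tag 0xAB7 under these assumptions.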
As an example of this implementation, the searching for the target data according to the channel information and the intra-row offset address in the second request address, the new group bits, and the new tag information includes: in response to the target data not being found according to the channel information, the new group bits, the new tag information, and the intra-row offset address, obtaining the target data from a memory or an external memory, writing the target data into the shared cache group corresponding to the new group bits, and returning the target data to the first data request.
In this example, in the case where the first data request is a data request corresponding to the first identification information received for the first time, the target data is not found in the shared cache channel. At this time, the target data may be obtained from the memory or the external memory, and the target data is written into the shared cache group corresponding to the new group bit, and returned to the first data request.
In this example, in response to the target data not being found according to the channel information, the new group bits, the new tag information, and the intra-row offset address, the target data is obtained from the memory or the external memory, written into the shared cache group corresponding to the new group bits, and returned to the first data request, so that the target data is written into a shared cache group corresponding to the first identification information.
Fig. 4 is a schematic diagram of a cache structure of a GPU according to an embodiment of the present disclosure. The parts overlapping with Fig. 1 will not be described again. In Fig. 4, addresses issued by the GPU (i.e., the request addresses carried by data requests) may be scrambled through address interleaving and hash operations so as to be distributed evenly across the different shared cache channels. In addition, for the identification information carried by a data request, a reference shared cache channel corresponding to that identification information may be selected from among all the shared cache channels.
Fig. 5 is a schematic diagram of a cache lookup method according to an embodiment of the present disclosure. As shown in Fig. 5, a local cache lookup may be performed in response to a data request. If there is a hit in the local cache (i.e., the target data requested by the data request is found in the local cache), the target data is fetched from the local cache and returned to the data request. If there is a miss in the local cache (i.e., a cache miss occurs in the local cache), the first request address carried by the data request may be converted into a second request address through address interleaving and hash operations. The shared cache channel corresponding to the target data may be determined according to the channel information in the second request address, and the data request may be sent to that shared cache channel. The group mask, group offset, and flag bit offset corresponding to the identification information carried by the data request may be obtained from the ID-cache group mapping table. The original tag information, the original group bits, and the intra-row offset address may be obtained from the second request address. A bitwise AND operation is performed on the original group bits and the group mask to obtain the relative position of the target data requested by the data request within the plurality of shared cache groups corresponding to the identification information. The group offset may be added to the relative position to obtain the new group bits. The original tag information, the flag bit offset, and the designated bits in the original group bits may be processed to obtain the new tag information. A shared cache lookup is then performed on the shared cache channel corresponding to the target data according to the channel information, the intra-row offset address, the new group bits, and the new tag information. If there is a hit in the shared cache, the target data is fetched from the shared cache and returned to the data request.
If there is a miss in the shared cache (i.e., a cache miss occurs in the shared cache), the data request may be sent to the corresponding DRAM bank, where the target data is fetched and returned to the data request; whether to fill the target data into the cache is determined according to the cache request control signal.
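As a rough illustration, the lookup flow of Fig. 5 can be condensed into a runnable sketch in which the local cache, the shared cache channels, and DRAM are modelled as plain dictionaries. The `interleave_and_hash` stub, the channel count, and the omission of the per-ID group remapping step are all simplifying assumptions, not the embodiment's actual implementation:

```python
# Simplified model of the two-level lookup: local cache -> shared cache
# channel (selected by an interleave/hash of the address) -> DRAM, with an
# optional cache fill controlled by a flag.

def interleave_and_hash(addr):
    # Stub: XOR-fold high address bits into low bits to spread requests
    # across 4 shared cache channels; the real hash is hardware-specific.
    channel = (addr ^ (addr >> 8)) & 0x3
    return channel, addr

def handle_request(addr, local_cache, shared_channels, dram, fill=True):
    if addr in local_cache:                 # 1. local cache hit
        return local_cache[addr]
    channel_id, addr2 = interleave_and_hash(addr)
    channel = shared_channels[channel_id]
    if addr2 in channel:                    # 2. shared cache hit
        return channel[addr2]
    data = dram[addr]                       # 3. fetch from the DRAM bank
    if fill:                                # cache request control signal
        channel[addr2] = data               # fill the shared cache
    return data
```

A first request for an address misses both levels and fills the selected shared channel; a repeated request then hits in the shared cache without touching DRAM.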
The cache resource allocation method provided by the embodiments of the present disclosure is described below through a specific application scenario. In this application scenario, in response to a first data request from any application, the first identification information carried by the first data request is acquired. The first identification information includes any one of the following: identification information determined according to the module called by the application, identification information determined according to context identification information, and identification information determined according to the address interval, in the memory, of the target data requested by the first data request. Suppose the first data request is the first received data request corresponding to the first identification information, and there are shared cache channels among the plurality of shared cache channels that have not yet been determined as reference shared cache channels. A first reference shared cache channel corresponding to the first identification information is then determined from the shared cache channels that have not been determined as reference shared cache channels, where the first reference shared cache channel represents the reference shared cache channel corresponding to the first identification information. The first preset number of shared cache groups in each first common shared cache channel is allocated to the first identification information, and the second preset number of shared cache groups in the first reference shared cache channel is allocated to the first identification information, where a first common shared cache channel is any shared cache channel other than the first reference shared cache channel among the plurality of shared cache channels.
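The first-seen allocation step above can be sketched as follows. The constants, the choice of the first free channel as the reference channel, and the names (`allocate`, `FIRST_PRESET`, `SECOND_PRESET`) are assumptions for illustration only:

```python
# Hypothetical sketch: when an ID is first seen, an unused channel becomes
# its reference channel and receives the larger second preset number of
# shared cache groups; every other (common) channel receives the smaller
# first preset number.

FIRST_PRESET = 4      # groups per common channel (assumed)
SECOND_PRESET = 16    # groups in the reference channel (assumed)

def allocate(id_, channels, used_reference, allocation):
    """Record how many shared cache groups each channel gives to id_."""
    free = [c for c in channels if c not in used_reference]
    ref = free[0] if free else channels[0]   # fall back if all are taken
    used_reference.add(ref)
    allocation[id_] = {
        c: (SECOND_PRESET if c == ref else FIRST_PRESET) for c in channels
    }
    return ref
```

Each new ID thus claims a distinct reference channel while still holding a small slice of every other channel.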
In response to determining, according to the first request address, that a cache miss occurs in the local cache, the group mask, the group offset, and the flag bit offset corresponding to the first identification information are obtained. The first request address is remapped through address interleaving and hash operations to obtain the second request address. The original group bits and the original tag information may then be obtained from the second request address; a bitwise AND operation is performed on the original group bits and the group mask to obtain the relative position of the target data requested by the first data request within the plurality of shared cache groups corresponding to the first identification information; the new group bits corresponding to the first data request are determined according to the group offset and the relative position; and the new tag information corresponding to the first data request is determined according to the original tag information, the flag bit offset, and the designated bits in the original group bits. A cache lookup is then performed according to the channel information and the intra-row offset address in the second request address, the new group bits, and the new tag information.
The shared cache resources allocated to the first identification information may be adjusted at a preset frequency. For example, a first hit rate of the first reference shared cache channel for the first identification information and a second hit rate of the first normal shared cache channel for the first identification information may be obtained; determining a ratio of the first hit rate to the second hit rate; responsive to the ratio being greater than or equal to a first preset threshold, increasing a number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; or, in response to the ratio being less than or equal to a second preset threshold, reducing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; the first preset threshold is greater than the second preset threshold, and the first preset threshold and the second preset threshold are both greater than 1.
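The periodic adjustment rule above can be sketched in a few lines. The thresholds, step size, and bounds are illustrative assumptions; only the comparison structure (ratio against two thresholds, both greater than 1) follows the description:

```python
# Sketch of the hit-rate-driven adjustment: compare the reference channel's
# hit rate with the common channels' hit rate for one ID, and grow or
# shrink that ID's share of the cache accordingly.

T_HIGH = 2.0   # first preset threshold  (> 1, assumed value)
T_LOW = 1.2    # second preset threshold (> 1 and < T_HIGH, assumed value)

def adjust(num_groups, ref_hit_rate, common_hit_rate, step=2, max_groups=64):
    ratio = ref_hit_rate / common_hit_rate
    if ratio >= T_HIGH:
        # The extra capacity in the reference channel pays off markedly:
        # allocate more shared cache groups to this ID.
        return min(num_groups + step, max_groups)
    if ratio <= T_LOW:
        # The extra capacity barely helps: reclaim groups for other IDs.
        return max(num_groups - step, 1)
    return num_groups        # within the dead band: leave unchanged
```

Because both thresholds exceed 1, the allocation only grows when the reference channel clearly outperforms the common channels, and only shrinks when the advantage has nearly vanished.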
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from their principles and logic; for brevity, such combinations are not described again in the present disclosure. It will also be appreciated by those skilled in the art that, in the above methods of the embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure further provides a cache resource allocation apparatus, an electronic device, a computer-readable storage medium, and a computer program product, each of which may be used to implement any cache resource allocation method provided in the present disclosure; for the corresponding technical solutions and technical effects, reference may be made to the corresponding descriptions in the method section, which are not repeated here.
Fig. 6 shows a block diagram of a cache resource allocation apparatus provided by an embodiment of the present disclosure. In an embodiment of the disclosure, a processor system includes at least two levels of caches, a highest level of the at least two levels of caches being a shared cache, the shared cache including a plurality of shared cache sets. As shown in fig. 6, the cache resource allocation apparatus includes:
a first obtaining module 61, configured to obtain, in response to a first data request from any application, first identification information carried by the first data request;
And the allocation module 62 is configured to allocate a preset number of shared cache groups to the first identification information in response to the first data request being a data request corresponding to the first identification information received for the first time.
In one possible implementation, the shared cache includes a plurality of shared cache channels;
the allocation module 62 is configured to:
in response to the first data request being the first received data request corresponding to the first identification information, respectively allocate the preset number of shared cache groups in the plurality of shared cache channels to the first identification information.
In one possible implementation, the preset number includes a first preset number and a second preset number, and the first preset number is smaller than the second preset number;
the allocation module 62 is configured to:
in response to the first data request being the first received data request corresponding to the first identification information, determine a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels, wherein the first reference shared cache channel represents the reference shared cache channel corresponding to the first identification information;
And allocating the first preset number of shared cache groups in a first common shared cache channel to the first identification information, and allocating the second preset number of shared cache groups in the first reference shared cache channel to the first identification information, wherein the first common shared cache channel represents a shared cache channel except the first reference shared cache channel in the plurality of shared cache channels.
In one possible implementation, the allocation module 62 is configured to:
in response to the first data request being the first received data request corresponding to the first identification information and there being, among the plurality of shared cache channels, shared cache channels that have not been determined as reference shared cache channels, determine the first reference shared cache channel corresponding to the first identification information from the shared cache channels that have not been determined as reference shared cache channels.
In one possible implementation, the apparatus further includes:
the second acquisition module is used for acquiring a first hit rate of the first reference shared cache channel aiming at the first identification information and a second hit rate of the first common shared cache channel aiming at the first identification information;
And the adjusting module is used for adjusting the number of the shared cache groups allocated to the first identification information according to the first hit rate and the second hit rate.
In one possible implementation, the adjusting module is configured to:
determining a ratio of the first hit rate to the second hit rate;
responsive to the ratio being greater than or equal to a first preset threshold, increasing a number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; or, in response to the ratio being less than or equal to a second preset threshold, reducing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; the first preset threshold is greater than the second preset threshold, and the first preset threshold and the second preset threshold are both greater than 1.
In one possible implementation, the apparatus further includes:
the third acquisition module is used for acquiring a first request address corresponding to the first data request;
a fourth obtaining module, configured to obtain a group mask, a group offset, and a flag bit offset corresponding to the first identification information in response to determining that a cache miss occurs in the local cache according to the first request address;
A determining module, configured to determine new group bits and new tag information corresponding to the first data request according to a second request address corresponding to the first data request, the group mask, the group offset, and the flag bit offset, where the second request address is determined according to the first request address;
and the searching module is used for searching the target data according to the channel information and the in-row offset address in the second request address, the new group bit and the new tag information.
In one possible implementation, the apparatus further includes:
and the remapping module is used for remapping the first request address to obtain the second request address.
In one possible implementation, the determining module is configured to:
acquiring original group bits and original tag information from the second request address;
performing AND operation on the original group bit and the group mask to obtain the relative positions of the target data requested by the first data request in a plurality of shared cache groups corresponding to the first identification information;
determining a new group bit corresponding to the first data request according to the group offset and the relative position;
And determining new tag information corresponding to the first data request according to the original tag information, the flag bit offset and the designated bit in the original group bit.
In one possible implementation, the search module is configured to:
in response to the target data not being found according to the channel information, the new group bits, the new tag information, and the intra-row offset address, obtain the target data from a memory or an external memory, write the target data into the shared cache group corresponding to the new group bits, and return the target data to the first data request.
In one possible implementation manner, the first identification information includes any one of the following:
the method comprises the steps of determining identification information according to a module called by the application, determining identification information according to context identification information, and determining identification information according to an address interval of target data requested by the first data request in a memory.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementation and technical effects of the functions or modules may refer to the descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. Wherein the computer readable storage medium may be a non-volatile computer readable storage medium or may be a volatile computer readable storage medium.
The disclosed embodiments also propose a computer program comprising computer readable code which, when run in an electronic device, causes a processor in the electronic device to carry out the above method.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in an electronic device, causes a processor in the electronic device to perform the above method.
The embodiment of the disclosure also provides an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the executable instructions stored by the memory to perform the above-described method.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 7 illustrates a block diagram of an electronic device 1900 provided by an embodiment of the disclosure. For example, electronic device 1900 may be provided as a terminal or server. Referring to FIG. 7, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer-readable program instructions, the electronic circuitry being capable of executing the computer-readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the various embodiments emphasizes the differences between them; for the parts that are the same or similar, the embodiments may refer to one another, and these parts are not repeated herein for brevity.
If the technical solutions of the embodiments of the present disclosure involve personal information, products applying these technical solutions clearly disclose the personal information processing rules and obtain the individual's independent consent before processing the personal information. If the technical solutions of the embodiments of the present disclosure involve sensitive personal information, products applying these technical solutions obtain the individual's separate consent before processing the sensitive personal information and, at the same time, satisfy the requirement of "explicit consent". For example, a clear and conspicuous sign may be set at a personal information collection device, such as a camera, to inform individuals that they are entering a personal information collection range and that their personal information will be collected; if an individual voluntarily enters the collection range, this is deemed consent to the collection of their personal information. Alternatively, on a device that processes personal information, on the condition that the personal information processing rules are communicated through conspicuous signs or notices, personal authorization may be obtained by means of a pop-up message, by asking the individual to upload their personal information, or the like. The personal information processing rules may include information such as the personal information processor, the purpose of the personal information processing, the processing method, and the types of personal information to be processed.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (13)

1. A method for allocating cache resources, wherein a processor system includes at least two levels of cache, a highest level of the at least two levels of cache is a shared cache, the shared cache includes a plurality of shared cache ways, each shared cache way includes a plurality of shared cache groups, each shared cache group includes a plurality of ways, the method comprising:
responding to a first data request from any application, and acquiring first identification information carried by the first data request, wherein the first data request is any data request sent by any application, and the first identification information represents the identification information carried by the first data request;
and in response to the first data request being the first received data request corresponding to the first identification information, allocating a preset number of shared cache groups in each of the plurality of shared cache channels to the first identification information, wherein at least two of the shared cache channels allocate different numbers of shared cache groups to the first identification information.
2. The method of claim 1, wherein the preset number comprises a first preset number and a second preset number, and the first preset number is less than the second preset number;
the responding to the first data request is a data request corresponding to the first identification information received for the first time, and respectively distributing a preset number of shared cache groups in the plurality of shared cache channels to the first identification information, including:
in response to the first data request being the first received data request corresponding to the first identification information, determining a first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels, wherein the first reference shared cache channel represents the reference shared cache channel corresponding to the first identification information;
And allocating the first preset number of shared cache groups in a first common shared cache channel to the first identification information, and allocating the second preset number of shared cache groups in the first reference shared cache channel to the first identification information, wherein the first common shared cache channel represents a shared cache channel except the first reference shared cache channel in the plurality of shared cache channels.
3. The method of claim 2, wherein determining, in response to the first data request being the first-received data request corresponding to the first identification information, the first reference shared cache channel corresponding to the first identification information from the plurality of shared cache channels comprises:
in response to the first data request being the first-received data request corresponding to the first identification information and at least one of the plurality of shared cache channels not yet having been determined as a reference shared cache channel, determining the first reference shared cache channel corresponding to the first identification information from among the shared cache channels not yet determined as reference shared cache channels.
4. The method of claim 2, further comprising:
acquiring a first hit rate of the first reference shared cache channel for the first identification information and a second hit rate of the first common shared cache channel for the first identification information;
adjusting, according to the first hit rate and the second hit rate, the number of shared cache groups allocated to the first identification information.
5. The method of claim 4, wherein adjusting the number of shared cache groups allocated to the first identification information according to the first hit rate and the second hit rate comprises:
determining a ratio of the first hit rate to the second hit rate;
in response to the ratio being greater than or equal to a first preset threshold, increasing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels; or, in response to the ratio being less than or equal to a second preset threshold, reducing the number of shared cache groups allocated to the first identification information in the plurality of shared cache channels, wherein the first preset threshold is greater than the second preset threshold, and both thresholds are greater than 1.
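The resizing rule of claims 4–5 amounts to a simple comparison of the two hit rates. The sketch below is an assumption-laden illustration, not the patented implementation: the threshold values and the step size are invented for the example; the claims only require both thresholds to exceed 1 with the first larger than the second.

```python
# Illustrative sketch of the hit-rate-driven resizing in claims 4-5.
# Thresholds and the step size are assumed values.

FIRST_THRESHOLD = 2.0    # both thresholds > 1, first > second
SECOND_THRESHOLD = 1.25
STEP = 2                 # sets added/removed per adjustment (assumption)

def adjust_allocation(num_sets, ref_hit_rate, common_hit_rate):
    """Return the new number of shared cache sets for an identification
    info. A high ratio means the larger reference-channel allocation is
    paying off, so the allocation grows; a ratio near 1 means the extra
    sets add little benefit, so the allocation shrinks."""
    ratio = ref_hit_rate / common_hit_rate
    if ratio >= FIRST_THRESHOLD:
        return num_sets + STEP
    if ratio <= SECOND_THRESHOLD:
        return max(1, num_sets - STEP)
    return num_sets
```

Because both thresholds exceed 1, the scheme compares how much better the well-provisioned reference channel performs than a common channel, rather than reacting to absolute hit rates.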
6. The method of claim 1, further comprising:
acquiring a first request address corresponding to the first data request;
in response to determining that a cache miss occurs in a local cache for the first request address, acquiring a group mask, a group offset, and a flag-bit offset corresponding to the first identification information;
determining new group bits and new tag information corresponding to the first data request according to a second request address corresponding to the first data request, the group mask, the group offset, and the flag-bit offset, wherein the second request address is determined from the first request address;
searching for target data according to the channel information and the intra-row offset address in the second request address, together with the new group bits and the new tag information.
7. The method of claim 6, further comprising:
remapping the first request address to obtain the second request address.
8. The method of claim 6, wherein determining the new group bits and the new tag information corresponding to the first data request according to the second request address, the group mask, the group offset, and the flag-bit offset comprises:
acquiring original group bits and original tag information from the second request address;
performing an AND operation on the original group bits and the group mask to obtain the relative position, among the plurality of shared cache groups corresponding to the first identification information, of the target data requested by the first data request;
determining the new group bits corresponding to the first data request according to the group offset and the relative position;
determining the new tag information corresponding to the first data request according to the original tag information, the flag-bit offset, and designated bits of the original group bits.
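A bit-level sketch can make claim 8's index remapping concrete. The field widths, mask value, group offset, and flag-bit offset below are all illustrative assumptions (the claims fix none of them); the sketch only shows the shape of the computation: mask out a relative position, rebase it into the identification info's window of sets, and fold the displaced group bits into the tag so remapped lines remain distinguishable.

```python
# Hypothetical bit-level sketch of claim 8's remapping.
# All widths, masks, and offsets are assumed example values.

SET_BITS = 6             # 64 sets per channel (assumption)
GROUP_MASK = 0b000111    # ident owns 8 sets: keep low 3 bits as position
GROUP_OFFSET = 0b011000  # base index of this ident's 8-set window
TAG_SHIFT = 3            # flag-bit offset: displaced set bits move to tag

def remap(orig_group_bits, orig_tag):
    # relative position of the requested line within the ident's sets
    rel = orig_group_bits & GROUP_MASK
    # new group bits = window base combined with the relative position
    new_group = GROUP_OFFSET | rel
    # the masked-away (designated) group bits are preserved in the tag,
    # otherwise distinct lines could alias in the same remapped set
    displaced = orig_group_bits >> TAG_SHIFT
    new_tag = (orig_tag << (SET_BITS - TAG_SHIFT)) | displaced
    return new_group, new_tag
```

For example, original group bits 0b101011 land at relative position 0b011 inside the window, giving new group bits 0b011011, while the displaced high bits 0b101 are appended to the tag.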
9. The method of claim 6, wherein searching for the target data according to the channel information and the intra-row offset address in the second request address, the new group bits, and the new tag information comprises:
in response to the target data not being found according to the channel information, the new group bits, the new tag information, and the intra-row offset address, acquiring the target data from a memory or an external memory, writing the target data into the shared cache group corresponding to the new group bits, and returning the target data in response to the first data request.
10. The method of any one of claims 1 to 9, wherein the first identification information comprises any one of:
identification information determined according to a module called by the application, identification information determined according to context identification information, and identification information determined according to the address interval, in a memory, of the target data requested by the first data request.
11. A cache resource allocation apparatus, wherein a processor system comprises at least two levels of cache, the highest level of the at least two levels being a shared cache, the shared cache comprising a plurality of shared cache channels, each shared cache channel comprising a plurality of shared cache groups, and each shared cache group comprising a plurality of ways, the apparatus comprising:
a first acquisition module configured to acquire, in response to a first data request from any application, first identification information carried by the first data request, wherein the first data request is any data request sent by the application, and the first identification information represents the identification information carried by the first data request;
an allocation module configured to allocate, in response to the first data request being the first-received data request corresponding to the first identification information, a preset number of shared cache groups in each of the plurality of shared cache channels to the first identification information, wherein at least two of the shared cache channels allocate different numbers of shared cache groups to the first identification information.
12. An electronic device, comprising:
one or more processors;
a memory for storing executable instructions;
wherein the one or more processors are configured to invoke the executable instructions stored in the memory to perform the method of any one of claims 1 to 10.
13. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1 to 10.
CN202310153348.3A 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium Active CN116010109B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202311076545.6A CN117093371A (en) 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium
CN202310153348.3A CN116010109B (en) 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310153348.3A CN116010109B (en) 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202311076545.6A Division CN117093371A (en) 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116010109A CN116010109A (en) 2023-04-25
CN116010109B true CN116010109B (en) 2023-07-04

Family

ID=86037526

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310153348.3A Active CN116010109B (en) 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium
CN202311076545.6A Pending CN117093371A (en) 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202311076545.6A Pending CN117093371A (en) 2023-02-23 2023-02-23 Cache resource allocation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (2) CN116010109B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116010109B (en) * 2023-02-23 2023-07-04 摩尔线程智能科技(北京)有限责任公司 Cache resource allocation method and device, electronic equipment and storage medium
CN116521095B (en) * 2023-07-03 2023-09-08 摩尔线程智能科技(北京)有限责任公司 Response output system, method, electronic device, storage medium, and program product

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8401994B2 (en) * 2009-09-18 2013-03-19 Oracle International Corporation Distributed consistent grid of in-memory database caches
US8935483B2 (en) * 2009-04-27 2015-01-13 Lsi Corporation Concurrent, coherent cache access for multiple threads in a multi-core, multi-thread network processor
US9021179B2 (en) * 2011-06-10 2015-04-28 International Business Machines Corporation Store storage class memory information command
CN102270180B (en) * 2011-08-09 2014-04-02 清华大学 Multicore processor cache and management method thereof
US10002076B2 (en) * 2015-09-29 2018-06-19 Nxp Usa, Inc. Shared cache protocol for parallel search and replacement
US10789175B2 (en) * 2017-06-01 2020-09-29 Mellanox Technologies Ltd. Caching policy in a multicore system on a chip (SOC)
CN109857681B (en) * 2017-11-30 2023-07-18 华为技术有限公司 Cache address mapping method and related equipment
US11086777B2 (en) * 2019-04-01 2021-08-10 Arm Limited Replacement of cache entries in a set-associative cache
CN112148665B (en) * 2019-06-28 2024-01-09 深圳市中兴微电子技术有限公司 Cache allocation method and device
US11481332B1 (en) * 2021-05-07 2022-10-25 Ventana Micro Systems Inc. Write combining using physical address proxies stored in a write combine buffer
US11593109B2 (en) * 2021-06-07 2023-02-28 International Business Machines Corporation Sharing instruction cache lines between multiple threads
CN114217861A (en) * 2021-12-06 2022-03-22 海光信息技术股份有限公司 Data processing method and device, electronic device and storage medium
CN114928652B (en) * 2022-04-29 2023-06-20 高德软件有限公司 Map data transmission method, map data transmission device, electronic device, storage medium, and program
CN115052042B (en) * 2022-06-07 2023-05-26 成都北中网芯科技有限公司 Method for realizing high-performance multi-channel shared cache
CN115098169B (en) * 2022-06-24 2024-03-05 海光信息技术股份有限公司 Method and device for fetching instruction based on capacity sharing
CN115061972B (en) * 2022-07-05 2023-10-13 摩尔线程智能科技(北京)有限责任公司 Processor, data read-write method, device and storage medium
CN115357196A (en) * 2022-08-31 2022-11-18 鹏城实验室 Dynamically expandable set-associative cache method, apparatus, device and medium
CN115168247B (en) * 2022-09-02 2022-12-02 北京登临科技有限公司 Method for dynamically sharing memory space in parallel processor and corresponding processor
CN116010109B (en) * 2023-02-23 2023-07-04 摩尔线程智能科技(北京)有限责任公司 Cache resource allocation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116010109A (en) 2023-04-25
CN117093371A (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN116010109B (en) Cache resource allocation method and device, electronic equipment and storage medium
US10152501B2 (en) Rollover strategies in a n-bit dictionary compressed column store
TWI559217B (en) Dynamic cache and memory allocation for memory subsystems
WO2015142341A1 (en) Dynamic memory expansion by data compression
CN107003940B (en) System and method for providing improved latency in non-uniform memory architectures
US10769073B2 (en) Bandwidth-based selective memory channel connectivity on a system on chip
CN116010300B (en) GPU (graphics processing Unit) caching method and device, electronic equipment and storage medium
US20220229701A1 (en) Dynamic allocation of computing resources
US8707006B2 (en) Cache index coloring for virtual-address dynamic allocators
US11567661B2 (en) Virtual memory management method and processor
US8935508B1 (en) Implementing pseudo content access memory
CN107111560B (en) System and method for providing improved latency in non-uniform memory architectures
US20170364442A1 (en) Method for accessing data visitor directory in multi-core system and device
US10997077B2 (en) Increasing the lookahead amount for prefetching
CN112839071B (en) Training system, training data access method and device, electronic equipment and medium
CN116107926B (en) Cache replacement policy management method, device, equipment, medium and program product
US10942904B2 (en) Mapping first identifier to second identifier
CN113805845A (en) Random number sequence generation method and random number engine
CN116166575B (en) Method, device, equipment, medium and program product for configuring access segment length
CN117742957A (en) Memory allocation method, memory allocation device, electronic equipment and storage medium
CN117539636A (en) Memory management method and device for bus module, electronic equipment and storage medium
CN117130662A (en) Instruction reading method, L2 instruction cache, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant