CN112783803B - Computer CPU-GPU shared cache control method and system - Google Patents
- Publication number
- CN112783803B (application CN202110111509.3A)
- Authority
- CN
- China
- Prior art keywords
- cpu
- gpu
- level cache
- cores
- core
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention provides a method and system for controlling the cache shared by a computer's CPU (central processing unit) and GPU (graphics processing unit). The method comprises the following steps: first, obtaining the utilization rate and first-level cache miss rate of each CPU core and each GPU core; calculating the product of each core's utilization rate and its first-level cache miss rate; obtaining the CPU and GPU memory allocation proportion set by the user and sorting the products weighted by that proportion; and adjusting the last-level cache (LLC) shared by the CPU and the GPU according to the sorting result. By combining each core's utilization rate and first-level cache miss rate with the user's settings, the method adjusts the LLC dynamically, overcomes the overly complex adjustment and wasted system resources of prior-art LLC schemes, optimizes the LLC of a computer in a targeted manner, and improves the overall performance of a CPU chip with an integrated GPU.
Description
Technical Field
The application relates to the field of computer chips, and in particular to a method and system for controlling a last-level cache shared by a CPU and a GPU.
Background
A cache is a high-speed memory between the CPU and main memory. When the CPU reads data, it first searches the cache; if the data is found, it is read immediately, otherwise it is fetched from the comparatively slow main memory, so a well-configured cache speeds up the CPU. Before the GPU existed, the CPU handled all of a computer's work; later the GPU, a graphics processor dedicated to graphics processing and floating-point operations, appeared. With the development of large-scale integrated circuits, ever more electronic elements are integrated on a chip, and integrating the CPU and the GPU on one chip has brought further gains in computer performance. A chip that integrates a CPU and a GPU exchanges and shares data through a Last-Level Cache (LLC). However, because GPU cores run many concurrent threads, the GPU can occupy too much of the LLC, and under an LRU replacement policy the CPU's data cached in the LLC may be evicted by the GPU, degrading CPU performance.
To allocate the LLC reasonably, researchers and companies have proposed various solutions, such as static allocation, which divides the LLC into fixed parts, each assigned to a specific CPU or GPU core or thread, and feedback-driven dynamic allocation, which adjusts the LLC assigned to a program or core while the computer is running. However, a static allocation is fixed and cannot be adjusted in time, and although dynamic allocation achieves runtime adjustment, existing dynamic schemes are complex and their adjustment process wastes valuable system resources. How to adjust the LLC shared by the CPU and the GPU economically and effectively is therefore an urgent problem in the field.
Disclosure of Invention
To solve these problems, the invention provides a method and system for controlling a computer's CPU-GPU shared cache that consider both the utilization rate and the cache miss rate of each CPU and GPU core and allocate CPU and GPU memory according to the user's needs; experiments show that the invention is more targeted than existing LLC adjustment techniques.
In one aspect, the present invention provides a computer CPU-GPU shared cache control method, applied in a CPU-GPU fusion architecture, the method comprising the steps of:
acquiring the utilization rate and first-level cache miss rate of each core of the CPU (Central Processing Unit), and the utilization rate and first-level cache miss rate of each core of the GPU (Graphics Processing Unit);
calculating the product of each CPU core's utilization rate and its first-level cache miss rate to obtain C_n, n = 1, …, N; calculating the product of each GPU core's utilization rate and its first-level cache miss rate to obtain G_m, m = 1, …, M; where N is the number of CPU cores and M is the number of GPU cores;
obtaining the CPU and GPU memory allocation proportion set by the user, deriving C'_n and G'_m from the proportion together with C_n and G_m, and sorting C'_n and G'_m;
and adjusting the last level cache shared by the CPU and the GPU according to the sequencing result.
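The four steps above can be sketched in Python. This is an illustrative sketch only, not the patented implementation; the function and variable names (`rank_cores`, `cpu_stats`, and so on) are assumptions made for illustration:

```python
from typing import Dict, List, Tuple

def rank_cores(cpu_stats: Dict[str, Tuple[float, float]],
               gpu_stats: Dict[str, Tuple[float, float]],
               x: float, y: float) -> List[Tuple[str, float]]:
    """Rank cores by weighted (utilization * L1 miss rate).

    cpu_stats / gpu_stats map a core name to (utilization, L1 miss rate),
    both fractions in [0, 1]; x:y is the user's CPU:GPU memory ratio.
    Returns (core, weighted product) pairs sorted in descending order.
    """
    scores: Dict[str, float] = {}
    for name, (util, miss) in cpu_stats.items():
        scores[name] = x * util * miss   # C'_n = x * C_n, with C_n = util * miss
    for name, (util, miss) in gpu_stats.items():
        scores[name] = y * util * miss   # G'_m = y * G_m, with G_m = util * miss
    # sorted() is stable, so cores with equal scores keep their original order
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The resulting descending queue is what the patent's adjustment step then operates on.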
On the other hand, obtaining the CPU and GPU memory allocation proportion set by the user and deriving C'_n and G'_m from the proportion together with C_n and G_m specifically comprises:
obtaining the memory sizes the user has allocated to the CPU and the GPU, and from them the proportion x:y of memory allocated to the CPU and the GPU;
C'_n = x * C_n, G'_m = y * G_m.
on the other hand, the adjusting the last-level cache shared by the CPU and the GPU according to the sorting result specifically includes:
arranging C'_n and G'_m in descending order to obtain a queue R_s; if R_s is greater than the first threshold and smaller than the second threshold, the last-level cache is not adjusted, otherwise the last-level cache is adjusted; where s = n + m, and the second threshold is greater than the first threshold.
In another aspect, the adjusting the last-level cache specifically includes:
acquiring the number a1 of cores whose value is greater than the second threshold and the number a2 of cores whose value is smaller than the first threshold; if a1 > 0 and a2 > 0, allocating the last-level cache corresponding to the a2 cores at the tail of the queue to the a1 cores at the head of the queue; if only a1 > 0, allocating the last-level cache corresponding to the a1 cores at the tail of the queue to the a1 cores at the head of the queue; and if only a2 > 0, allocating the last-level cache corresponding to the a2 cores at the tail of the queue to the a2 cores at the head of the queue.
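As a hedged sketch, the a1/a2 threshold rule above might look like the following in Python; the names and the return convention (a pair of receiver and donor core lists) are assumptions, since the patent leaves the exact transfer mechanism open:

```python
from typing import List, Tuple

def plan_llc_transfer(queue: List[Tuple[str, float]],
                      t1: float, t2: float) -> Tuple[List[str], List[str]]:
    """queue: (core, weighted score) pairs sorted descending; t1 < t2.

    Returns (receivers, donors): receivers get extra last-level cache,
    donors give part of theirs up, following the a1/a2 rule in the text.
    """
    over = [c for c, s in queue if s > t2]    # the a1 head cores above the 2nd threshold
    under = [c for c, s in queue if s < t1]   # the a2 tail cores below the 1st threshold
    a1, a2 = len(over), len(under)
    if a1 > 0 and a2 > 0:
        return over, under
    if a1 > 0:   # only a1 > 0: donate from the a1 cores at the queue tail
        return over, [c for c, _ in queue[-a1:]]
    if a2 > 0:   # only a2 > 0: give to the a2 cores at the queue head
        return [c for c, _ in queue[:a2]], under
    return [], []   # no core crosses either threshold: leave the LLC unchanged
```

How much LLC each donor yields to each receiver is a separate policy choice, which the description later discusses.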
In another aspect, the CPU-GPU converged architecture refers to a CPU chip that integrates a GPU.
In another aspect, the obtaining of the CPU and GPU memory allocation ratio set by the user specifically includes: and obtaining the memory allocated to the CPU and the memory allocated to the GPU from the BIOS, and calculating according to the memory allocated to the CPU and the memory allocated to the GPU to obtain the memory allocation ratio of the CPU and the GPU.
On the other hand, the invention also provides a computer CPU-GPU shared cache control system which is applied to the CPU-GPU fusion architecture and comprises the following modules:
the first acquisition module is used for acquiring the utilization rate and the first-level cache miss rate of each core of the CPU, and the utilization rate and the first-level cache miss rate of each core of the GPU;
a calculation module for calculating the product of each CPU core's utilization rate and its first-level cache miss rate to obtain C_n, n = 1, …, N, and the product of each GPU core's utilization rate and its first-level cache miss rate to obtain G_m, m = 1, …, M, where N is the number of CPU cores and M is the number of GPU cores;
a second obtaining module for obtaining the CPU and GPU memory allocation proportion set by the user, deriving C'_n and G'_m from the proportion together with C_n and G_m, and sorting C'_n and G'_m;
and the adjusting module is used for adjusting the last-level cache shared by the CPU and the GPU according to the sequencing result.
On the other hand, obtaining the CPU and GPU memory allocation proportion set by the user and deriving C'_n and G'_m from the proportion together with C_n and G_m specifically comprises:
obtaining the memory sizes the user has allocated to the CPU and the GPU, and from them the proportion x:y of memory allocated to the CPU and the GPU;
C'_n = x * C_n, G'_m = y * G_m.
Furthermore, the present invention provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method described above.
Furthermore, the present invention also provides an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method as described above.
The CPU and the GPU each comprise multiple cores. A high utilization rate means a core is busy; if the last-level cache allocated to it is too small, the core must still read data from main memory, which increases its data-access overhead and drives its utilization even higher. Because program data exhibits locality, a low first-level cache miss rate for a core indicates that the program on that core has strong locality, so the core does not need an excessive share of the LLC. In addition, the memory sizes the user allocates to the CPU and the GPU reflect the characteristics of the programs the user runs, so these allocations are also taken into account. The LLC adjustment method provided by the invention effectively improves the overall performance of the chip and avoids contention between the CPU and the GPU for LLC resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of a CPU-GPU sharing LLC in the prior art;
FIG. 2 is a schematic flow chart of the present invention.
Detailed Description
In this document, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between them. The terms "comprises", "comprising", or any variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises it.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
In one embodiment, the invention provides a computer CPU-GPU shared cache control method, which is applied to a CPU-GPU fusion architecture and comprises the following steps:
acquiring the utilization rate and first-level cache miss rate of each core of the CPU (Central Processing Unit), and the utilization rate and first-level cache miss rate of each core of the GPU (Graphics Processing Unit);
calculating the product of each CPU core's utilization rate and its first-level cache miss rate to obtain C_n, n = 1, …, N; calculating the product of each GPU core's utilization rate and its first-level cache miss rate to obtain G_m, m = 1, …, M; where N is the number of CPU cores and M is the number of GPU cores;
obtaining the CPU and GPU memory allocation proportion set by the user, deriving C'_n and G'_m from the proportion together with C_n and G_m, and sorting C'_n and G'_m;
and adjusting the last level cache shared by the CPU and the GPU according to the sequencing result.
In another embodiment, obtaining the CPU and GPU memory allocation proportion set by the user and deriving C'_n and G'_m from the proportion together with C_n and G_m specifically comprises:
obtaining the memory sizes set by the user for the CPU and the GPU, and from them the proportion x:y of memory allocated to the CPU and the GPU;
C'_n = x * C_n, G'_m = y * G_m.
in another embodiment, the adjusting the last-level cache shared by the CPU and the GPU according to the sorting result specifically includes:
arranging C'_n and G'_m in descending order to obtain a queue R_s; if R_s is greater than the first threshold and smaller than the second threshold, the last-level cache is not adjusted, otherwise the last-level cache is adjusted; where s = n + m, and the second threshold is greater than the first threshold.
The utilization rate of each CPU and GPU core is related to the complexity and concurrency of the data being processed, and also to the cache available to the CPU and GPU: the larger the cache, the less likely the CPU or GPU is to read data from main memory and the shorter the data-access time;
moreover, different programs have different characteristics: some have good data locality and can read most of their data from the cache, so their first-level cache miss rate is low, while others, especially programs handling large volumes of data, must frequently read from disk and main memory, so their first-level cache miss rate is high.
In addition, different users use their computers differently. Some mainly play online games or process pictures and video, and typically allocate more memory to the GPU; others only edit documents and browse the web, and allocate more memory to the CPU to improve its performance. Different purposes lead to different CPU and GPU memory allocations. If the user allocates more memory to the GPU, more of the shared cache should be allocated to the GPU to meet the user's demand, and conversely less can be allocated.
In order to facilitate understanding of the present invention, a specific example will be described below.
Assume the user's computer has 8 GB of memory in total, of which the user allocates 5 GB to the CPU and 3 GB to the GPU, so the CPU:GPU memory allocation ratio is 5:3;
the current computer CPU has 4 cores and GPU has 8 stream processing cores, and at a certain moment, the utilization rate of the cores and the first-level cache miss rate are shown in table 1 below, where the weight of the CPU core is 5 and the weight of the GPU core is 3:
TABLE 1
Core | Core utilization | First-level cache miss rate | Product | Weight product |
---|---|---|---|---|
C1 | 80% | 80% | 0.64 | 3.2 |
C2 | 20% | 20% | 0.4 | 2.0 |
C3 | 60% | 30% | 0.18 | 0.9 |
C4 | 30% | 60% | 0.18 | 0.9 |
G1 | 80% | 80% | 0.64 | 1.92 |
G2 | 80% | 20% | 0.16 | 0.48 |
G3 | 90% | 90% | 0.81 | 2.43 |
G4 | 40% | 60% | 0.24 | 0.72 |
G5 | 10% | 10% | 0.01 | 0.03 |
G6 | 50% | 60% | 0.3 | 0.9 |
G7 | 40% | 60% | 0.24 | 0.72 |
G8 | 30% | 60% | 0.18 | 0.54 |
According to the final result, the cores are ordered as: c1, G3, C2, G1, C3, C4, G6, G4, G7, G8, G2, G5.
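This ranking can be reproduced with a few lines of Python, as an illustrative check using the weight-product column of Table 1; Python's stable sort keeps the C1…C4, G1…G8 input order for tied scores, which matches the ordering above:

```python
# Weight products taken from Table 1 (CPU cores weighted by 5, GPU cores by 3)
weighted = {
    "C1": 3.2, "C2": 2.0, "C3": 0.9, "C4": 0.9,
    "G1": 1.92, "G2": 0.48, "G3": 2.43, "G4": 0.72,
    "G5": 0.03, "G6": 0.9, "G7": 0.72, "G8": 0.54,
}
order = [core for core, _ in
         sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)]
print(order)
# → ['C1', 'G3', 'C2', 'G1', 'C3', 'C4', 'G6', 'G4', 'G7', 'G8', 'G2', 'G5']
```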
In another embodiment, the adjusting the last-level cache specifically includes:
acquiring the number a1 of cores whose value is greater than the second threshold and the number a2 of cores whose value is smaller than the first threshold; if a1 > 0 and a2 > 0, allocating the last-level cache corresponding to the a2 cores at the tail of the queue to the a1 cores at the head of the queue; if only a1 > 0, allocating the last-level cache corresponding to the a1 cores at the tail of the queue to the a1 cores at the head of the queue; and if only a2 > 0, allocating the last-level cache corresponding to the a2 cores at the tail of the queue to the a2 cores at the head of the queue.
In another embodiment, the CPU-GPU fusion architecture refers to a CPU chip integrated with a GPU.
In another embodiment, the obtaining of the CPU and GPU memory allocation ratio set by the user specifically includes: and obtaining the memory allocated to the CPU and the memory allocated to the GPU from the BIOS, and calculating according to the memory allocated to the CPU and the memory allocated to the GPU to obtain the memory allocation ratio of the CPU and the GPU.
The first threshold and the second threshold are obtained by analyzing statistical information, such as how long each core stays busy and how much of the LLC it occupies; they may also be specified by the user or the system.
Still taking the data in Table 1 as an example, and assuming the second threshold is 2 and the first threshold is 0.9, the sorted cores are divided into three parts, as shown in Table 2 below.
TABLE 2
C1, G3 | C2, G1, C3, C4, G6, G4 | G7, G8, G2, G5 |
In this case, the LLC portions corresponding to cores G7, G8, G2 and G5 need to be allocated to C1 and G3. There are various ways to do so; for example, 20% of the LLC occupied by, or remaining to, each of G7, G8, G2 and G5 may be distributed evenly to C1 and G3, or the reallocation may follow some other ratio; the invention does not limit this.
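The three-way split behind Table 2 can be sketched as follows. The boundary handling is an assumption: scores strictly above the second threshold mark receivers and scores strictly below the first threshold mark donors, with ties kept in the middle. Note that under a strict `< 0.9` cut, G4 (weight product 0.72) would also land in the tail group, whereas Table 2 lists it in the middle, so the example's exact grouping is not fully determined by the thresholds alone:

```python
# Weight products from Table 1 and the example thresholds
weighted = {
    "C1": 3.2, "C2": 2.0, "C3": 0.9, "C4": 0.9,
    "G1": 1.92, "G2": 0.48, "G3": 2.43, "G4": 0.72,
    "G5": 0.03, "G6": 0.9, "G7": 0.72, "G8": 0.54,
}
t1, t2 = 0.9, 2.0                                   # first and second thresholds
ranked = sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)
head   = [c for c, s in ranked if s > t2]           # receive extra LLC
middle = [c for c, s in ranked if t1 <= s <= t2]    # left unchanged (ties included)
tail   = [c for c, s in ranked if s < t1]           # give up part of their LLC
print(head)
# → ['C1', 'G3']
```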
Example two
In another embodiment, the present invention further provides a computer CPU-GPU shared cache control system, applied in a CPU-GPU fusion architecture, wherein the system includes the following modules:
the first acquisition module is used for acquiring the utilization rate and the first-level cache miss rate of each core of the CPU, and the utilization rate and the first-level cache miss rate of each core of the GPU;
a calculation module for calculating the product of each CPU core's utilization rate and its first-level cache miss rate to obtain C_n, n = 1, …, N, and the product of each GPU core's utilization rate and its first-level cache miss rate to obtain G_m, m = 1, …, M, where N is the number of CPU cores and M is the number of GPU cores;
a second obtaining module for obtaining the CPU and GPU memory allocation proportion set by the user, deriving C'_n and G'_m from the proportion together with C_n and G_m, and sorting C'_n and G'_m;
and the adjusting module is used for adjusting the last-level cache shared by the CPU and the GPU according to the sequencing result.
In another embodiment, obtaining the CPU and GPU memory allocation proportion set by the user and deriving C'_n and G'_m from the proportion together with C_n and G_m specifically comprises:
obtaining the memory sizes the user has allocated to the CPU and the GPU, and from them the proportion x:y of memory allocated to the CPU and the GPU;
C'_n = x * C_n, G'_m = y * G_m.
Other implementation details of the second embodiment are as described in the first embodiment and are not repeated here.
EXAMPLE III
In another embodiment, the present invention further provides a computer-readable storage medium for storing computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of embodiment one.
Example four
In another embodiment, the present invention further provides an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, and wherein the one or more computer program instructions are executed by the processor to implement the method of embodiment one.
The embodiments described in the present invention may be combined to form corresponding technical solutions. The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Claims (10)
1. A computer CPU-GPU shared cache control method is applied to a CPU-GPU fusion architecture, and is characterized by comprising the following steps:
acquiring the utilization rate and the first-level cache miss rate of each core of a CPU (Central processing Unit), and the utilization rate and the first-level cache miss rate of each core of a GPU (graphics processing Unit);
calculating the product of each CPU core's utilization rate and its first-level cache miss rate to obtain C_n, n = 1, …, N; calculating the product of each GPU core's utilization rate and its first-level cache miss rate to obtain G_m, m = 1, …, M; wherein N is the number of CPU cores and M is the number of GPU cores;
obtaining the CPU and GPU memory allocation proportion set by the user, deriving C'_n and G'_m from the proportion together with C_n and G_m, and sorting C'_n and G'_m;
and adjusting the last level cache shared by the CPU and the GPU according to the sequencing result.
2. The method of claim 1, wherein obtaining the CPU and GPU memory allocation proportion set by the user and deriving C'_n and G'_m from the proportion together with C_n and G_m specifically comprises:
obtaining the memory sizes the user has allocated to the CPU and the GPU, and from them the proportion x:y of memory allocated to the CPU and the GPU;
C'_n = x * C_n, G'_m = y * G_m.
3. the method of claim 1, wherein the adjusting the last level cache shared by the CPU and the GPU according to the sorting result specifically comprises:
arranging C'_n and G'_m in descending order to obtain a queue R_s; if R_s is greater than the first threshold and smaller than the second threshold, the last-level cache is not adjusted, otherwise the last-level cache is adjusted; wherein s = n + m, and the second threshold is greater than the first threshold.
4. The method of claim 3, wherein the adjusting the last level cache comprises:
acquiring the number a1 of cores whose value is greater than the second threshold and the number a2 of cores whose value is smaller than the first threshold; if a1 > 0 and a2 > 0, allocating the last-level cache corresponding to the a2 cores at the tail of the queue to the a1 cores at the head of the queue; if only a1 > 0, allocating the last-level cache corresponding to the a1 cores at the tail of the queue to the a1 cores at the head of the queue; and if only a2 > 0, allocating the last-level cache corresponding to the a2 cores at the tail of the queue to the a2 cores at the head of the queue.
5. The method of claim 1, wherein the CPU-GPU converged architecture refers to a CPU chip that integrates a GPU.
6. The method according to claim 1, wherein the obtaining of the CPU and GPU memory allocation ratio set by the user specifically comprises: and acquiring the memory allocated to the CPU and the memory allocated to the GPU from the BIOS, and calculating the memory allocation proportion of the CPU and the GPU according to the memory allocated to the CPU and the memory allocated to the GPU.
7. A computer CPU-GPU shared cache control system is applied to a CPU-GPU fusion architecture, and is characterized by comprising the following modules:
the first acquisition module is used for acquiring the utilization rate and the first-level cache miss rate of each core of the CPU, the utilization rate and the first-level cache miss rate of each core of the GPU;
a calculation module for calculating the product of each CPU core's utilization rate and its first-level cache miss rate to obtain C_n, n = 1, …, N, and the product of each GPU core's utilization rate and its first-level cache miss rate to obtain G_m, m = 1, …, M, wherein N is the number of CPU cores and M is the number of GPU cores;
a second obtaining module for obtaining the CPU and GPU memory allocation proportion set by the user, deriving C'_n and G'_m from the proportion together with C_n and G_m, and sorting C'_n and G'_m;
and the adjusting module is used for adjusting the last-level cache shared by the CPU and the GPU according to the sequencing result.
8. The system of claim 7, wherein obtaining the CPU and GPU memory allocation proportion set by the user and deriving C'_n and G'_m from the proportion together with C_n and G_m specifically comprises:
obtaining the memory sizes the user has allocated to the CPU and the GPU, and from them the proportion x:y of memory allocated to the CPU and the GPU;
C'_n = x * C_n, G'_m = y * G_m.
9. a computer readable storage medium storing computer program instructions, which when executed by a processor implement the method of any one of claims 1-6.
10. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110111509.3A CN112783803B (en) | 2021-01-27 | 2021-01-27 | Computer CPU-GPU shared cache control method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112783803A CN112783803A (en) | 2021-05-11 |
CN112783803B true CN112783803B (en) | 2022-11-18 |
Family
ID=75758092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110111509.3A Active CN112783803B (en) | 2021-01-27 | 2021-01-27 | Computer CPU-GPU shared cache control method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112783803B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113806247A (en) * | 2021-07-22 | 2021-12-17 | 上海擎昆信息科技有限公司 | Device and method for flexibly using data cache in 5G communication chip |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103927277A (en) * | 2014-04-14 | 2014-07-16 | 中国人民解放军国防科学技术大学 | CPU (central processing unit) and GPU (graphic processing unit) on-chip cache sharing method and device |
WO2017000673A1 (en) * | 2015-06-29 | 2017-01-05 | 深圳市中兴微电子技术有限公司 | Shared cache allocation method and apparatus and computer storage medium |
CN106708626A (en) * | 2016-12-20 | 2017-05-24 | 北京工业大学 | Low power consumption-oriented heterogeneous multi-core shared cache partitioning method |
CN108399145A (en) * | 2018-02-08 | 2018-08-14 | 山东大学 | A kind of CPU-GPU heterogeneous platforms share last level cache management method, framework and device |
CN111190735A (en) * | 2019-12-30 | 2020-05-22 | 湖南大学 | Linux-based on-chip CPU/GPU (Central processing Unit/graphics processing Unit) pipelined computing method and computer system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9626295B2 (en) * | 2015-07-23 | 2017-04-18 | Qualcomm Incorporated | Systems and methods for scheduling tasks in a heterogeneous processor cluster architecture using cache demand monitoring |
Non-Patent Citations (6)
Title |
---|
Co-Scheduling on Fused CPU-GPU Architectures With Shared Last Level Caches; M. Damschen et al.; IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; 2018-07-18; Vol. 37, No. 11; full text * |
Cache Performance Analysis and Optimization on Fused CPU-GPU Architectures; Sun Chuanwei et al.; Computer Engineering and Applications; 2015-09-02; Vol. 53, No. 02; full text * |
Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture; H. Wen et al.; 2019 IEEE High Performance Extreme Computing Conference; 2019-11-28; full text * |
Research on Cache Partitioning and Adaptive Replacement Policy for CPU-GPU Heterogeneous Processors; J. Fang et al.; 2017 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science; 2018-01-11; full text * |
Research on Cache Optimization Techniques for CPU-GPU Heterogeneous Architectures; Liu Shijian; China Master's Theses Full-text Database, Information Science and Technology Series; 2019-05-15; Vol. 2019, No. 5; full text * |
Research on Shared Cache Management Techniques in Heterogeneous Multi-core Environments; Hao Xiaona; China Master's Theses Full-text Database, Information Science and Technology Series; 2018-07-15; Vol. 2018, No. 7; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN112783803A (en) | 2021-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107038069B (en) | Dynamic label matching DLMS scheduling method under Hadoop platform | |
US8190795B2 (en) | Memory buffer allocation device and computer readable medium having stored thereon memory buffer allocation program | |
KR101761301B1 (en) | Memory resource optimization method and apparatus | |
CN103902474B (en) | Mixed storage system and method for supporting solid-state disk cache dynamic distribution | |
US6851030B2 (en) | System and method for dynamically allocating associative resources | |
CN110226157A (en) | Dynamic memory for reducing row buffering conflict remaps | |
US7185167B2 (en) | Heap allocation | |
US7979668B2 (en) | Method and system for automatically distributing real memory between virtual memory page sizes | |
US8060679B2 (en) | Information processing apparatus and access control method capable of high-speed data access | |
WO2023050712A1 (en) | Task scheduling method for deep learning service, and related apparatus | |
US8495302B2 (en) | Selecting a target number of pages for allocation to a partition | |
CN112181613B (en) | Heterogeneous resource distributed computing platform batch task scheduling method and storage medium | |
KR20130068685A (en) | Hybrid main memory system and task scheduling method therefor | |
US20070294448A1 (en) | Information Processing Apparatus and Access Control Method Capable of High-Speed Data Access | |
CN107111557A (en) | Shared cache memory distribution control is provided in shared high-speed buffer storage system | |
CN112783803B (en) | Computer CPU-GPU shared cache control method and system | |
US20180246820A1 (en) | Multiple linked list data structure | |
US20190056872A1 (en) | Reallocate memory pending queue based on stall | |
US7904688B1 (en) | Memory management unit for field programmable gate array boards | |
CN112540934B (en) | Method and system for ensuring service quality when multiple delay key programs are executed together | |
CN113806089A (en) | Cluster load resource scheduling method and device, electronic equipment and readable storage medium | |
JP2020021417A (en) | Database management system and method | |
CN104050189B (en) | The page shares processing method and processing device | |
CN111427887A (en) | Method, device and system for rapidly scanning HBase partition table | |
CN114924848A (en) | IO (input/output) scheduling method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2022-11-03
Address after: Room 502-A, Building 4, Xingong Science Park, No. 100, Luyun Road, Lugu Street, Changsha Hi-tech Development Zone, Hunan Province, 410000
Applicant after: Hunan Zhongke Changxing Technology Co., Ltd.
Address before: 619, Building B, Dashiqiao SOHO Plaza, Jinshui District, Zhengzhou City, Henan Province, 450000
Applicant before: Yu Hui |
GR01 | Patent grant | ||