CN113377688B - L1 cache sharing method for GPU - Google Patents

L1 cache sharing method for GPU

Info

Publication number
CN113377688B
CN113377688B (application CN202110519990.XA)
Authority
CN
China
Prior art keywords
cache
access request
request
multiprocessor
current
Prior art date
Legal status
Active
Application number
CN202110519990.XA
Other languages
Chinese (zh)
Other versions
CN113377688A (en)
Inventor
赵夏
何益百
张拥军
张光达
陈任之
隋京高
王承智
王璐
王君展
Current Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202110519990.XA
Publication of CN113377688A
Application granted
Publication of CN113377688B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an L1 cache sharing method for a GPU, which comprises the following steps: S11, judging whether the local memory access request is empty; if so, executing S21, and if not, executing S12; S12, taking out the request to access the L1 cache; S13, judging whether the request hits; if so, returning the data, and if not, executing S14; S14, judging whether the program is a storage-intensive program; if so, sending the request to another SM and executing S15, and if not, sending the request to the L2 cache; S15, judging whether a cache data block needs to be replaced; if so, sending a data block replacement request to the other SM; S21, judging whether the remote memory access request is empty; if not, executing S22; S22, taking out the request to access the L1 cache; S23, judging whether the request hits; if so, returning the data and executing S24, and if not, sending the request to the L2 cache and executing S24; S24, judging whether the remote data request is empty; if not, storing the data block that needs to be replaced into the L1 cache. The invention enables an SM running a storage-intensive program to use the L1 cache on an SM running a compute-intensive program.

Description

L1 cache sharing method for GPU
Technical Field
The invention relates to the technical field of GPUs (Graphics Processing Units), and in particular to an L1 cache sharing method for a GPU.
Background
A Graphics Processing Unit (GPU) is a microprocessor designed for image- and graphics-related computation. Owing to its powerful computing capability, the GPU is widely used in cloud computing platforms and data centers to provide users with the computation they require. Compared with a single-task GPU, which runs only one task at a time, a multitask GPU can run several tasks simultaneously and thereby effectively improve resource utilization. Specifically, a multitask GPU can run a compute-intensive program and a storage-intensive program on one GPU at the same time, so that both the compute resources and the storage resources of the GPU are fully utilized.
At present, a spatial multitasking mode is mainly adopted to let a GPU run multiple tasks simultaneously. Specifically, in the spatial multitasking mode, all SMs (Streaming Multiprocessors) on the GPU are evenly divided into two groups, and each group of SMs runs one application program. Through such spatial sharing, a spatial multitask GPU can run a compute-intensive program and a storage-intensive program at the same time and improve the utilization of both compute resources and storage resources.
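As a minimal illustration of this even split, the following C++ sketch assigns the SM indices to two groups; mapping the lower-indexed half to one application and the upper-indexed half to the other is an assumption made only for this example, since the spatial multitasking mode described above merely requires that the SMs be divided evenly and that each group run one application program.

```cpp
#include <vector>

// Illustrative even split of the SMs of a GPU into two groups for spatial
// multitasking. Which concrete SM indices go to which group is an assumption
// made for this sketch.
struct SmGroups {
    std::vector<int> group_a;  // SMs running the first application program
    std::vector<int> group_b;  // SMs running the second application program
};

SmGroups split_sms(int num_sms) {
    SmGroups groups;
    for (int sm = 0; sm < num_sms; ++sm)
        (sm < num_sms / 2 ? groups.group_a : groups.group_b).push_back(sm);
    return groups;
}
```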
However, although a spatial multitask GPU that simultaneously runs a compute-intensive program and a storage-intensive program can effectively improve the overall resource utilization of the system, running different programs on different SMs leads to unbalanced use of resources on the SMs, especially the L1 Cache (first-level cache) resources, which limits further improvement of the multitask GPU's performance. Specifically, an SM running a storage-intensive program generates a large number of memory access requests, so its L1 Cache resources are over-used, its L1 Cache miss rate is high, and the missed requests are sent through the on-chip interconnection network to the L2 Cache (second-level cache) and the storage system, incurring a large access overhead. For an SM running a compute-intensive program, by contrast, there are few memory access requests, so its L1 Cache resources are left under-utilized.
Disclosure of Invention
To solve some or all of the above technical problems in the prior art, the present invention provides an L1 cache sharing method for a GPU.
The invention discloses an L1 cache sharing method for a GPU, which comprises the following steps:
S11, judging whether the local memory access request of the current streaming multiprocessor is empty; if so, executing step S21, and if not, executing step S12;
S12, taking out the local memory access request to access the L1 cache;
S13, judging whether the local memory access request hits the L1 cache; if so, returning the corresponding data to the memory access unit of the current streaming multiprocessor, and if not, executing step S14;
S14, judging whether the current task of the current streaming multiprocessor is a storage-intensive program; if so, sending the local memory access request to the other streaming multiprocessor on which the compute-intensive program corresponding to the current streaming multiprocessor runs and executing step S15, and if not, sending the local memory access request to the L2 cache and/or the storage system;
S15, judging whether a cache data block in the L1 cache of the current streaming multiprocessor needs to be replaced; if so, sending the data block that needs to be replaced, together with a data block replacement request, to the other streaming multiprocessor on which the compute-intensive program corresponding to the current streaming multiprocessor runs;
S21, judging whether the remote memory access request is empty; if not, executing step S22, wherein the remote memory access request denotes a memory access request sent to the current streaming multiprocessor by another streaming multiprocessor;
S22, taking out the remote memory access request to access the L1 cache;
S23, judging whether the remote memory access request hits the L1 cache; if so, returning the corresponding data to the streaming multiprocessor that sent the remote memory access request and executing step S24, and if not, sending the remote memory access request to the L2 cache and/or the storage system and executing step S24;
S24, judging whether the remote data request is empty; if not, storing the data block that needs to be replaced and corresponds to the remote data request into the L1 cache of the current streaming multiprocessor, wherein the remote data request denotes a data block replacement request sent to the current streaming multiprocessor by another streaming multiprocessor.
In some optional embodiments, a local memory access request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the memory access requests generated by the current streaming multiprocessor.
In some optional embodiments, taking out the local memory access request to access the L1 cache comprises:
taking out the memory access request at the head of the local memory access request queue in the local memory access request queue unit to access the L1 cache of the current streaming multiprocessor.
In some optional embodiments, a remote memory access request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the memory access requests sent by other streaming multiprocessors.
In some optional embodiments, taking out the remote memory access request to access the L1 cache comprises:
taking out the memory access request at the head of the remote memory access request queue in the remote memory access request queue unit to access the L1 cache of the current streaming multiprocessor.
In some optional embodiments, a remote data request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the data block replacement requests sent by other streaming multiprocessors.
In some optional embodiments, storing the data block that needs to be replaced and corresponds to the remote data request into the L1 cache of the current streaming multiprocessor comprises:
storing the data block that needs to be replaced and corresponds to the data block replacement request at the head of the data block replacement request queue in the remote data request queue unit into the L1 cache of the current streaming multiprocessor.
In some optional embodiments, a selection logic unit is created in the streaming multiprocessor and is used to arbitrate among and select from the local memory access requests, the remote memory access requests and the remote data requests.
In some optional embodiments, whether the current task of the current streaming multiprocessor is a storage-intensive program is determined from the frequency with which the current streaming multiprocessor accesses the L1 cache while the current program is running.
In some optional embodiments, if the number of accesses per thousand instructions to the L1 cache of the current streaming multiprocessor is greater than a preset threshold, the current task of the current streaming multiprocessor is determined to be a storage-intensive program.
The technical solution of the invention has the following main advantages:
The L1 cache sharing method for a GPU provided by the invention enables a streaming multiprocessor (SM) running a storage-intensive program to use the L1 cache on an SM running a compute-intensive program. It thus makes full use of the L1 cache resources in the GPU, improves the resource utilization of the system, and solves the problem of unbalanced L1 cache utilization across SMs when a spatial multitask GPU runs different tasks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for sharing an L1 cache of a GPU according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a streaming multiprocessor microarchitecture according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The technical solution provided by an embodiment of the present invention is described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of the present invention provides an L1 cache sharing method for a GPU. The method is intended for a spatial multitask GPU that runs a compute-intensive program and a storage-intensive program at the same time. The spatial multitask GPU divides the streaming multiprocessors (SMs) on the GPU evenly into two groups: one group of SMs runs the compute-intensive program and the other group runs the storage-intensive program. Each SM running the storage-intensive program has a corresponding SM running the compute-intensive program, and the SMs are connected to one another through an on-chip interconnection network to enable data communication between them. The L1 cache sharing method for the GPU comprises the following steps:
S11, judging whether the local memory access request of the current streaming multiprocessor is empty; if so, executing step S21, and if not, executing step S12;
S12, taking out the local memory access request to access the L1 cache;
S13, judging whether the local memory access request hits the L1 cache; if so, returning the corresponding data to the memory access unit of the current streaming multiprocessor, and if not, executing step S14;
S14, judging whether the current task of the current streaming multiprocessor is a storage-intensive program; if so, sending the local memory access request to the other streaming multiprocessor on which the compute-intensive program corresponding to the current streaming multiprocessor runs and executing step S15, and if not, sending the local memory access request to the L2 cache and/or the storage system;
S15, judging whether a cache data block in the L1 cache of the current streaming multiprocessor needs to be replaced; if so, sending the data block that needs to be replaced, together with a data block replacement request, to the other streaming multiprocessor on which the compute-intensive program corresponding to the current streaming multiprocessor runs;
S21, judging whether the remote memory access request is empty; if not, executing step S22, wherein the remote memory access request denotes a memory access request sent to the current streaming multiprocessor by another streaming multiprocessor;
S22, taking out the remote memory access request to access the L1 cache;
S23, judging whether the remote memory access request hits the L1 cache; if so, returning the corresponding data to the streaming multiprocessor that sent the remote memory access request and executing step S24, and if not, sending the remote memory access request to the L2 cache and/or the storage system and executing step S24;
S24, judging whether the remote data request is empty; if not, storing the data block that needs to be replaced and corresponds to the remote data request into the L1 cache of the current streaming multiprocessor, wherein the remote data request denotes a data block replacement request sent to the current streaming multiprocessor by another streaming multiprocessor.
If both the local memory access request and the remote memory access request are empty, no memory access request needs to access the L1 cache in the current clock cycle.
In an embodiment of the present invention, the above L1 cache sharing method is applied to every streaming multiprocessor (SM) of the GPU, so that an SM running the storage-intensive program can use the L1 cache on the SM running the compute-intensive program. The L1 cache resources in the GPU are thus fully utilized, the resource utilization of the system is improved, and the problem of unbalanced L1 cache utilization across SMs when the spatial multitask GPU runs different tasks is solved.
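To make the per-cycle behaviour of steps S11 to S24 concrete, the following C++ sketch models the decision logic of a single streaming multiprocessor. It is only an illustrative reading of the flow shown in FIG. 1: the type and member names (Sm, Request, l1_lookup, l1_allocate and so on) are assumptions introduced for this example, as is the interpretation that step S15 applies when allocating space for the missing data would evict a valid block; none of these details are prescribed by the method itself.

```cpp
#include <deque>
#include <optional>

struct Request { /* address, source SM id, ... */ };
struct Block   { /* cache data block to be replaced */ };

struct Sm {
    std::deque<Request> local_q;        // local memory access request queue
    std::deque<Request> remote_q;       // remote memory access request queue
    std::deque<Block>   remote_data_q;  // remote data request queue (forwarded blocks)

    bool storage_intensive = false;     // set from the APKI statistic described later
    Sm*  paired_sm = nullptr;           // SM running the corresponding program

    // The following helpers are assumed to exist; their behaviour is only sketched.
    bool l1_lookup(const Request&);                   // true on an L1 hit
    std::optional<Block> l1_allocate(const Request&); // reserves a line, returns the evicted victim if any
    void l1_install(const Block&);                    // places a forwarded block into the L1 cache
    void return_data(const Request&);                 // to the local LD/ST unit or to the source SM
    void send_to_l2(const Request&);                  // over the on-chip network towards L2 / memory

    // One clock cycle of the method (steps S11-S24).
    void cycle() {
        if (!local_q.empty()) {                          // S11 -> S12
            Request r = local_q.front(); local_q.pop_front();
            if (l1_lookup(r)) {                          // S13: hit
                return_data(r);
            } else if (storage_intensive && paired_sm) { // S14: miss on a storage-intensive SM
                paired_sm->remote_q.push_back(r);        //   forward the request to the paired SM
                if (auto victim = l1_allocate(r))        // S15: a local block must be replaced
                    paired_sm->remote_data_q.push_back(*victim);
            } else {                                     // S14: miss on a compute-intensive SM
                send_to_l2(r);
            }
        } else if (!remote_q.empty()) {                  // S21 -> S22
            Request r = remote_q.front(); remote_q.pop_front();
            if (l1_lookup(r)) return_data(r);            // S23: hit, answer the requesting SM
            else              send_to_l2(r);             // S23: miss
            if (!remote_data_q.empty()) {                // S24: install a forwarded block
                l1_install(remote_data_q.front());
                remote_data_q.pop_front();
            }
        }
        // If both request queues are empty, no request accesses the L1 cache this cycle.
    }
};
```

In this sketch every streaming multiprocessor would call cycle() once per clock cycle; an SM running the compute-intensive program simply has storage_intensive set to false, so its own misses go straight to the L2 cache while its L1 cache additionally serves the requests and evicted blocks forwarded by its paired SM.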
Referring to FIG. 2, in an embodiment of the present invention, a local memory access request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the memory access requests generated by the current streaming multiprocessor.
Further, when the local memory access requests are stored in a queue structure, taking out the local memory access request to access the L1 cache in the L1 cache sharing method specifically comprises:
taking out the memory access request at the head of the local memory access request queue in the local memory access request queue unit to access the L1 cache of the current streaming multiprocessor.
Referring to FIG. 2, in an embodiment of the present invention, a remote memory access request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the memory access requests sent by other streaming multiprocessors.
Further, when the remote memory access requests are stored in a queue structure, taking out the remote memory access request to access the L1 cache in the L1 cache sharing method specifically comprises:
taking out the memory access request at the head of the remote memory access request queue in the remote memory access request queue unit to access the L1 cache of the current streaming multiprocessor.
Referring to FIG. 2, in an embodiment of the present invention, a remote data request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the data block replacement requests sent by other streaming multiprocessors.
Further, when the data block replacement requests are stored in a queue structure, storing the data block that needs to be replaced and corresponds to the remote data request into the L1 cache of the current streaming multiprocessor in the L1 cache sharing method specifically comprises:
storing the data block that needs to be replaced and corresponds to the data block replacement request at the head of the data block replacement request queue in the remote data request queue unit into the L1 cache of the current streaming multiprocessor.
Referring to FIG. 2, in an embodiment of the present invention, a selection logic unit is created in the streaming multiprocessor and is used to arbitrate among and select from the local memory access requests, the remote memory access requests and the remote data requests.
Specifically, with the local memory access request queue unit, the remote memory access request queue unit and the remote data request queue unit established in the streaming multiprocessor, the selection logic unit is connected to the three queue units and to the L1 cache; it examines the requests in the three queue units, selects one of them, and sends the selected request to the L1 cache.
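A minimal sketch of such a selection logic unit is given below. The priority order (local memory access requests first, then remote memory access requests, then remote data requests) is one plausible reading of the ordering of steps S11, S21 and S24, and the enum and function names are assumptions made for this illustration.

```cpp
// Illustrative arbiter for the selection logic unit: each cycle it decides
// which of the three queues is allowed to access the L1 cache.
enum class Grant { kLocal, kRemoteAccess, kRemoteData, kNone };

Grant select(bool local_empty, bool remote_access_empty, bool remote_data_empty) {
    if (!local_empty)         return Grant::kLocal;        // S11 -> S12
    if (!remote_access_empty) return Grant::kRemoteAccess; // S21 -> S22
    if (!remote_data_empty)   return Grant::kRemoteData;   // S24
    return Grant::kNone;  // no request accesses the L1 cache this cycle
}
```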
The performance parameters of an application program reflect its characteristics and the type of operations it performs. Compared with a compute-intensive program, a storage-intensive program accesses the L1 cache more frequently while it runs. Therefore, in an embodiment of the present invention, whether the current task of the current streaming multiprocessor is a storage-intensive program can be determined from the frequency with which the current streaming multiprocessor accesses the L1 cache while the current program is running.
Specifically, if the number of accesses per kilo-instruction (APKI) to the L1 cache of the current streaming multiprocessor is greater than a preset threshold, the current task of the current streaming multiprocessor is determined to be a storage-intensive program.
APKI (Accesses Per Kilo-Instruction) is a parameter that reflects how frequently an application accesses memory; an application with a high APKI value issues more memory accesses.
The preset threshold may be, for example, 10.
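As a simple illustration of this classification, the sketch below accumulates the two counters needed to compute APKI and compares the result against the threshold. The counter names and the idea of maintaining them per streaming multiprocessor are assumptions made for this example; the example threshold of 10 is the one mentioned above.

```cpp
// Sketch of APKI-based classification: an SM whose L1 accesses per
// kilo-instruction exceed the preset threshold is treated as running a
// storage-intensive program.
struct ApkiCounter {
    unsigned long long l1_accesses  = 0;  // incremented on every L1 cache access
    unsigned long long instructions = 0;  // incremented on every retired instruction

    double apki() const {
        return instructions ? 1000.0 * static_cast<double>(l1_accesses) / instructions : 0.0;
    }
    bool storage_intensive(double threshold = 10.0) const {
        return apki() > threshold;
    }
};
```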
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Likewise, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. In addition, "front", "rear", "left", "right", "upper" and "lower" in this document refer to the placement states shown in the drawings.
Finally, it should be noted that: the above examples are only for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An L1 cache sharing method for a GPU, comprising the following steps:
S11, judging whether the local memory access request of the current streaming multiprocessor is empty; if so, executing step S21, and if not, executing step S12;
S12, taking out the local memory access request to access the L1 cache;
S13, judging whether the local memory access request hits the L1 cache; if so, returning the corresponding data to the memory access unit of the current streaming multiprocessor, and if not, executing step S14;
S14, judging whether the current task of the current streaming multiprocessor is a storage-intensive program; if so, sending the local memory access request to the other streaming multiprocessor on which the compute-intensive program corresponding to the current streaming multiprocessor runs and executing step S15, and if not, sending the local memory access request to the L2 cache and/or the storage system;
S15, judging whether a cache data block in the L1 cache of the current streaming multiprocessor needs to be replaced; if so, sending the data block that needs to be replaced, together with a data block replacement request, to the other streaming multiprocessor on which the compute-intensive program corresponding to the current streaming multiprocessor runs;
S21, judging whether the remote memory access request is empty; if not, executing step S22, wherein the remote memory access request denotes a memory access request sent to the current streaming multiprocessor by another streaming multiprocessor;
S22, taking out the remote memory access request to access the L1 cache;
S23, judging whether the remote memory access request hits the L1 cache; if so, returning the corresponding data to the streaming multiprocessor that sent the remote memory access request and executing step S24, and if not, sending the remote memory access request to the L2 cache and/or the storage system and executing step S24;
S24, judging whether the remote data request is empty; if not, storing the data block that needs to be replaced and corresponds to the remote data request into the L1 cache of the current streaming multiprocessor, wherein the remote data request denotes a data block replacement request sent to the current streaming multiprocessor by another streaming multiprocessor.
2. The L1 cache sharing method for a GPU according to claim 1, characterized in that a local memory access request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the memory access requests generated by the current streaming multiprocessor.
3. The L1 cache sharing method for a GPU according to claim 2, wherein taking out the local memory access request to access the L1 cache comprises:
taking out the memory access request at the head of the local memory access request queue in the local memory access request queue unit to access the L1 cache of the current streaming multiprocessor.
4. The L1 cache sharing method for a GPU according to claim 1 or 2, characterized in that a remote memory access request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the memory access requests sent by other streaming multiprocessors.
5. The L1 cache sharing method for a GPU according to claim 4, wherein taking out the remote memory access request to access the L1 cache comprises:
taking out the memory access request at the head of the remote memory access request queue in the remote memory access request queue unit to access the L1 cache of the current streaming multiprocessor.
6. The L1 cache sharing method for a GPU according to claim 1, 2 or 4, characterized in that a remote data request queue unit is created in the streaming multiprocessor and is used to store, in a queue structure, the data block replacement requests sent by other streaming multiprocessors.
7. The L1 cache sharing method for a GPU according to claim 6, wherein storing the data block that needs to be replaced and corresponds to the remote data request into the L1 cache of the current streaming multiprocessor comprises:
storing the data block that needs to be replaced and corresponds to the data block replacement request at the head of the data block replacement request queue in the remote data request queue unit into the L1 cache of the current streaming multiprocessor.
8. The L1 cache sharing method for a GPU according to any one of claims 1 to 7, characterized in that a selection logic unit is created in the streaming multiprocessor and is used to arbitrate among and select from the local memory access requests, the remote memory access requests and the remote data requests.
9. The L1 cache sharing method for a GPU according to claim 1, characterized in that whether the current task of the current streaming multiprocessor is a storage-intensive program is determined from the frequency with which the current streaming multiprocessor accesses the L1 cache while the current program is running.
10. The L1 cache sharing method for a GPU according to claim 9, wherein if the number of accesses per thousand instructions to the L1 cache of the current streaming multiprocessor is greater than a preset threshold, the current task of the current streaming multiprocessor is determined to be a storage-intensive program.
CN202110519990.XA 2021-05-13 2021-05-13 L1 cache sharing method for GPU Active CN113377688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519990.XA CN113377688B (en) 2021-05-13 2021-05-13 L1 cache sharing method for GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110519990.XA CN113377688B (en) 2021-05-13 2021-05-13 L1 cache sharing method for GPU

Publications (2)

Publication Number Publication Date
CN113377688A CN113377688A (en) 2021-09-10
CN113377688B 2022-10-11

Family

ID=77572605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519990.XA Active CN113377688B (en) 2021-05-13 2021-05-13 L1 cache sharing method for GPU

Country Status (1)

Country Link
CN (1) CN113377688B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927277A (en) * 2014-04-14 2014-07-16 中国人民解放军国防科学技术大学 CPU (central processing unit) and GPU (graphic processing unit) on-chip cache sharing method and device
CN104461957A (en) * 2014-08-28 2015-03-25 浪潮(北京)电子信息产业有限公司 Method and device for heterogeneous multi-core CPU share on-chip caching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Adaptive stack cache with fast address calculation; Huan Dandan et al.; Journal of Computer Research and Development; 2007-01-28 (No. 01); full text *

Also Published As

Publication number Publication date
CN113377688A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
US10748237B2 (en) Adaptive scheduling for task assignment among heterogeneous processor cores
CN109375872B (en) Data access request processing method, device and equipment and storage medium
US11876731B2 (en) System and methods for sharing memory subsystem resources among datacenter applications
CN108182105B (en) Local dynamic migration method and control system based on Docker container technology
Ma et al. Real-time virtual machine scheduling in industry IoT network: A reinforcement learning method
CN104268018A (en) Job scheduling method in Hadoop cluster and job scheduler
Issawi et al. An efficient adaptive load balancing algorithm for cloud computing under bursty workloads
CN109960575A (en) A kind of computing capability sharing method, system and relevant device
Monil et al. QoS-aware virtual machine consolidation in cloud datacenter
JP2013186770A (en) Data processing device
CN110990154A (en) Big data application optimization method and device and storage medium
CN106681830B (en) A kind of task buffer space monitoring method and apparatus
Singh et al. Comparative analysis of VM consolidation algorithms for cloud computing
CN113377688B (en) L1 cache sharing method for GPU
CN116089477B (en) Distributed training method and system
Zhang et al. PRMRAP: A proactive virtual resource management framework in cloud
CN103955397A (en) Virtual machine scheduling multi-strategy selection method based on micro-architecture perception
CN114201306B (en) Multi-dimensional geographic space entity distribution method and system based on load balancing technology
CN106775942B (en) Cloud application-oriented solid-state disk cache management system and method
CN113377866A (en) Load balancing method and device for virtualized database proxy service
EP3096227A1 (en) Resource allocation method in distributed clouds
CN112114967A (en) GPU resource reservation method based on service priority
Dai et al. A resource occupancy ratio-oriented load balancing task scheduling mechanism for flink
WO2020019315A1 (en) Computational operation scheduling method employing graphic data, system, computer readable medium, and apparatus
Çağlar et al. An energy efficient VM allocation approach for data centers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant