KR101442643B1 - The Cooperation System and the Method between CPU and GPU - Google Patents
- Publication number
- KR101442643B1 (application KR1020130048061A)
- Authority
- KR
- South Korea
- Prior art keywords
- gpu
- cpu
- data
- task
- cache
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/167—Interprocessor communication using a common memory, e.g. mailbox
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
The present invention relates to a collaboration system between a CPU and a graphics processor (GPU) and a method thereof, and more particularly, to a memory structure and a management method for efficiently collaborating between a CPU and a GPU.
Recently, application processors (APs) such as the Samsung Exynos, nVidia Tegra, and Texas Instruments OMAP have adopted multi-core ARM Cortex CPUs together with nVidia or Imagination SGX multi-core GPUs on a single chip.
Traditionally, in the case of multiple CPUs, primary or secondary caches are shared in order to improve system performance. In addition, a protocol such as MESI (Modified, Exclusive, Shared, Invalid) is adopted for coherency between the caches belonging to each CPU, and a Snoop Control Unit (SCU) is installed for this purpose. To minimize access to external memory, write-back, write-once, and write-allocate policies are applied.
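For illustration only (the patent itself specifies a hardware protocol), the MESI states mentioned above can be modeled as a small transition table; the event names here are simplified assumptions, not actual bus signals:

```python
# Minimal illustrative model of MESI cache-line states (M, E, S, I).
# Event names (local_read, local_write, remote_read, remote_write)
# are simplified assumptions for this sketch.

MESI_TRANSITIONS = {
    # (current_state, event) -> next_state
    ("I", "local_read"):   "E",  # assumes no other cache holds the line
    ("I", "local_write"):  "M",
    ("E", "local_write"):  "M",  # silent upgrade, no bus traffic needed
    ("E", "remote_read"):  "S",
    ("S", "local_write"):  "M",  # other sharers must be invalidated
    ("S", "remote_write"): "I",
    ("M", "remote_read"):  "S",  # write back dirty data, then share
    ("M", "remote_write"): "I",
}

def next_state(state: str, event: str) -> str:
    """Return the next MESI state; the state is unchanged
    when no transition applies to the given event."""
    return MESI_TRANSITIONS.get((state, event), state)
```

A write to a Shared line, for example, moves it to Modified while the snoop traffic invalidates the other copies, which is exactly the overhead the invention later tries to minimize.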
GP-GPU computing, first introduced by Intel and AMD, has been extended to APs and integrated into a single chip as mentioned above. Commonly, the CPU and GPU share a lower-level cache. However, there is a big difference between the memory-management approaches of mobile APs and PCs.
For example, in the AMD Fusion APU, the CPU and GPU each have their own page table. The ARM Mali T604, on the other hand, manages memory with a page table shared with the Cortex-A15. It has not yet been validated which approach is better.
Currently, in a CPU/GPU integrated system, the CPU controls the GPU through a bridge (PC) or a bus (AP). Generally, the CPU delegates the code and data of the tasks to be processed to the GPU: the code and data are copied through the memory interface into the GPU's local memory, the GPU processes the data, and the result is copied back to the CPU's main memory. To this end, the software driver of the operating system controls the GPU through the CPU's bridge or bus interface, while the memory sharing and cache controller operate independently of this control structure.
However, this structure degrades system performance. Therefore, direct inter-processor communication between the CPU and GPU is required, and a control unit for it needs to be added separately. It is also necessary to examine, for cache sharing, whether the CPU and GPU should have separate page tables or a common page table.
SUMMARY OF THE INVENTION It is an object of the present invention to provide a system and method for cooperation between a CPU and a GPU that can reduce the load of a CPU by controlling a GPU through a separate control module.
It is another object of the present invention to provide a cache coherence control module that is effective in maintaining cache coherence between a CPU and a GPU by extending a conventional protocol for solving a cache coherency problem between multiprocessors.
The present invention provides a collaborative system between a CPU and a GPU, comprising: a task manager for receiving a task requested by the CPU, requesting the GPU to process it, and sending the task result processed by the GPU back to the CPU; an address mapping unit for mapping the address space of the GPU to the address space of the main memory; a prefetcher that fetches, from the main memory into the cache memory, the data to be processed next after the data currently being processed by the GPU; and a cache coherency controller for keeping the data stored in the cache memory of the CPU consistent with the data stored in the cache memory of the GPU.
According to one aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit receives from the CPU code information corresponding to a task requested by the CPU and address information of the data necessary for performing the task.
According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit loads into the address mapping unit a table mapping the address space of the GPU to the address information of the data required for the task.
According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit distributes the task requested by the CPU to each core of the GPU and monitors the operation status of each core of the GPU.
According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the prefetcher, upon receiving an operation signal from the task management unit, fetches the data required by the GPU from the main memory into the cache memory and removes already-processed data from the cache memory.
According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit checks whether the data stored in the cache memory of the CPU and the data stored in the cache memory of the GPU need to be kept consistent, and operates the cache coherency controller when such consistency is required.
The method includes receiving a job requested by the CPU and requesting the GPU to process it; mapping an address space of the GPU to an address space of a main memory; transferring the result of the GPU-processed job to the CPU; identifying the data to be processed after the data currently being processed by the GPU; fetching the identified data from the main memory into a cache memory; and, when the data of the GPU and the data of the CPU need to be kept consistent, activating a cache coherence control module to reconcile them.
According to an aspect of the present invention, the step of receiving a job requested by the CPU and requesting the GPU to process it includes receiving from the CPU code information corresponding to the job and address information of the data necessary for the job; and distributing the received job to each core of the GPU while monitoring the work status of each core of the GPU. The present invention thus provides a method of collaborating between a CPU and a GPU.
According to another aspect of the present invention, mapping an address space of the GPU to an address space of a main memory includes generating a table mapping the address space of the GPU to the address information of the data required for the task; and converting addresses of the GPU by referring to the table.
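The claimed method steps above can be summarized in an illustrative software sketch; the `collaborate` function and its job format are hypothetical, chosen only to mirror the steps in order (prefetching and coherence are hardware concerns in the patent and are noted here only as comments):

```python
def collaborate(job):
    """Illustrative walk-through of the claimed method steps.
    'job' is a hypothetical dict with keys 'code' (a callable standing
    in for compiled GPU code) and 'data' (a list standing in for pages)."""
    # Step 1: the task manager receives the CPU's job and forwards it to the GPU.
    code, data = job["code"], job["data"]

    # Step 2: map the GPU's contiguous virtual pages onto the main-memory pages
    # (this stands in for loading the address mapping table).
    mapping = {gpu_page: cpu_page for gpu_page, cpu_page in enumerate(data)}

    # Step 3: the GPU processes each chunk sequentially; in hardware, the
    # prefetcher would fetch the next chunk while the current one is processed,
    # and the coherence controller would be activated only when needed.
    results = [code(page) for page in data]

    # Step 4: the task result is returned to the CPU.
    return results, mapping

# Hypothetical usage: "processing" a page just doubles it.
out, mapping = collaborate({"code": lambda p: 2 * p, "data": [10, 20, 30]})
```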
The present invention provides a collaborative system between a CPU and a GPU that is synchronized with a control module managing GPU operations and shares only the data area that the CPU delegates to the GPU. Thus, the virtual address space used by the CPU can be accessed directly from the cache without copying between memories, which greatly improves performance.
In addition, prefetching from the main memory to the cache can be controlled effectively by synchronizing it with the operation of the task management module in the cache-level sharing structure, thereby minimizing direct main-memory accesses by the GPU.
And since the control for cache coherency between the CPU and the GPU can be enabled or disabled by the CPU through the task management module according to the task, the structure mitigates the performance degradation caused by snooping.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a diagram showing the structure of a conventional collaboration system between a CPU and a GPU;
FIG. 2 is a diagram illustrating the structure of a collaboration system between a CPU and a GPU according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the structure of a job manager in a collaboration system between a CPU and a GPU according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the structure of an address mapping unit in a collaboration system between a CPU and a GPU according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the structure of a prefetcher in a collaboration system between a CPU and a GPU according to an embodiment of the present invention;
FIGS. 6 to 10 are views for explaining the structure of a cache coherency controller in a collaboration system between a CPU and a GPU according to an embodiment of the present invention; and
FIG. 11 is a diagram illustrating the structure of an extended collaboration system between a CPU and a GPU according to an embodiment of the present invention.
The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art, and the invention is defined by the scope of the claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular form includes the plural unless otherwise specified. As used herein, the terms "comprises" and/or "comprising" do not preclude the presence or addition of one or more other components, steps, or operations. Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 2 is a diagram illustrating the structure of a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
The collaboration system between the CPU and the GPU according to an embodiment of the present invention includes the following components.
A Job Manager (CPU/GPU Inter-Processor Communication Controller, 200) relays designated communication between the two processors, so that the CPU does not have to drive the GPU directly through a bus or a bridge.
The Re-mapper (Memory Management Unit for GPU) 210 assists in mapping the address space of the GPU to the address space of the main memory used by the CPU.
Existing GPUs do not use a virtual memory address space but directly access physical addresses. Even if the GPU uses virtual addresses through a separate MMU, a function is needed to map the address space seen by the GPU onto the address space described by the page table of the main memory used by the CPU, because the two address regions differ; this function is handled by the re-mapper 210.
The pre-fetcher 220 detects data-block access patterns between the main memory and the L2 cache, uses them as reference patterns, and pre-fetches the data that will be needed.
The collaboration process by the collaboration system between the CPU and the GPU according to an embodiment of the present invention proceeds as follows.
The CPU transfers the code and data compiled for the GPU cores, together with the address and offset information of the data segmented per GPU core, to the designated interface of the job manager.
Upon completion of the delegated task, the GPU sends a completion signal to the job manager.
FIG. 3 is a diagram illustrating the structure of a task management unit in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
In existing systems, the CPU delegates tasks to the GPU by directly managing the GPU's host request queue through the system bus. This structure therefore requires the GPU device-driver software on the CPU to continuously manage the operation of the GPU through the interrupt interface of the system bus.
In contrast, the present invention delegates the management of GPU tasks to a separate hardware device, the task management unit, to improve on this. Through the task manager, the CPU can significantly reduce the management load associated with the GPU.
The task manager is connected to the CPU through the same interface as a co-processor instruction, and provides registers for setting the GPU code to execute, the memory address, the per-core offset, and the parameters. It also provides the ability to monitor the status and behavior of each core's work on the GPU.
The task manager is designed to support not only a single host-CPU interface but also additional interfaces (up to four), so that it can manage operations with heterogeneous processors such as multi-core processors and collaboration with other GPU hardware.
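As a rough illustration of such a register interface, the sketch below models the task manager's per-core registers and monitoring facility in software; all register names, the default core count, and the `submit` semantics are invented for this example and are not taken from the patent:

```python
class TaskManagerRegs:
    """Hypothetical software model of the task manager's register file.
    Register names and the per-core layout are illustrative assumptions."""

    def __init__(self, num_cores=4):
        self.code_addr = 0                      # address of compiled GPU code
        self.data_addr = [0] * num_cores        # per-core data base address
        self.offset    = [0] * num_cores        # per-core data offset
        self.status    = ["idle"] * num_cores   # per-core work status

    def submit(self, core, code_addr, data_addr, offset):
        """CPU-side driver writes the registers and kicks off one GPU core."""
        self.code_addr = code_addr
        self.data_addr[core] = data_addr
        self.offset[core] = offset
        self.status[core] = "running"

    def monitor(self):
        """Return the per-core status, as the monitoring facility would."""
        return list(self.status)

# Hypothetical usage: delegate one task to core 0.
regs = TaskManagerRegs()
regs.submit(0, 0x8000, 0x9000, 256)
```

With such an interface, the CPU writes a few registers instead of managing a host request queue over the system bus, which is the load reduction the text describes.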
FIG. 4 is a diagram illustrating the structure of an address mapping unit in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
The OpenCL and OpenGL models were designed assuming that a CPU-GPU system operates with a non-unified memory structure. In other words, because the memories were physically separate, the virtual memory address space used by the CPU and the memory address space used by the GPU evolved to be different. However, since CPU-GPU structures have recently been developed as shared-memory-based structures on SoCs, the CPU and GPU now require addressing and translation over a unified shared memory. A common way to solve this problem is to have the GPU, like the CPU, use the same virtual memory address space by referring to the same page table in main memory through its own TLB.
Generally, a GPU is entrusted by the CPU with processing a large amount of data, divides the data for parallel processing, processes it sequentially, and returns the result. Considering this, sharing a common address mapping table through TLBs for access to the unified shared memory is problematic. The GPU receives a large range of data, and each core that makes up the GPU translates its corresponding space through the TLB.
However, considering the limited TLB size and the fact that the reuse ratio of translation information in the TLB is low due to the segmentation and sequential-processing characteristics of the GPU, TLB misses are unavoidable when the data to be processed by the GPU is large. Also, when many GPU cores access the memory bus each with its own TLB, more traffic is generated and the implementation complexity increases.
To solve this problem, the present invention takes the following approach. Since the range and location of the necessary data are determined before the CPU delegates work to the GPU, the driver invoked through the OpenCL/OpenGL API on the CPU allocates the memory to be passed to the GPU in contiguous pages as far as possible, and loads a table mapping it to the virtual addresses of the GPU into the address mapping unit. If the data is fragmented on a page basis rather than lying in consecutive pages, the page information is remapped into a consecutive virtual address space for the GPU and reflected in the address mapping table.
The address mapping table includes the page address information of all data to be passed to the GPU. The GPU performs address translation by referring to the mapping table loaded in the address mapping unit, without further memory accesses for translation.
Address translation in the address mapping unit is performed by translator devices, implemented one per GPU core, which refer to the mapping table; the shared memory is then accessed through the cache controller using the translated address information.
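A minimal software sketch of this table-based translation follows; the page size, the function names, and the example physical page numbers are illustrative assumptions, while the remapping of fragmented pages into a contiguous GPU virtual range follows the description above:

```python
PAGE_SIZE = 4096  # illustrative page size

def build_mapping_table(physical_pages):
    """Remap possibly fragmented physical pages into a contiguous GPU
    virtual page range, as described for the address mapping unit."""
    return {gpu_vpn: ppn for gpu_vpn, ppn in enumerate(physical_pages)}

def translate(table, gpu_vaddr):
    """Per-core translator: a single lookup in the preloaded table,
    with no extra memory access for a page-table walk."""
    vpn, off = divmod(gpu_vaddr, PAGE_SIZE)
    return table[vpn] * PAGE_SIZE + off

# Hypothetical usage: three fragmented physical pages become GPU pages 0-2.
table = build_mapping_table([7, 3, 9])
paddr = translate(table, 1 * PAGE_SIZE + 128)  # second GPU page, offset 128
```

Because the whole table is loaded before the task starts, every core resolves addresses locally, avoiding the TLB-miss and bus-traffic problems described above.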
FIG. 5 is a diagram illustrating the structure of a prefetcher in a collaboration system between a CPU and a GPU according to an embodiment of the present invention. The GPU divides the delegated work for parallel and sequential processing, and the present invention designs a prefetcher with the structure shown in FIG. 5 to manage these tasks more efficiently.
When the GPU starts to work through the task manager, the prefetcher reserves L2 cache space amounting to twice the space required for a single task per GPU core and divides it into two windows. The first window holds the data needed for the current GPU operation, and the second window area is reserved for loading the data for subsequent processing.
Within the reserved window area, the L2 cache controller does not apply its normal eviction rule; the two windows are dedicated to hiding the GPU's memory latency.
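The two-window scheme can be illustrated with a small software model; the class name and the chunk representation are assumptions, and a real implementation would be an L2-cache-level hardware mechanism rather than Python objects:

```python
class DoubleWindow:
    """Illustrative double-buffered prefetch: one window holds the data the
    GPU is currently processing, the other is filled with the next chunk."""

    def __init__(self, chunks):
        self.pending = list(chunks)  # data still waiting in main memory
        # Fill both windows up front, mirroring the reserved 2x cache space.
        self.current = self.pending.pop(0) if self.pending else None
        self.next = self.pending.pop(0) if self.pending else None

    def complete_current(self):
        """GPU finished a chunk: swap the windows and prefetch the next one,
        so the GPU never waits on main memory between chunks."""
        done, self.current = self.current, self.next
        self.next = self.pending.pop(0) if self.pending else None
        return done

# Hypothetical usage with three sequential data chunks.
dw = DoubleWindow(["chunk0", "chunk1", "chunk2"])
```

Exempting both windows from the normal eviction rule, as the text states, is what guarantees the "next" window is still resident when the swap happens.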
FIG. 6 is a diagram illustrating the structure of a cache coherence controller in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
The cache coherence controller is responsible for coherency between the L1 caches of the multicore CPU and the GPU, for the memory-to-cache and cache-to-cache data transfers between the cores according to the protocol, and for managing the L2 cache for prefetching, as shown in FIG. 6.
The cache coherence control unit is designed first as a structure for a single-core CPU and then as an extension of it. The coherency model for sharing over unified memory between a single-core CPU and the GPU is as shown in FIG. 7.
The protocol for the state transitions of FIG. 7 is shown in FIG. 8. The protocol of FIG. 8 is basically based on data transfers between L1 caches. Since the CPU that delegates a task to the GPU is unlikely to access the data again while the GPU is processing it, coherency with the GPU is invalidation-based so that snooping is minimized. That is, not only the ownership of the data but also the cached data itself is transferred; therefore, only one copy of the data shared with the GPU exists in the L1 caches.
However, the architecture for multicore CPUs and GPUs is more complicated because it must work with the coherency protocol between the CPUs. To this end, we extend the Dragon protocol based on MOESI.
FIG. 9 shows the definitions of the states required for the extended protocol. An RD state is added, along with an INV_REQ invalidation request. The RD state indicates that the GPU has loaded data into its cache and then proceeds to write to it. In addition, a condition is added to distinguish sharing among CPUs from sharing with the GPU; it is provided through the address mapping unit described above, which sets the condition r to true for data accessed by referring to its own table. The coherency protocol designed using the states defined in FIG. 9 is shown in FIG. 10.
The protocol of FIG. 10 is, as in the single-core CPU case described above, basically invalidation-based for data shared with the GPU. This allows the GPU to invalidate the CPU's shared cache lines in order to minimize updates when the CPU shares data for a task delegated to the GPU.
The schematic structure of the cache coherence control unit including such a protocol is as shown in FIG. 6, and the cache coherence control unit is roughly divided into three parts.
The first is a comparator for adjusting the state change of the protocol described above. The comparator receives the address and line status from the L1 cache controller of the GPU and the CPU, and manages the status of these.
The second is a cache-to-cache data transfer unit. This unit is responsible for transferring data between the L1 cache and the comparator when necessary.
The third is the L2 cache controller. The L2 controller manages L2 by applying the normal cache eviction rule, and performs the memory transfers necessary for GPU prefetching by partitioning L2 into an area of the required size when requested by the prefetcher described above.
FIG. 11 illustrates a system in which a collaboration system between a CPU and a GPU is expanded according to an embodiment of the present invention. In the collaboration system shown in FIG. 11, two CPUs and a GPU share a memory.
The above-described structure of the collaboration system between the CPU and the GPU can be extended not only to sharing at L2 but also to a shared structure through an L3 cache, and not only to a single CPU but also to a collaboration structure between multiple CPUs and a GPU.
Each of the multiple CPUs and the GPU has its own L2 cache, and the L3 cache is shared. The task manager operates through the interface with the CPU as in the structure described above. However, the cache coherence controller must always operate to share memory between the CPUs.
The foregoing description merely illustrates the technical idea of the present invention, and various changes and modifications may be made without departing from its essential characteristics. Therefore, the embodiments described herein are illustrative rather than limiting, and the scope of the present invention is not limited by them. The present invention is intended to cover modifications and variations that come within the scope of the appended claims and their equivalents.
Claims (15)
A task manager for receiving a task requested by the CPU, requesting the GPU to process it, and sending the task result processed by the GPU to the CPU; And
And an address mapping unit for supporting a mapping between an address space of the GPU and an address space of the main memory,
The task management unit
Receiving code information corresponding to a task requested by the CPU and address information of data necessary for performing the task from the CPU
Collaboration system between CPU and GPU.
Loading a table mapping the address space of the GPU and address information of data required for the task into the address mapping unit
Collaboration system between CPU and GPU.
Connected to the CPU with the same interface as the coprocessor interface
Collaboration system between CPU and GPU.
Distributing a task requested by the CPU to each core of the GPU, and monitoring the operation status of each core of the GPU
Collaboration system between CPU and GPU.
Further comprising a prefetcher that fetches, from the main memory into the cache memory, the data to be processed next after the data being processed by the GPU,
Collaboration system between CPU and GPU.
When receiving an operation signal from the task management unit, fetching data necessary for the GPU from the main memory to the cache memory and removing processed data from the cache memory
Collaboration system between CPU and GPU.
Further comprising a cache coherence controller for keeping the data stored in the cache memory of the CPU consistent with the data stored in the cache memory of the GPU,
Collaboration system between CPU and GPU.
Checking whether data stored in the cache memory of the CPU needs to be matched with data stored in the cache memory of the GPU, and operating the cache coherence controller when data coincidence is required
Collaboration system between CPU and GPU.
Mapping an address space of the GPU to an address space of a main memory; And
And transferring the result of the processing performed by the GPU to the CPU,
The step of receiving the job requested by the CPU and requesting the GPU to process it
And receiving the code information corresponding to the task and the address information of the data necessary for the task from the CPU
A way to collaborate between a CPU and a GPU.
Distributing the received job to each core of the GPU, and monitoring the operation status of each core of the GPU
A way to collaborate between a CPU and a GPU.
Generating a table mapping an address space of the GPU and address information of data necessary for the task; And
And converting the address of the GPU by referring to the table
A way to collaborate between a CPU and a GPU.
Identifying data to be processed after the data being processed by the GPU; And
Fetching the identified data from the main memory to the cache memory
A method of collaborating between a CPU and a GPU.
Operating the cache coherence control module to match both data if the data of the CPU and the data of the GPU need to be matched
A method of collaborating between a CPU and a GPU.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130048061A KR101442643B1 (en) | 2013-04-30 | 2013-04-30 | The Cooperation System and the Method between CPU and GPU |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101442643B1 true KR101442643B1 (en) | 2014-09-19 |
Family
ID=51760683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130048061A KR101442643B1 (en) | 2013-04-30 | 2013-04-30 | The Cooperation System and the Method between CPU and GPU |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101442643B1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20040106472A * | 2002-05-08 | 2004-12-17 | Intel Corporation | Method and system for optimally sharing memory between a host processor and graphic processor
JP2011175624A (en) * | 2009-12-31 | 2011-09-08 | Intel Corp | Sharing resources between cpu and gpu |
- 2013-04-30: KR1020130048061A filed — patent KR101442643B1, active (IP Right Grant)
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018009267A1 (en) * | 2016-07-06 | 2018-01-11 | Intel Corporation | Method and apparatus for shared virtual memory to manage data coherency in a heterogeneous processing system |
US11921635B2 (en) | 2016-07-06 | 2024-03-05 | Intel Corporation | Method and apparatus for shared virtual memory to manage data coherency in a heterogeneous processing system |
CN108459912A (en) * | 2018-04-10 | 2018-08-28 | 郑州云海信息技术有限公司 | A kind of last level cache management method and relevant apparatus |
CN108459912B (en) * | 2018-04-10 | 2021-09-17 | 郑州云海信息技术有限公司 | Last-level cache management method and related device |
CN108959165A (en) * | 2018-06-28 | 2018-12-07 | 郑州云海信息技术有限公司 | A kind of management system of GPU whole machine cabinet cluster |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10365930B2 (en) | Instructions for managing a parallel cache hierarchy | |
US7657710B2 (en) | Cache coherence protocol with write-only permission | |
US9513904B2 (en) | Computer processor employing cache memory with per-byte valid bits | |
TWI432963B (en) | Low-cost cache coherency for accelerators | |
US6370622B1 (en) | Method and apparatus for curious and column caching | |
US8176282B2 (en) | Multi-domain management of a cache in a processor system | |
EP3265917B1 (en) | Cache maintenance instruction | |
JP5221565B2 (en) | Snoop filtering using snoop request cache | |
US5692149A (en) | Block replacement method in cache only memory architecture multiprocessor | |
US20070180197A1 (en) | Multiprocessor system that supports both coherent and non-coherent memory accesses | |
US20110078381A1 (en) | Cache Operations and Policies For A Multi-Threaded Client | |
WO2014178450A1 (en) | Collaboration system between cpu and gpu, and method thereof | |
GB2507758A (en) | Cache hierarchy with first and second level instruction and data caches and a third level unified cache | |
US20060080511A1 (en) | Enhanced bus transactions for efficient support of a remote cache directory copy | |
US11789868B2 (en) | Hardware coherence signaling protocol | |
Gerofi et al. | Partially separated page tables for efficient operating system assisted hierarchical memory management on heterogeneous architectures | |
WO2023103767A1 (en) | Homogeneous multi-core-based multi-operating system, communication method, and chip | |
US8332592B2 (en) | Graphics processor with snoop filter | |
KR101442643B1 (en) | The Cooperation System and the Method between CPU and GPU | |
JP2008521114A (en) | Coherent caching of local memory data | |
US8627016B2 (en) | Maintaining data coherence by using data domains | |
Sahuquillo et al. | The split data cache in multiprocessor systems: an initial hit ratio analysis | |
KR101192423B1 (en) | Multicore system and Memory management device for multicore system | |
US20090006806A1 (en) | Local Memory And Main Memory Management In A Data Processing System |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| E701 | Decision to grant or registration of patent right | |
| GRNT | Written decision to grant | |
| FPAY | Annual fee payment | Payment date: 20170629; Year of fee payment: 4 |
| FPAY | Annual fee payment | Payment date: 20180627; Year of fee payment: 5 |
| FPAY | Annual fee payment | Payment date: 20190806; Year of fee payment: 6 |