KR101442643B1 - The Cooperation System and the Method between CPU and GPU - Google Patents


Info

Publication number
KR101442643B1
Authority
KR
South Korea
Prior art keywords
gpu
cpu
data
task
cache
Prior art date
Application number
KR1020130048061A
Other languages
Korean (ko)
Inventor
황태호
김동순
Original Assignee
전자부품연구원
Priority date
Filing date
Publication date
Application filed by 전자부품연구원 filed Critical 전자부품연구원
Priority to KR1020130048061A
Application granted
Publication of KR101442643B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/167 Interprocessor communication using a common memory, e.g. mailbox
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention relates to an efficient cooperation structure between a CPU and a GPU. It provides a cooperation system, and a method thereof, that improve cooperation efficiency by offloading GPU control to a separate unit, reducing the CPU load, and by passing the GPU the addresses of the data needed for a task instead of copying the data when the task is delegated. To resolve cache inconsistency between the CPU and the GPU, cache coherency is maintained by extending a protocol conventionally used to keep caches coherent among multiple CPUs.

Description

[0001] The present invention relates to a cooperative system between a CPU and a GPU.

The present invention relates to a collaboration system between a CPU and a graphics processor (GPU) and a method thereof, and more particularly, to a memory structure and management method for efficient collaboration between a CPU and a GPU.

Recently, application processors (APs) such as the Samsung Exynos, NVIDIA Tegra, and Texas Instruments OMAP have integrated multi-core ARM Cortex CPUs together with multi-core GPUs, such as NVIDIA designs or the Imagination SGX, on a single chip.

Traditionally, in the case of multiple CPUs, the primary or secondary caches are shared in order to improve system performance. In addition, a protocol such as MESI (Modified, Exclusive, Shared, Invalid) is adopted for coherency between the caches belonging to each CPU, and a Snoop Control Unit (SCU) is installed for this purpose. To minimize access to external memory, write-back, write-once, and write-allocate policies are applied.
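
For illustration, a minimal C++ sketch of the MESI bookkeeping this paragraph refers to; real SCUs implement this in hardware, and all names here are hypothetical rather than taken from the patent:

```cpp
#include <cstdint>

// Software model of MESI line states for one cache line.
enum class MesiState { Modified, Exclusive, Shared, Invalid };

struct CacheLine {
    uint64_t  tag   = 0;
    MesiState state = MesiState::Invalid;
};

// Local read miss: snooping decides between Exclusive (no other copy)
// and Shared (a copy exists in another cache).
MesiState onLocalRead(CacheLine& line, bool otherCacheHasCopy) {
    if (line.state == MesiState::Invalid)
        line.state = otherCacheHasCopy ? MesiState::Shared
                                       : MesiState::Exclusive;
    return line.state;
}

// Local write: remote copies are invalidated first, then the line is
// Modified (dirty, written back to memory lazily, i.e. write-back).
MesiState onLocalWrite(CacheLine& line) {
    line.state = MesiState::Modified;
    return line.state;
}

// Remote write observed via snooping: the local copy becomes stale.
void onRemoteWrite(CacheLine& line) {
    line.state = MesiState::Invalid;
}
```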

The GP-GPU, first introduced by Intel and AMD, has since been brought into APs and integrated into a single chip as mentioned above. Commonly, the CPU and GPU share a lower-level cache. However, memory management differs greatly between mobile APs and PCs.

For example, in the AMD Fusion APU, the CPU and GPU each have their own page table. The ARM Mali T604, on the other hand, manages memory with a shared page table, as the Cortex-A15 does. Which approach is better has not yet been established.

Currently, in a CPU/GPU integrated system, the CPU controls the GPU through a bridge (PC) or a bus (AP). Generally, the CPU delegates the code and data of the tasks to be processed to the GPU through the memory interface, copying them into GPU local memory; the GPU processes the data and copies the result back to the CPU's main memory. To this end, the operating system's software driver in the CPU-GPU integrated system controls the GPU through the CPU's bridge or bus interface, and memory sharing and the cache controller operate independently of this control structure.

However, this degrades system performance. Direct inter-processor communication between the CPU and GPU is therefore required, and a separate control unit needs to be added for it. It also remains to be verified whether the CPU and GPU should keep separate page tables or share a common page table when sharing caches.

SUMMARY OF THE INVENTION It is an object of the present invention to provide a system and method for cooperation between a CPU and a GPU that can reduce the load on the CPU by controlling the GPU through a separate control module.

It is another object of the present invention to provide a cache coherence control module that is effective in maintaining cache coherence between a CPU and a GPU by extending a conventional protocol for solving a cache coherency problem between multiprocessors.

The present invention provides a collaborative system between a CPU and a GPU, comprising: a task manager for receiving a task requested by the CPU, requesting it of the GPU, and sending the task result processed by the GPU to the CPU; an address mapping unit for mapping the address space of the GPU to the address space of the main memory; a prefetcher that fetches, from the main memory into the cache memory, the data to be processed next while the GPU is processing the current data; and a cache coherency controller for keeping the data stored in the cache memory of the CPU consistent with the data stored in the cache memory of the GPU.

According to one aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit receives, from the CPU, code information corresponding to the task requested by the CPU and address information of the data necessary for performing the task.

According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit loads into the address mapping unit a table mapping the address space of the GPU to the address information of the data required for the task.

According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit distributes the task requested by the CPU across the cores of the GPU and monitors the operation status of each GPU core.

According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which, when an operation signal is received from the task management unit, the prefetcher fetches the data required by the GPU from the main memory into the cache memory and removes already-processed data from the cache memory.

According to another aspect of the present invention, there is provided a collaboration system between the CPU and the GPU in which the task management unit checks whether the data stored in the cache memory of the CPU and the data stored in the cache memory of the GPU need to be kept consistent and, if so, operates the cache coherency controller.

The present invention also provides a method of collaboration between a CPU and a GPU, the method comprising: receiving a task requested by the CPU and requesting it of the GPU; mapping the address space of the GPU to the address space of a main memory; transferring the result processed by the GPU to the CPU; identifying the data to be processed after the data currently being processed by the GPU; fetching the identified data from the main memory into a cache memory; and, when the data of the GPU and the data of the CPU need to be kept consistent, activating a cache coherency control module to reconcile them.

According to an aspect of the present invention, receiving the task requested by the CPU and requesting it of the GPU includes: receiving, from the CPU, code information corresponding to the task and address information of the data necessary for the task; and distributing the received task across the cores of the GPU while monitoring the work status of each GPU core.

According to another aspect of the present invention, mapping the address space of the GPU to the address space of the main memory includes: generating a table mapping the address space of the GPU to the address information of the data required for the task; and translating the addresses of the GPU by referring to the table.

The present invention provides a collaborative system between a CPU and a GPU in which only the data region the CPU delegates to the GPU is shared, synchronized with a control module that manages GPU operations. The virtual address space used by the CPU can thus be accessed directly from the cache without copying between memories, which greatly improves performance.

In addition, prefetching from the main memory into the cache can be controlled effectively by synchronizing it with the operation of the task management module in the cache-level sharing structure, thereby minimizing the GPU's direct accesses to main memory.

Furthermore, since coherency control between the CPU and GPU caches can be enabled or disabled by the CPU through the task management module on a per-task basis, the structure mitigates the performance degradation caused by snooping.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a diagram showing the structure of a conventional collaboration system between a CPU and a GPU.
FIG. 2 is a diagram illustrating the structure of a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the structure of a task manager in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the structure of an address mapping unit in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the structure of a prefetcher in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
FIGS. 6 to 10 are views for explaining the structure of a cache coherency controller in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating the structure of an extended collaboration system between a CPU and a GPU according to an embodiment of the present invention.

DETAILED DESCRIPTION The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined by the scope of the claims.

It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular includes the plural unless specifically stated otherwise. As used herein, the terms "comprises" and/or "comprising" do not preclude the presence or addition of one or more other components, steps, or operations. Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a diagram illustrating the structure of a collaboration system between a CPU and a GPU according to an embodiment of the present invention.

The collaboration system between the CPU and the GPU according to an embodiment of the present invention includes a task manager 200, an address mapping unit 210, a prefetcher 220, and a cache coherency controller 230. The prefetcher 220 and the cache coherency controller 230 are connected to each other.

The task manager (CPU/GPU Inter-Processor Communication Controller, 200) is designated to communicate with both processors, so that the CPU does not have to drive the GPU directly through a bus or a bridge.

The task management unit 200 is closely coupled to the CPU through the CPU's co-processor interface, distributes the requests generated by the CPU across the GPU cores, and reports the processing results back to the CPU. The task management unit 200 therefore includes an interface for exchanging the necessary information with the CPU.
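
The patent specifies what crosses this interface (GPU-compiled code, data addresses, per-core offsets, parameters) but not a concrete layout. As a minimal sketch, with every field name an assumption:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical descriptor for a task the CPU delegates through the
// task management unit 200. The layout is illustrative only; the
// patent describes the exchanged information, not a structure.
struct GpuTask {
    uint64_t codeAddr = 0;              // address of GPU-compiled kernel code
    uint64_t dataAddr = 0;              // base address of the task's data
    uint64_t dataSize = 0;              // total bytes to be processed
    std::vector<uint64_t> coreOffsets;  // data offset per GPU core
    std::vector<uint32_t> params;       // kernel parameters
};
```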

The re-mapper (memory management unit for the GPU, 210) assists in mapping the address space of the GPU to the address space of the main memory used by the CPU.

Existing GPUs do not use a virtual memory address space but access physical addresses directly. Even when a GPU uses virtual addresses through a separate MMU, its address space differs from the one the CPU uses, so a function is needed to map the address space the GPU sees onto the address space described by the page table of the main memory used by the CPU. This function is handled by the address mapping unit 210, and the GPU side accesses the unified shared memory through it.

The prefetcher 220 identifies data-block access patterns between the main memory and the L2 cache, takes them as a reference pattern, and prefetches the data that will be needed.

The cache coherency controller 230 enables the CPU and the GPU to share caches. It is designed as an extension of the existing Snoop Control Unit (SCU), so that coherency is maintained with the GPU as well as among the CPUs.

The collaboration process by the collaboration system between the CPU and the GPU according to an embodiment of the present invention proceeds as follows.

The CPU transfers, to the designated interface of the task management unit 200, the code and data compiled for the GPU cores together with the address and offset information of the data as partitioned per GPU core. The task management unit 200 remaps the given main-memory data address information into the GPU address space and loads it into the address mapping unit 210.

Based on the given address information, the task management unit 200 operates the prefetcher 220 to fetch data from the main memory into the L2 cache in advance, and operates the cache coherency controller 230 when the CPU requires cache coherency control.

The task manager 200 allocates tasks to each core of the GPU; while the tasks are being processed on the GPU, the data to be processed next is fetched into the L2 cache via the prefetcher 220, and already-processed cache data is flushed to main memory.

Upon completion of the delegated task, the GPU sends a completion signal to the task management unit 200, and the task management unit 200 notifies the CPU that the task is completed.
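
Collecting the steps above into one place, a sketch of the delegation sequence, reusing the hypothetical GpuTask above; TaskManager and its methods are stand-ins for the hardware units, not an API defined by the patent:

```cpp
// Sketch of the collaboration flow; the empty bodies stand in for
// hardware behavior of units 200, 210, 220, and 230.
struct TaskManager {
    void loadRemapTable(const GpuTask&)  {}  // program unit 210
    void startPrefetch(const GpuTask&)   {}  // kick unit 220
    void enableCoherency()               {}  // arm unit 230
    void dispatchToCores(const GpuTask&) {}  // partition across GPU cores
    void waitForCompletion()             {}  // GPU signals completion
    void notifyCpu()                     {}  // CPU informed, no data copy
};

void delegateTask(TaskManager& tm, const GpuTask& task, bool needsCoherency) {
    tm.loadRemapTable(task);   // remap main-memory addresses into GPU space
    tm.startPrefetch(task);    // first data window pulled into L2
    if (needsCoherency)
        tm.enableCoherency();  // coherency control only when required
    tm.dispatchToCores(task);  // allocate work to each GPU core
    tm.waitForCompletion();    // processed data flushed, next window fetched
    tm.notifyCpu();
}
```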

FIG. 3 is a diagram illustrating the structure of a task management unit in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.

In the existing approach, the CPU delegates tasks to the GPU by directly managing the GPU's host request queue through the system bus. The GPU device driver software on the CPU must therefore continuously manage the GPU's operation through the system bus and its interrupt interface.

The present invention improves on this by delegating the management of GPU-executed tasks to the task management unit, a separate hardware device. Through the task manager, the CPU can significantly reduce the administrative load associated with the GPU.

The task manager is attached to the CPU through the same interface as co-processor instructions, and provides registers through which GPU execution can be configured: the memory address, the per-core offset, and the parameters. It also provides the ability to monitor the status and behavior of each core's work on the GPU.
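
For illustration, such a register block might be modeled as below; every offset, width, and bit assignment is invented for the sketch, since the patent describes the registers only functionally:

```cpp
#include <cstdint>

// Hypothetical memory-mapped register block for the task manager.
struct TaskManagerRegs {
    volatile uint32_t control;       // bit 0: start, bit 1: coherency on
    volatile uint32_t status;        // per-core busy/done bits
    volatile uint64_t codeAddr;      // address of GPU kernel code
    volatile uint64_t dataAddr;      // base address of task data
    volatile uint64_t coreOffset[4]; // per-core data offsets
    volatile uint32_t param[8];      // kernel parameters
};

// Polling a core's completion bit models the monitoring capability.
inline bool coreDone(const TaskManagerRegs& r, unsigned core) {
    return (r.status >> core) & 1u;
}
```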

The task manager is designed to support not only a single host-CPU interface but also additional interfaces (up to four), so that it can manage operations with heterogeneous processors such as multi-core processors and coordinate collaboration with other GPU hardware.

FIG. 4 is a diagram illustrating the structure of an address mapping unit in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.

The OpenCL and OpenGL models were designed on the assumption that a CPU-GPU system operates with non-unified memory. In other words, because the memories were physically separate, the virtual memory address space used by the CPU and the memory address space used by the GPU evolved independently. Recently, however, CPU-GPU structures have developed into shared-memory-based structures on a SoC, so the CPU and GPU require addressing and translation over the unified shared memory. A common solution is to have the GPU use the same virtual memory address space by referring, through its own TLB, to the same page table in main memory that the CPU uses.

Generally, a GPU is entrusted by the CPU with processing a large amount of data, dividing it sequentially for parallel processing and returning the result. Considering this, sharing a common address mapping table through TLBs for access to the unified shared memory is problematic: the GPU receives a large range of data, and each core that makes up the GPU translates its corresponding portion through the TLB.

However, given the limited TLB size and the low reuse rate of translation entries caused by the GPU's partitioned, sequential processing pattern, frequent TLB misses are unavoidable when the data to be processed by the GPU is large. Moreover, when many GPU cores access the memory bus each with its own TLB, more traffic is generated and the implementation complexity also increases.
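
For a sense of scale (illustrative numbers, not taken from the patent): a 64-entry TLB with 4 KiB pages covers only 256 KiB of address space at a time, while a delegated workload streaming 64 MiB touches 16,384 distinct pages; if each page is read once, sequentially, and never revisited, essentially every new page costs a TLB miss and a page-table walk.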

To solve this problem, the present invention takes the following approach. Since the range and location of the necessary data are determined before the CPU delegates work to the GPU, the driver behind the OpenCL/OpenGL API on the CPU allocates the memory to be passed to the GPU in contiguous pages as far as possible, and loads a table mapping it to the GPU's virtual addresses into the address mapping unit. If the data is fragmented at page granularity rather than lying in consecutive pages, the page information is remapped into a contiguous virtual address space for the GPU and reflected in the address mapping table.

The address mapping table contains the page address information of all the data to be passed to the GPU. The GPU performs address translation by referring to the mapping table loaded in the address mapping unit, without any further memory access for translation.

Address translation in the address mapping unit is performed by translator devices, implemented one per GPU core, that consult the mapping table; accesses to the shared memory then go through the cache controller using the translated address.
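
A minimal sketch of the table-based translation this paragraph describes, assuming 4 KiB pages and a dense table indexed by GPU virtual page number (both assumptions):

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kPageShift = 12;                 // assume 4 KiB pages
constexpr uint64_t kPageMask  = (1ull << kPageShift) - 1;

// The remap table holds the physical page address of every page handed
// to the GPU, indexed by contiguous GPU virtual page number, so a
// lookup needs no further memory access for translation.
struct RemapTable {
    uint64_t gpuBase = 0;            // start of the GPU's virtual range
    std::vector<uint64_t> physPage;  // page-aligned physical addresses

    uint64_t translate(uint64_t gpuVaddr) const {
        uint64_t page = (gpuVaddr - gpuBase) >> kPageShift;
        return physPage.at(page) | (gpuVaddr & kPageMask);
    }
};
```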

FIG. 5 is a diagram illustrating the structure of a prefetcher in a collaboration system between a CPU and a GPU according to an embodiment of the present invention. The GPU divides delegated work into parallel and sequential parts; to manage such tasks more efficiently, the present invention designs the prefetcher with the structure shown in FIG. 5.

When the GPU starts work through the task manager, the prefetcher reserves L2 cache space amounting to twice what a single task requires for a GPU core and divides it into two windows. The first window holds the data needed for the current GPU operation, while the second window is reserved for loading the data to be processed next.

Within the reserved window area, the L2 cache controller does not apply the usual eviction rule; the two windows are dedicated to hiding the GPU's memory latency.
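
The two-window scheme is classic double buffering. A sketch of the bookkeeping, with the cache hardware abstracted away and all names assumed:

```cpp
#include <cstdint>
#include <utility>

// Two L2 regions of equal size, reserved at task start: the GPU reads
// from 'active' while the prefetcher fills 'next'; on completion of a
// chunk the roles swap, hiding memory latency behind computation.
struct PrefetchWindows {
    uint64_t base[2];     // L2 addresses of the two reserved windows
    uint64_t windowSize;  // space required by a single task chunk
    int active = 0;       // window the GPU is currently consuming
    int next   = 1;       // window being filled for the next chunk

    void onChunkComplete() {
        // Data in 'active' is processed and may be flushed to memory;
        // the freshly filled window becomes the one the GPU reads.
        std::swap(active, next);
    }
};
```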

FIG. 6 is a diagram illustrating the structure of a cache coherency controller in a collaboration system between a CPU and a GPU according to an embodiment of the present invention.

The cache coherency controller is responsible for coherency between the L1 caches of the multicore CPU and the GPU, for the memory-to-cache and cache-to-cache data transfers between cores required by the protocol, and for the L2 cache region used for prefetching, as shown in FIG. 6.

The cache coherency control unit is designed first as a structure for a single-core CPU and then as an extension of it. The coherency model for sharing over unified memory between a single-core CPU and the GPU is shown in FIG. 7.

The protocol for the state transitions of FIG. 7 is shown in FIG. 8. The protocol of FIG. 8 is basically based on data transfer between L1 caches. Since the CPU that delegates a task to the GPU is unlikely to access the data again while the GPU is working on it, coherency with the GPU is invalidation-based so that snooping is minimized. That is, not only the ownership of the data but also the cached data itself is copied, so only one copy of data shared with the GPU exists in the L1 caches.

However, the architecture for multicore CPUs and GPUs is more complicated because it must work with the coherency protocol between the CPUs. To this end, we extend the Dragon protocol based on MOESI.

FIG. 9 shows the definitions of the states required for the extended protocol. An RD state is added, along with an INV_REQ invalidation request. The RD state marks the case where the GPU has loaded data into its cache and is about to write to it. In addition, a condition is added to distinguish sharing among CPUs from sharing with the GPU; it is supplied by the address mapping unit described above, which sets the condition r to true for data accessed through its table. The coherency protocol designed with the states defined in FIG. 9 is shown in FIG. 10.
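
The full transition tables live in FIGS. 9 and 10, which are not reproduced here. As a sketch, the extended state set and the r condition might be encoded as follows, showing only the GPU-write transition the text describes (all names beyond RD, INV_REQ, and r are assumptions):

```cpp
// Extended MOESI state set sketched from the description: RD marks a
// line the GPU has loaded and intends to write; 'r' flags data the
// address mapping unit recognizes as shared with the GPU.
enum class LineState { Modified, Owned, Exclusive, Shared, Invalid, RD };

struct CoherentLine {
    LineState state = LineState::Invalid;
    bool r = false;  // set by the address mapping unit for GPU data
};

// When the GPU writes a line it shares with the CPU, an INV_REQ
// invalidates the CPU's copy instead of updating it, minimizing
// snoop/update traffic (invalidation-based sharing with the GPU).
void onGpuWrite(CoherentLine& gpuLine, CoherentLine& cpuLine) {
    if (gpuLine.r) {
        cpuLine.state = LineState::Invalid;   // INV_REQ to CPU caches
        gpuLine.state = LineState::Modified;  // sole up-to-date copy
    }
}
```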

In FIG. 10, the protocol is invalidation-based for data shared with the GPU, as in the single-core CPU case described above. That is, the GPU invalidates the CPU's shared cache lines, rather than updating them, for data belonging to the task delegated to the GPU, minimizing update traffic when the CPU shares that data.

The schematic structure of the cache coherency control unit embodying this protocol is shown in FIG. 6; it is roughly divided into three parts.

The first is a comparator that arbitrates the state changes of the protocol described above. The comparator receives addresses and line states from the L1 cache controllers of the GPU and the CPU and manages their states.

The second is the cache-to-cache data transfer unit, which transfers data between the L1 caches under the direction of the comparator when necessary.

The third is the L2 cache controller. It manages the L2 cache with the normal eviction rule and, upon request from the prefetcher described above, partitions the L2 into regions of the required size and performs the memory transfers needed to prefetch for the GPU.

FIG. 11 illustrates an expanded collaboration system between a CPU and a GPU according to an embodiment of the present invention. In the collaboration system shown in FIG. 11, two CPUs and a GPU share memory.

The collaboration-system structure described above can be extended from L2 sharing to sharing through an L3 cache, and from a single CPU to a collaboration structure between multiple CPUs and a GPU.

Each of the multiple CPUs and the GPU has its own L2 cache, and the L3 cache is shared. The task manager operates through its interface with the CPU as in the structure described above; however, the cache coherency controller must always be active so that memory can be shared among the CPUs.

The foregoing description is merely illustrative of the technical idea of the present invention, and various changes and modifications may be made without departing from its essential characteristics. The embodiments described herein are therefore intended to be illustrative rather than limiting, and the scope of the present invention is not limited by them. The present invention is intended to cover all modifications and variations that come within the scope of the appended claims and their equivalents.

Claims (15)

In a collaborative system between a CPU and a GPU,
A task manager for receiving a task requested by the CPU, requesting it of the GPU, and sending the task result processed by the GPU to the CPU; and
An address mapping unit for supporting mapping between the address space of the GPU and the address space of the main memory,
The task management unit
Receiving code information corresponding to a task requested by the CPU and address information of data necessary for performing the task from the CPU
Collaboration system between CPU and GPU.
delete
The apparatus of claim 1, wherein the task management unit
Loading a table mapping the address space of the GPU and address information of data required for the task into the address mapping unit
Collaboration system between CPU and GPU.
The apparatus of claim 1, wherein the task management unit
Connected to the CPU with the same interface as the coprocessor interface
Collaboration system between CPU and GPU.
The apparatus of claim 1, wherein the task management unit
Distributing a task requested by the CPU to each core of the GPU, and monitoring the operation status of each core of the GPU
Collaboration system between CPU and GPU.
The system of claim 1, further comprising
A prefetcher that fetches, from the main memory into the cache memory, the data to be processed next after the data being processed by the GPU
Collaboration system between CPU and GPU.
The apparatus of claim 6, wherein the prefetcher
When an operation signal is received from the task management unit, fetches the data necessary for the GPU from the main memory into the cache memory and removes already-processed data from the cache memory
Collaboration system between CPU and GPU.
The system of claim 1, further comprising
A cache coherency controller for keeping the data stored in the cache memory of the CPU consistent with the data stored in the cache memory of the GPU
Collaboration system between CPU and GPU.
The apparatus of claim 8, wherein the task management unit
Checks whether the data stored in the cache memory of the CPU and the data stored in the cache memory of the GPU need to be kept consistent, and operates the cache coherency controller when consistency is required
Collaboration system between CPU and GPU.
A method of collaborating between a CPU and a GPU, comprising:
Receiving a task requested by the CPU and requesting it of the GPU;
Mapping the address space of the GPU to the address space of a main memory; and
Transferring the result processed by the GPU to the CPU,
Wherein receiving the task requested by the CPU and requesting it of the GPU includes
Receiving, from the CPU, code information corresponding to the task and address information of the data necessary for the task.
delete
The method of claim 10, wherein receiving the task requested by the CPU and requesting it of the GPU further includes
Distributing the received task across the cores of the GPU and monitoring the operation status of each GPU core
A method of collaborating between a CPU and a GPU.
The method of claim 10, wherein mapping the address space of the GPU to the address space of the main memory includes:
Generating a table mapping the address space of the GPU to the address information of the data necessary for the task; and
Translating the addresses of the GPU by referring to the table
A method of collaborating between a CPU and a GPU.
The method of claim 10, further comprising:
Identifying the data to be processed after the data being processed by the GPU; and
Fetching the identified data from the main memory into the cache memory
A method of collaborating between a CPU and a GPU.
The method of claim 10, further comprising
Operating the cache coherency control module to reconcile the data of the CPU and the data of the GPU when they need to be kept consistent
A method of collaborating between a CPU and a GPU.
KR1020130048061A 2013-04-30 2013-04-30 The Cooperation System and the Method between CPU and GPU KR101442643B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130048061A KR101442643B1 (en) 2013-04-30 2013-04-30 The Cooperation System and the Method between CPU and GPU


Publications (1)

Publication Number Publication Date
KR101442643B1 true KR101442643B1 (en) 2014-09-19

Family

ID=51760683

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130048061A KR101442643B1 (en) 2013-04-30 2013-04-30 The Cooperation System and the Method between CPU and GPU

Country Status (1)

Country Link
KR (1) KR101442643B1 (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040106472A (en) * 2002-05-08 2004-12-17 인텔 코오퍼레이션 Method and system for optimally sharing memory between a host processor and graphic processor
JP2011175624A (en) * 2009-12-31 2011-09-08 Intel Corp Sharing resources between cpu and gpu

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018009267A1 (en) * 2016-07-06 2018-01-11 Intel Corporation Method and apparatus for shared virtual memory to manage data coherency in a heterogeneous processing system
US11921635B2 (en) 2016-07-06 2024-03-05 Intel Corporation Method and apparatus for shared virtual memory to manage data coherency in a heterogeneous processing system
CN108459912A (en) * 2018-04-10 2018-08-28 郑州云海信息技术有限公司 A kind of last level cache management method and relevant apparatus
CN108459912B (en) * 2018-04-10 2021-09-17 郑州云海信息技术有限公司 Last-level cache management method and related device
CN108959165A (en) * 2018-06-28 2018-12-07 郑州云海信息技术有限公司 A kind of management system of GPU whole machine cabinet cluster


Legal Events

E701: Decision to grant or registration of patent right
GRNT: Written decision to grant
FPAY: Annual fee payment (payment date: 20170629; year of fee payment: 4)
FPAY: Annual fee payment (payment date: 20180627; year of fee payment: 5)
FPAY: Annual fee payment (payment date: 20190806; year of fee payment: 6)