CN104536740B - Sharing of virtual functions in a shared virtual memory between heterogeneous processors of a computing platform - Google Patents


Info

Publication number
CN104536740B
Authority
CN
Grant status
Grant
Application number
CN 201410790536
Other languages
Chinese (zh)
Other versions
CN104536740A
Inventor
S. Yan
S. Luo
X. Zhou
Y. Gao
H. Chen
B. Saha
Original Assignee
Intel Corporation


Abstract

A computing platform may include heterogeneous processors (e.g., a CPU and a GPU) and may support the sharing of virtual functions between such processors. In one embodiment, the CPU-side virtual function table pointer used to access a shared object from the CPU 110 may be used to determine the GPU virtual function table, if a GPU-side table exists. In other embodiments, a shared non-coherent region, which may not maintain data coherency, may be created within the shared virtual memory. CPU-side and GPU-side data stored in the shared non-coherent region may have the same address as seen from the CPU side and the GPU side. However, the contents of the CPU-side data may differ from the contents of the GPU-side data, because the shared virtual memory may not maintain coherency for this region during run time. In one embodiment, the vptr may be modified to point to a CPU virtual function table and a GPU virtual function table stored in the shared virtual memory.

Description

Sharing of virtual functions in a shared virtual memory between heterogeneous processors of a computing platform

Background

[0001] A computing platform may include heterogeneous processors, such as a central processing unit (CPU) and a graphics processing unit (GPU), or symmetric and asymmetric processors. A class instance (or object) may reside in a first memory associated with a first side (e.g., the CPU) of a CPU-GPU platform. The second side (the GPU) may not be able to invoke objects and associated member functions that reside in the first memory associated with the first side (the CPU) of the CPU-GPU platform. Likewise, the first side may not be able to invoke objects and associated member functions that reside in a second memory on the second side (the GPU). When class instances or objects are stored in different address spaces, existing communication mechanisms may only allow one-way communication between the heterogeneous processors (CPU and GPU) to invoke class instances and their associated virtual functions.

[0002] Such a one-way communication approach prevents a natural functional partitioning of class instances between heterogeneous processors. An object may include throughput-oriented member functions as well as some scalar member functions. For example, a scene class in a game application may have rendering functions that are suited to the GPU, and may also include physics and artificial intelligence (AI) functions that are suited to execution on the CPU. With current one-way communication mechanisms, there are typically two different scene classes: one containing the CPU member functions (physics and AI in the example above) and one containing the GPU member functions (the GPU-suited rendering functions). Having two different scene classes, one for the CPU and the other for the GPU, may require copying data back and forth between the two scene classes.

Brief Description of the Drawings

[0003] The invention described herein is illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

[0004] FIG. 1 illustrates a platform 100 that supports the sharing of virtual functions stored in a shared virtual memory between heterogeneous processors provided in a computer platform, according to one embodiment;

[0005] FIG. 2 is a flowchart illustrating operations performed by the platform 100 to support the sharing of virtual functions stored in a shared virtual memory between heterogeneous processors provided in a computer platform, according to one embodiment;

[0006] FIG. 3 illustrates CPU-side and GPU-side code for loading a virtual function pointer from a shared object, according to one embodiment;

[0007] FIG. 4 is a flowchart illustrating operations performed by the platform 100 to generate a table that supports the sharing of virtual functions stored in a shared virtual memory between heterogeneous processors provided in a computer platform, according to a first embodiment;

[0008] FIG. 5 illustrates a flowchart used by the platform 100 to support two-way communication between the CPU 110 and the GPU 180 through member functions of objects that can be shared by heterogeneous processors, according to one embodiment;

[0009] FIG. 6 illustrates a flowchart depicting, in practice, the processing of GPU virtual function calls and GPU function calls made by the CPU side, according to a first embodiment;

[0010] FIG. 7 is a flowchart illustrating operations performed by the platform 100 to support the sharing of virtual functions between heterogeneous processors using a virtual shared non-coherent region, according to an embodiment;

[0011] FIG. 8 is a relationship diagram illustrating the sharing of virtual functions between heterogeneous processors using a virtual shared non-coherent region, according to an embodiment;

[0012] FIG. 9 illustrates a computer system that may provide support for the sharing of virtual functions stored in a shared virtual memory between heterogeneous processors provided in a computer platform, according to one embodiment.

Detailed Description

[0013] The following description describes techniques for sharing virtual functions stored in a shared virtual memory between heterogeneous processors of a computing platform. In the following description, numerous specific details, such as logic implementations, resource partitioning, sharing, or duplication implementations, types and interrelationships of system components, and logic partitioning or integration choices, are set forth in order to provide a more thorough understanding of the present invention. However, those skilled in the art will appreciate that the invention may be practiced without such specific details. In other instances, control structures, gate-level circuits, and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those skilled in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.

[0014] References in the specification to "one embodiment", "an embodiment", or "an example embodiment" indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with one embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.

[0015] Embodiments of the invention may be implemented in hardware, firmware, software, or any combination thereof. Embodiments of the invention may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable storage medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device).

[0016] For example, a machine-readable storage medium may include read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; and electrical or optical forms of signals. Further, firmware, software, routines, and instructions may be described herein as performing certain actions. However, it should be appreciated that such descriptions are merely for convenience and that such actions in fact result from computing devices, processors, controllers, and other devices executing the firmware, software, routines, and instructions.

[0017] In one embodiment, a computing platform may support one or more techniques that allow two-way communication (function calls) between heterogeneous processors (e.g., a CPU and a GPU) through member functions, such as virtual functions, of shared objects, by partitioning the shared objects at a fine granularity. In one embodiment, the computing platform may allow two-way communication between the CPU and the GPU using a first technique, referred to as the "table-based" technique. In other embodiments, the computing platform may allow two-way communication between the CPU and the GPU using a second technique, referred to as the "non-coherent region" technique, in which a virtual shared non-coherent region may be created in the virtual shared memory.

[0018] In one embodiment, when the table-based technique is used, the CPU-side virtual function table (vtable) pointer of a shared object, which may be used to access the shared object from either the CPU side or the GPU side, may be used to determine the GPU virtual function table, if a GPU-side table exists. In one embodiment, the GPU-side table may contain <"class name", CPU vtable address, GPU vtable address>. In one embodiment, the techniques for obtaining the GPU-side vtable address and for generating the GPU-side table are described in more detail below.

[0019] In other embodiments, when the "non-coherent region" technique is used, a shared non-coherent region is created within the shared virtual memory. In one embodiment, the shared non-coherent region may not maintain data coherency. In one embodiment, CPU-side data and GPU-side data within the shared non-coherent region may have the same address as seen from the CPU side and the GPU side. However, the contents of the CPU-side data may differ from the contents of the GPU-side data, because the shared virtual memory may not maintain coherency for this region during run time. In one embodiment, the shared non-coherent region may be used to store a new copy of the virtual method table of each shared class. In one embodiment, this approach may keep the virtual table at the same address on both sides.
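The non-coherent region can be pictured with a small C++ simulation. This is a sketch under stated assumptions, not the patent's mechanism: the region is modeled as two byte arrays, one per side, addressed with the same offsets, and nothing synchronizes them. The names Side, NonCoherentRegion, write, and read are hypothetical. Writing a CPU function pointer and a GPU function pointer at the same offset mirrors how a per-class virtual method table copy could keep one address while holding side-specific contents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Which processor is accessing the region.
enum class Side { Cpu, Gpu };

// Hypothetical simulation of a shared non-coherent region of N bytes. Both sides
// address it with the same offsets, but each side has its own backing bytes and no
// coherency is maintained between the two copies.
template <std::size_t N>
class NonCoherentRegion {
public:
    // Writes len bytes at offset; only the given side observes the change.
    void write(Side side, std::size_t offset, const void* src, std::size_t len) {
        assert(offset + len <= N);
        std::memcpy(view(side).data() + offset, src, len);
    }

    // Reads len bytes at offset as seen by the given side.
    void read(Side side, std::size_t offset, void* dst, std::size_t len) const {
        assert(offset + len <= N);
        std::memcpy(dst, view(side).data() + offset, len);
    }

private:
    std::array<unsigned char, N>& view(Side side) { return side == Side::Cpu ? cpu_ : gpu_; }
    const std::array<unsigned char, N>& view(Side side) const {
        return side == Side::Cpu ? cpu_ : gpu_;
    }

    std::array<unsigned char, N> cpu_{};  // contents seen by the CPU
    std::array<unsigned char, N> gpu_{};  // contents seen by the GPU
};
```

If each side writes its own version of a function pointer at the same offset, a read at that offset returns the CPU version on the CPU side and the GPU version on the GPU side, so a vptr pointing into such a region would need no translation step.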

[0020] An embodiment of a computing platform 100 that provides virtual functions in a virtual shared memory that may be shared between heterogeneous processors, such as a CPU and a GPU, is illustrated in FIG. 1. In one embodiment, the platform 100 may include a central processing unit (CPU) 110, an operating system (OS) 112 associated with the CPU 110, a CPU private space 115, a CPU compiler 118, a shared virtual memory (or multi-version shared memory) 130, a graphics processing unit (GPU) 180, an operating system (OS) 182 associated with the GPU 180, a GPU private space 185, and a GPU compiler 188. In one embodiment, the OS 112 and the OS 182 may manage the resources of the CPU 110 and the CPU private space 115, and of the GPU 180 and the GPU private space 185, respectively. In one embodiment, to support the shared virtual memory 130, the CPU private space 115 and the GPU private space 185 may include copies of multi-version data. In one embodiment, to maintain memory coherency, metadata, such as that of the object 131, may be used to synchronize the copies stored in the CPU private space 115 and the GPU private space 185. In other embodiments, the multi-version data may be stored in a physical shared memory, such as the shared memory 950 (of FIG. 9, described below).
In one embodiment, the shared virtual memory may be backed by physical private memory spaces (such as the CPU private space 115 and the GPU private space 185 of the heterogeneous processors CPU 110 and GPU 180) or by a physical shared memory (such as the shared memory 950 shared by the heterogeneous processors).

[0021] In one embodiment, the CPU compiler 118 and the GPU compiler 188 may be coupled to the CPU 110 and the GPU 180, respectively, or may be provided remotely on other platforms or computer systems. The compiler(s) 118 associated with the CPU 110 may generate compiled code for the CPU 110, and the compiler 188 associated with the GPU 180 may generate compiled code for the GPU 180. In one embodiment, the CPU compiler 118 and the GPU compiler 188 may generate compiled code by compiling one or more member functions of objects that a user provides in a high-level language, such as an object-oriented language. In one embodiment, the compilers 118 and 188 may cause objects to be stored in the shared memory 130, and a shared object 131 may include member functions allocated to either the CPU side 110 or the GPU side 180. In one embodiment, the shared object 131 stored in the shared memory 130 may include member functions such as virtual functions VF 133-A to 133-K and non-virtual functions NVF 136-A to 136-L. In one embodiment, member functions such as VF 133 and NVF 136 of the shared object 131 may provide two-way communication between the CPU 110 and the GPU 180.

[0022] In one embodiment, for dynamic binding purposes, either the CPU 110 or the GPU 180 may invoke a virtual function such as VF 133-A (e.g., a C++ virtual function) by indexing into a virtual function table (vtable). In one embodiment, a hidden pointer in the shared object 131 may point to this vtable. However, the CPU 110 and the GPU 180 may have different instruction set architectures (ISAs), and when a function is compiled for the CPU 110 and for the GPU 180 with their different ISAs, the code that the compilers 118 and 188 produce for the same function may have different sizes. Laying out the code in the same way on the GPU side and the CPU side (i.e., the CPU version of a virtual function in a shared class and the GPU version of the same virtual function in that shared class) may therefore be challenging. If there are three virtual functions in a shared class Foo(), then in the CPU version of the code the functions may be located at addresses A1, A2, and A3. In the GPU version of the code, however, the functions may be located at addresses B1, B2, and B3, which may differ from A1, A2, and A3. These different address locations of the CPU-side and GPU-side code for the same function of a shared class imply that a shared object (i.e., an instance of a shared class) may require two vtables (a first vtable and a second vtable).
The first vtable may contain the addresses of the CPU-side versions of the functions (A1, A2, and A3) and may be used when the object is used on the CPU side (or a CPU-side function is invoked). The second vtable may contain the addresses of the GPU-side versions of the functions (B1, B2, and B3) and may be used when the object is used on the GPU side (or a GPU-side function is invoked).
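To make the two-vtable requirement concrete, the following C++ sketch models a shared class with three virtual functions whose CPU versions (standing in for A1, A2, A3) and GPU versions (standing in for B1, B2, B3) live at different addresses. Real vtables are compiler-generated; here they are assumed to be plain arrays of function pointers, and the names FooCpuVtable, FooGpuVtable, and callSlot are hypothetical. The point is that a given function keeps the same slot index in both tables even though the addresses differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the two vtables a shared class Foo may need.
using Slot = std::string (*)();

// CPU-compiled versions of Foo's three virtual functions (addresses A1, A2, A3).
std::string cpuF1() { return "cpu:f1"; }
std::string cpuF2() { return "cpu:f2"; }
std::string cpuF3() { return "cpu:f3"; }

// GPU-compiled versions of the same functions (addresses B1, B2, B3).
std::string gpuF1() { return "gpu:f1"; }
std::string gpuF2() { return "gpu:f2"; }
std::string gpuF3() { return "gpu:f3"; }

// First vtable: used when the object is accessed on the CPU side.
const Slot FooCpuVtable[] = {cpuF1, cpuF2, cpuF3};
// Second vtable: used when the object is accessed on the GPU side.
const Slot FooGpuVtable[] = {gpuF1, gpuF2, gpuF3};

// Calls the function in the given slot. The slot index is the same on both sides;
// only the table (and therefore the code address) differs.
std::string callSlot(const Slot* vtable, std::size_t slot) { return vtable[slot](); }
```

For example, callSlot(FooCpuVtable, 1) reaches the CPU version of the second function, while callSlot(FooGpuVtable, 1) reaches its GPU version.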

[0023] In one embodiment, sharing of virtual functions stored in the shared virtual memory between the CPU 110 and the GPU 180 may be achieved by associating the first and second vtables with the shared object 131. In one embodiment, a common vtable, usable for virtual function calls on both the CPU side and the GPU side, may be created by associating the first and second vtables of the shared object 131.

[0024] An embodiment in which the heterogeneous processors CPU 110 and GPU 180 share virtual functions stored in the shared virtual memory is depicted in the flowchart of FIG. 2. In block 210, a first processor such as the CPU 110 may identify the first-processor-side vtable pointer (CPU-side vtable pointer) of the shared object 131. In one embodiment, a CPU-side vtable pointer may exist for the shared object 131 regardless of whether the shared object 131 is accessed from the CPU side or the GPU side.

[0025] In one embodiment, for a normal virtual function call in a computing system such as a CPU-only environment, the code sequence may be as shown in block 310 of FIG. 3. In one embodiment, even in a computing system such as 100, which may include heterogeneous processors, the CPU-side code sequence for a normal virtual function call may be the same as that depicted in block 310 of FIG. 3. As depicted in block 310, the code in line 301, Mov r1, [obj], may load the vtable address of the shared object 131 into the variable r1. The code in line 305, Call *[r1+offsetFunction], may invoke a virtual function such as VF 133-A of the shared object 131.

[0026] In block 250, a second processor such as the GPU 180 may use the first-processor-side vtable pointer (CPU-side vtable pointer) of the shared object 131 to determine a second-processor-side vtable (GPU-side vtable), if a second-processor-side table (GPU table) exists. In one embodiment, the second-processor-side table (GPU table) may contain <"class name", first-processor-side vtable address, second-processor-side vtable address>.

[0027] In one embodiment, on the GPU side, the GPU 180 may generate the code sequence depicted in block 350, which may differ from the code sequence depicted in block 310. In one embodiment, because the GPU compiler 188 may know every shareable class from its type, the GPU 180 may generate the code sequence depicted in block 350 for loading a virtual function pointer from a shared object such as the shared object 131. In one embodiment, the code in line 351, Mov r1, [obj], may load the CPU vtable address, and the code in line 353, r2 = getvtableAddress(r1), may obtain the GPU vtable from the GPU table. In one embodiment, the code in line 358, Call *[r2+offsetFunction], may invoke the virtual function through the GPU vtable obtained from the CPU vtable address. In one embodiment, the getvtableAddress function may use the CPU-side vtable address as an index into the GPU table to determine the GPU-side vtable.
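The two code sequences of FIG. 3 can be emulated in C++ as a rough sketch. Assumptions: vtables are modeled as arrays of function pointers, offsetFunction is a slot index rather than a byte offset, and kGpuTable is a hypothetical stand-in for the GPU table that maps a CPU vtable address to a GPU vtable address. Only the name getvtableAddress comes from the text; its signature here is assumed, as are all other names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical model of the FIG. 3 call sequences; real code would be compiler-emitted.
struct SharedObject;
using Slot = int (*)(SharedObject*);

// A shared object stores only the CPU-side vtable pointer (no extra GPU vptr field).
struct SharedObject {
    const Slot* vptr;  // CPU vtable address
    int value;
};

int cpuGetValue(SharedObject* o) { return o->value; }       // CPU version
int gpuGetValue(SharedObject* o) { return o->value * 10; }  // GPU version (distinct result for illustration)

const Slot kCpuVtable[] = {cpuGetValue};
const Slot kGpuVtable[] = {gpuGetValue};

// GPU-side table: CPU vtable address -> GPU vtable address (one entry per shared class).
const std::unordered_map<const Slot*, const Slot*> kGpuTable = {{kCpuVtable, kGpuVtable}};

// Translates a CPU vtable address into the matching GPU vtable address, or nullptr.
const Slot* getvtableAddress(const Slot* cpuVtable) {
    auto it = kGpuTable.find(cpuVtable);
    return it == kGpuTable.end() ? nullptr : it->second;
}

// CPU side (block 310): Mov r1, [obj]; Call *[r1+offsetFunction]
int cpuVirtualCall(SharedObject* obj, std::size_t offsetFunction) {
    const Slot* r1 = obj->vptr;
    return r1[offsetFunction](obj);
}

// GPU side (block 350): Mov r1, [obj]; r2 = getvtableAddress(r1); Call *[r2+offsetFunction]
int gpuVirtualCall(SharedObject* obj, std::size_t offsetFunction) {
    const Slot* r1 = obj->vptr;
    const Slot* r2 = getvtableAddress(r1);
    assert(r2 != nullptr && "no GPU-side table entry for this class");
    return r2[offsetFunction](obj);
}
```

The same object, holding only its CPU vptr, dispatches to the CPU version through cpuVirtualCall and to the GPU version through gpuVirtualCall; the single extra lookup is the cost of not storing a second vptr in the object.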

[0028] In block 280, the first processor (CPU 110) and the second processor (GPU 180) may communicate bidirectionally using the shared object 131.

[0029] An embodiment of creating the GPU table is illustrated by the flowchart of FIG. 4. In block 410, in one embodiment, the table may be formed during initialization time by including function pointers to the registration functions of shareable classes (shared objects 131) in an initialization section (e.g., the CRT$XCI section for MSC++). For example, the registration function of a shareable class may be included in the MS CRT$XCI initialization section.

[0030] In block 420, the registration functions may be executed during initialization time. Because function pointers to the registration functions are included in the initialization section, the registration functions may be executed when the initialization section is executed.

[0031] In block 430, on the first processor side (CPU side), the registration function may register the "class name" and the "CPU vtable address" in a first table. In block 440, on the second processor side (GPU side), the registration function may register the "class name" and the "GPU vtable address" in a second table.
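A portable analogue of blocks 410 to 430 is sketched below. The text relies on MSC++ initialization sections such as CRT$XCI; this sketch instead assumes standard C++ dynamic initialization of a namespace-scope object, which in practice also runs before main. Only the CPU-side (first) table is shown; a GPU build would register the "class name" and GPU vtable address into a second table the same way. cpuTable, registerSharedClass, FooRegistrar, and the stand-in vtable address are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One side's registration table: ("class name", vtable address) pairs.
using RegistrationTable = std::vector<std::pair<std::string, const void*>>;

// Function-local static, so the table exists before any registrar touches it.
RegistrationTable& cpuTable() {
    static RegistrationTable table;
    return table;
}

// Hypothetical registration function for a shareable class (block 430).
void registerSharedClass(const std::string& className, const void* vtableAddress) {
    cpuTable().emplace_back(className, vtableAddress);
}

// Stand-in for the compiler-generated CPU vtable of class "Foo".
const int kFooCpuVtableStandIn[1] = {0};

// Namespace-scope object whose constructor runs during start-up initialization,
// playing the role of a function pointer placed in an initialization section.
struct FooRegistrar {
    FooRegistrar() { registerSharedClass("Foo", kFooCpuVtableStandIn); }
};
FooRegistrar fooRegistrar;
```

By the time main runs, the first table already holds the ("Foo", CPU vtable address) entry, which is the state block 480 expects before merging.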

[0032] In block 480, the first table and the second table may be merged into one common table. For example, if the first and second tables contain the same "class name", the entry in the first table may be combined with the matching entry in the second table. As a result of the merge, the combined entries of the first and second tables may appear as a single entry with a single class name. In one embodiment, the common table may reside on the GPU side, and the common table (or GPU table) may contain the "class name", the CPU vtable address, and the GPU vtable address.
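The merge in block 480 amounts to a join on the class name. The sketch below assumes each side's table is a list of ("class name", vtable address) pairs and produces common-table entries of the form <"class name", CPU vtable address, GPU vtable address>. Dropping classes registered on only one side is an assumption of this sketch, not something the text specifies; Registration, CommonEntry, and mergeTables are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One ("class name", vtable address) pair registered on one side at initialization.
struct Registration {
    std::string className;
    const void* vtable;
};

// One entry of the common (GPU-side) table: <"class name", CPU vtable, GPU vtable>.
struct CommonEntry {
    std::string className;
    const void* cpuVtable;
    const void* gpuVtable;
};

// Merges the first (CPU) and second (GPU) tables: entries with the same class name
// collapse into one common entry. Classes registered on only one side are skipped,
// since this sketch treats them as not shareable.
std::vector<CommonEntry> mergeTables(const std::vector<Registration>& cpu,
                                     const std::vector<Registration>& gpu) {
    std::map<std::string, const void*> gpuByName;
    for (const auto& r : gpu) gpuByName[r.className] = r.vtable;

    std::vector<CommonEntry> common;
    for (const auto& r : cpu) {
        auto it = gpuByName.find(r.className);
        if (it != gpuByName.end()) common.push_back({r.className, r.vtable, it->second});
    }
    return common;
}
```

Keying the GPU registrations by name keeps the merge independent of registration order, so the CPU and GPU vtable addresses never need to match.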

[0033] In one embodiment, creating the common table (or GPU table) may remove the requirement that vtable addresses match on the CPU side and the GPU side. The GPU table may also support dynamic link libraries (DLLs). In one embodiment, a class may be loaded on the CPU side before the shared object 131 can be used or initialized on the GPU side. Because applications are generally loaded on the CPU side, the GPU table may enable two-way communication between the CPU 110 and the GPU 180 for classes defined in the application as well as in statically linked libraries. A DLL may likewise be loaded on the CPU side, and the GPU table may then also be used for two-way communication involving the DLL.

[0034] The shareable object 131 may contain a CPU-side vtable pointer, with no additional vtable pointer for the GPU-side vtable. In one embodiment, the GPU vtable pointer may be generated from the CPU vtable pointer in the object, as described above for block 350 of FIG. 3. In one embodiment, the CPU vtable pointer may be used as-is on the CPU side, while the GPU vtable pointer may be used for virtual function calls on the GPU side. In one embodiment, this approach may not involve modification or participation of the linker/loader, and does not require an additional vptr field in the shared object 131. This approach may allow fine-grained partitioning, between the CPU 110 and the GPU 180, of applications written in object-oriented languages.

[0035] An embodiment of a flowchart used by the computing platform 100 to support two-way communication between the CPU 110 and the GPU 180 through member functions of objects that can be shared by heterogeneous processors is illustrated in FIG. 5. In one embodiment, the GPU compiler 188 may generate a CPU stub 510 for a GPU function and a CPU remote call API 520 on the CPU side 110. The GPU compiler 188 may also generate GPU-side gluing logic 530 for the GPU function of a first member function on the GPU side 180. In one embodiment, the CPU 110 may call the first member function using a first enabling path (comprising the stub logic 510, the API 520, and the gluing logic 530). In one embodiment, the first enabling path may allow the CPU 110 to establish a remote call with the GPU side 180 and to transfer information from the CPU side 110 to the GPU side 180. In one embodiment, the GPU-side gluing logic 530 may allow the GPU 180 to receive the information transferred from the CPU side 110.

[0036] In one embodiment, the CPU stub 510 may have the same name as the first member function (i.e., the original GPU member function), but may wrap the API 520 to direct the call from the CPU 110 to the GPU 180. In one embodiment, code generated by the CPU compiler 118 may call the first member function as-is, but the call may be redirected to the CPU stub 510 and the remote call API 520. Further, when making a remote call, the CPU stub 510 may send a unique name representing the first member function being called, a pointer to the shared object, and the other arguments of the called first member function. In one embodiment, the GPU-side gluing logic 530 may receive these arguments and dispatch the first member function call. In one embodiment, the GPU compiler 188 may generate gluing logic (or a dispatcher) that dispatches a non-virtual function by calling the GPU-side function address of the first member function with the object pointer passed as the first parameter. In one embodiment, the GPU compiler 188 may generate a jump-table registration call on the GPU side to register the GPU-side gluing logic 530, so that the CPU stub 510 can communicate with the GPU-side gluing logic 530.
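The first enabling path (CPU stub 510, remote call API 520, GPU-side gluing logic 530) can be simulated in a single C++ process. This is a hedged sketch: the remote call API is assumed to be a direct function call rather than a real CPU-to-GPU transport, and Scene, gpuSceneRender, gpuJumpTable, registerGpuGlue, and remoteCall are hypothetical names. It shows the stub keeping the member function's original name while forwarding a unique name, the shared-object pointer, and the arguments, and the gluing logic dispatching with the object pointer as the first parameter.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical shared object with a member function allocated to the GPU.
struct Scene {
    int frames = 0;
    int render(int n);  // declared as usual; the CPU build links the stub below
};

// ---- GPU side ----
// GPU implementation of the member function, taking the object pointer first.
int gpuSceneRender(Scene* self, int n) { self->frames += n; return self->frames; }

// Jump table of gluing logic, keyed by the unique function name.
using Glue = std::function<int(void*, int)>;
std::unordered_map<std::string, Glue>& gpuJumpTable() {
    static std::unordered_map<std::string, Glue> table;
    return table;
}

// Jump-table registration call: the glue unpacks the object pointer and dispatches.
void registerGpuGlue() {
    gpuJumpTable()["Scene::render"] = [](void* obj, int n) {
        return gpuSceneRender(static_cast<Scene*>(obj), n);
    };
}

// ---- Remote call API ----
// Stand-in for the CPU-to-GPU transport; a real system would send the request to the
// GPU instead of invoking the glue directly.
int remoteCall(const std::string& uniqueName, void* obj, int arg) {
    auto it = gpuJumpTable().find(uniqueName);
    if (it == gpuJumpTable().end()) throw std::runtime_error("unregistered: " + uniqueName);
    return it->second(obj, arg);
}

// ---- CPU side ----
// CPU stub: same name as the original member function, but it only forwards the
// unique name, the shared-object pointer, and the arguments.
int Scene::render(int n) { return remoteCall("Scene::render", this, n); }
```

CPU code calls s.render(2) unchanged and the call lands in the GPU implementation. The second enabling path of [0037] would mirror this with a GPU stub, a GPU remote call API, and CPU-side gluing logic registered in a CPU-side jump table.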

[0037] In one embodiment, the GPU compiler 188 may create a second enabling path, comprising a GPU stub 550 for a CPU function, a GPU remote call API 570 on the GPU side 180, and CPU-side gluing logic 580 for a second member function allocated to the CPU 110. In one embodiment, the GPU 180 may call the CPU side 110 using the second enabling path. In one embodiment, the GPU stub 550 and the API 570 may allow the GPU 180 to establish a remote call with the CPU side 110 and to transfer information from the GPU side 180 to the CPU side 110. In one embodiment, the CPU-side gluing logic 580 may allow the CPU 110 to receive the information transferred from the GPU side 180.

[0038] In one embodiment, to support the second member function call, the GPU compiler 188 may generate a jump-table registration for the CPU-side glue logic 580. In one embodiment, the CPU-side function address of the second member function may be called in the CPU glue logic 580. In one embodiment, the code of the CPU glue logic 580 may be linked with other code generated by the CPU compiler 118. This approach may provide a path that supports bidirectional communication between the heterogeneous processors 110 and 180. In one embodiment, the CPU stub logic 510 and the CPU-side glue logic 580 may be coupled to the CPU 110 via a CPU linker 590. In one embodiment, the CPU linker 590 may generate a CPU executable 595 using the CPU stub 510, the CPU-side glue logic 580, and other code generated by the CPU compiler 118. In one embodiment, the GPU stub logic 550 and the GPU-side glue logic 530 may be coupled to the GPU 180 via a GPU linker 540. In one embodiment, the GPU linker 540 may generate a GPU executable using the GPU glue logic 530, the GPU stub 550, and other code generated by the GPU compiler 188.
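Extending the same idea to both directions, the following sketch (single process, hypothetical names throughout) gives each side its own jump table, so a CPU stub reaches a function assigned to the GPU and a GPU stub reaches a function assigned to the CPU, each through the glue logic of the receiving side.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class Side { CPU, GPU };

using MemberFn = int (*)(void* self, int arg);

// One jump table per side; the glue logic of that side dispatches through it.
template <Side S>
struct JumpTable {
    static std::unordered_map<std::string, MemberFn>& entries() {
        static std::unordered_map<std::string, MemberFn> map;
        return map;
    }
    // Registration (GPU-side table for glue logic 530, CPU-side for 580).
    static void add(const std::string& name, MemberFn fn) { entries()[name] = fn; }
    // Glue logic: resolve the side-local address and call it.
    // Throws std::out_of_range for an unregistered name.
    static int dispatch(const std::string& name, void* self, int arg) {
        assert(self != nullptr);
        return entries().at(name)(self, arg);
    }
};

struct Counter { int hits = 0; };

// Member function assigned to the CPU, reached from the GPU.
int cpu_log_hit(void* self, int n) {
    auto* c = static_cast<Counter*>(self);
    c->hits += n;
    return c->hits;
}

// Member function assigned to the GPU, reached from the CPU.
int gpu_double(void* /*self*/, int n) { return 2 * n; }

// CPU stub (first path, 510/520) and GPU stub (second path, 550/570):
// each forwards to the glue logic on the opposite side.
int cpu_stub_gpu_double(Counter* c, int n) {
    return JumpTable<Side::GPU>::dispatch("Counter::double", c, n);
}
int gpu_stub_cpu_log_hit(Counter* c, int n) {
    return JumpTable<Side::CPU>::dispatch("Counter::log_hit", c, n);
}
```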

[0039] FIG. 6 shows an embodiment of a flowchart 600 in which the CPU side 110 calls a GPU virtual function and a GPU non-virtual function using the table-based technique described above. Block 610 is shown, comprising a shared class instance (object) titled shared class Foo(), which contains a first annotation tag #Pragma GPU annotating a virtual function (e.g., VF 133-A) and the virtual function call "virtual void SomeVirtuFunc()", and a second annotation tag #Pragma GPU annotating a non-virtual function (e.g., NVF 136-A) and the non-virtual function call "void SomeNonVirtuFunc()".

[0040] In one embodiment, 'pFoo' may point to the shared object 131 of class Foo(), and a remote virtual function call may be completed from the CPU side 110 to the GPU side 180. In one embodiment, 'pFoo = new (SharedMemoryAllocator()) Foo();' may be one possible way to override the new/delete operators with shared memory allocation/release runtime calls. In one embodiment, in response to compiling 'pFoo->SomeVirtuFunc()' in block 610, the CPU compiler 118 may initiate the task depicted in block 620.
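The allocation form `new (SharedMemoryAllocator()) Foo()` in [0040] relies on a placement-style `operator new` overload. The sketch below shows one way such an overload could look; `SharedMemoryAllocator` here is a hypothetical stand-in that uses `malloc`/`free` instead of a shared virtual memory runtime, and the example passes a named allocator instance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

int g_live_shared_allocs = 0;  // bookkeeping for the example only

// Hypothetical stand-in for a shared-virtual-memory allocator. A real one
// would return memory from the shared region; this one uses malloc/free.
struct SharedMemoryAllocator {
    void* allocate(std::size_t n) {
        void* p = std::malloc(n);
        if (p == nullptr) throw std::bad_alloc();
        ++g_live_shared_allocs;
        return p;
    }
    void release(void* p) {
        std::free(p);
        --g_live_shared_allocs;
    }
};

// Placement-style overload selected by `new (alloc) T(...)`.
void* operator new(std::size_t n, SharedMemoryAllocator& alloc) { return alloc.allocate(n); }

// Matching placement delete; invoked automatically only if T's constructor throws.
void operator delete(void* p, SharedMemoryAllocator& alloc) noexcept { alloc.release(p); }

struct Foo {
    virtual ~Foo() = default;
    virtual int SomeVirtuFunc() { return 1; }
};

// `delete p` would call the global operator delete, so destroy and release explicitly.
template <class T>
void shared_delete(T* p, SharedMemoryAllocator& alloc) {
    p->~T();
    alloc.release(p);
}
```

Usage: `SharedMemoryAllocator shm; Foo* pFoo = new (shm) Foo();` followed later by `shared_delete(pFoo, shm);`.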

[0041] At block 620, the CPU side 110 may call a GPU virtual function. At block 630, the CPU-side stub 510 (for GPU member functions) and the API 520 may send information (arguments) to the GPU side 180. At block 640, the GPU-side glue logic 530 (for GPU member functions) may obtain pGPUVptr (the CPU-side virtual function table pointer) from the THIS object and may find the GPU virtual function table. At block 650, the GPU-side glue logic 530 (or dispatcher) may contain the code sequence depicted in block 350, described above, to obtain the GPU-side virtual function table using the CPU-side virtual function table pointer.
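The lookup at blocks 640 and 650 can be simulated as follows. Because real vtable layout is compiler-specific, this sketch models each virtual function table as an array of function pointers and keeps a GPU-side table of <class name, CPU vtable address, GPU vtable address> entries, the triple described in [0059]. All names are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using VFn = int (*)(const void* self);
using VTable = const VFn*;

// One GPU-side table entry: <class name, CPU vtable address, GPU vtable address>.
struct GpuVtableEntry {
    std::string class_name;
    VTable cpu_vtable;
    VTable gpu_vtable;
};

// Glue-logic step: map the CPU-side vptr stored in the object to the GPU vtable.
VTable find_gpu_vtable(const std::vector<GpuVtableEntry>& table, VTable cpu_vptr) {
    for (const auto& e : table)
        if (e.cpu_vtable == cpu_vptr) return e.gpu_vtable;
    return nullptr;  // no GPU-side table for this class
}

// Shared object: the vptr written by CPU-compiled code comes first.
struct SharedObj {
    VTable vptr;
    int field;
};

int cpu_impl(const void* self) { return static_cast<const SharedObj*>(self)->field; }
int gpu_impl(const void* self) { return static_cast<const SharedObj*>(self)->field * 10; }

const VFn kCpuVtable[] = {cpu_impl};
const VFn kGpuVtable[] = {gpu_impl};

// GPU-side dispatch of virtual slot `index` for an object created on the CPU.
// Returns -1 (an illustrative sentinel) if no GPU-side table exists.
int gpu_call_virtual(const std::vector<GpuVtableEntry>& table, const SharedObj* obj,
                     std::size_t index) {
    assert(obj != nullptr);
    VTable gpu_vt = find_gpu_vtable(table, obj->vptr);
    return gpu_vt ? gpu_vt[index](obj) : -1;
}
```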

[0042] In one embodiment, in response to compiling #Pragma GPU 'void SomeNonVirtuFunc()' in block 610, the GPU compiler 188 may generate code that uses 'pFoo->SomeNonVirtuFunc()' to initiate the tasks depicted in block 670. At block 670, the CPU side 110 may call the GPU non-virtual function. At block 680, the CPU-side stub 510 and the API 520 may send information (arguments) to the GPU side 180. At block 690, the GPU-side glue logic 530 may push the arguments and call the address directly, because the function address may already be known.
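By contrast, the non-virtual path at block 690 needs no table lookup. In this hypothetical sketch, the glue logic `gpu_glue_nonvirtual` calls the GPU-side body `gpu_Foo_SomeNonVirtuFunc` directly, because its address is fixed when the GPU code is linked.

```cpp
#include <cassert>

struct Foo { int x; };

// GPU-side body of Foo::SomeNonVirtuFunc with `this` made explicit.
int gpu_Foo_SomeNonVirtuFunc(Foo* self, int delta) {
    self->x += delta;
    return self->x;
}

// Glue logic for the non-virtual case: no vtable lookup, just a direct call
// to an address the GPU linker already resolved.
int gpu_glue_nonvirtual(void* self, int delta) {
    assert(self != nullptr);
    return gpu_Foo_SomeNonVirtuFunc(static_cast<Foo*>(self), delta);
}
```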

[0043] The flowchart of FIG. 7 shows an embodiment of operations, performed by the computing platform 100, that use a virtual shared non-coherent region to support virtual function sharing between heterogeneous processors. In a computing system such as the computing system 100, which comprises heterogeneous processors (such as the CPU 110 and the GPU 180), the CPU 110 and the GPU 180 may run different code generated by different compilers, such as 118 and 188 (or by the same compiler with different targets), so there is no guarantee that the same virtual function is located at the same address. Although it is possible to modify the compiler/linker/loader to support virtual function sharing, the "non-coherent region" approach described below (a runtime-only approach) may be a simpler technique for allowing virtual functions to be shared between the CPU 110 and the GPU 180. This approach may make shared virtual memory systems such as Mine/Yours/Ours (MYO) easier to adopt and deploy. Although the C++ object-oriented language is used as an example, the approach below may be applied to other object-oriented programming languages that support virtual functions.

[0044] At block 710, the CPU 110 may create a shared non-coherent region within the shared virtual memory 130 to store the virtual function tables of the classes shared by the CPU 110 and the GPU 180. In one embodiment, the shared non-coherent region may be created by specifying a non-coherent tag for a region within the shared virtual memory 130. In one embodiment, the MYO runtime may provide one or more application programming interface (API) functions to create a virtual shared region (called an "arena" in MYO terminology; many such arenas may be created in MYO). For example, a tag such as myoArenaCreate(xxx, ..., NonCoherentTag) or myoArenaCreateNonCoherentTag(xxx, ...) may be used. In one embodiment, these tags may be used to create coherent or non-coherent arenas. However, in other embodiments, API functions may be used to change the attributes of a chunk (or portion) of memory. For example, myoChangeToNonCoherent(addr, size) may be used to make a first region a non-coherent region or arena, while a second region (or portion) remains a coherent arena. In one embodiment, the first region may be specified by an address and a size.
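The two ways of obtaining a non-coherent region in [0044] (tagging a new arena at creation, or changing the attribute of an existing address range) can be modelled as below. This is a toy bookkeeping sketch that deliberately does not use the MYO API; `SharedSpace`, `create_arena`, and `change_to_non_coherent` are hypothetical names, and a real runtime would track attributes per page rather than per byte.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

enum class Coherence { Coherent, NonCoherent };

class SharedSpace {
public:
    // Style 1: create a new arena tagged coherent or non-coherent.
    std::size_t create_arena(std::size_t size, Coherence tag) {
        std::size_t base = next_;
        next_ += size;
        for (std::size_t a = base; a < base + size; ++a) attr_[a] = tag;
        return base;
    }
    // Style 2: change the attribute of an existing (addr, size) chunk.
    void change_to_non_coherent(std::size_t addr, std::size_t size) {
        assert(size > 0 && attr_.count(addr) && attr_.count(addr + size - 1));
        for (std::size_t a = addr; a < addr + size; ++a) attr_[a] = Coherence::NonCoherent;
    }
    // Throws std::out_of_range for an address outside every arena.
    Coherence attribute(std::size_t addr) const { return attr_.at(addr); }

private:
    std::size_t next_ = 0x1000;              // arbitrary base "address"
    std::map<std::size_t, Coherence> attr_;  // per-byte attribute, for clarity only
};
```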

[0045] In one embodiment, a memory arena (i.e., a managed chunk of memory) may be created that allows data sharing without maintaining data coherence, and such a memory arena may be referred to as a shared non-coherent region. In one embodiment, CPU data and GPU data stored in the shared non-coherent region may have the same address as seen by the CPU 110 and the GPU 180. However, the contents (the CPU data and the GPU data) may differ, because the shared virtual memory 130, such as MYO, may not maintain coherence at runtime. In one embodiment, the shared non-coherent region may be used to store a new copy of the virtual method table of each shared class. In one embodiment, the virtual function table address seen by the CPU 110 and by the GPU 180 may be the same; however, the virtual function tables themselves may differ.

[0046] At block 750, during initialization, the virtual function table of each shareable class may be copied from the CPU private space 115 and the GPU private space 185 to the shared virtual memory 130. In one embodiment, the CPU-side virtual function table may be copied into the non-coherent region within the shared virtual memory 130, and the GPU-side virtual function table may also be copied into the non-coherent region within the shared virtual memory 130. In one embodiment, the CPU-side and GPU-side virtual function tables may be located at the same address in the shared space.
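A sketch of the initialization-time copy in [0046], assuming the non-coherent region can be modelled as one backing array per processor: both arrays are indexed by the same offset, but nothing keeps their contents in sync. `NonCoherentRegion` and `publish_vtable` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

using VFn = int (*)();
constexpr std::size_t kSlots = 8;  // size of the toy region, in table slots

// Toy non-coherent region: one backing array per processor. The same offset
// is valid in both, but their contents are never synchronized.
struct NonCoherentRegion {
    std::array<VFn, kSlots> cpu_view{};
    std::array<VFn, kSlots> gpu_view{};
};

// Initialization-time step: copy each side's private vtable (n slots) into its
// own view of the region at the same offset.
void publish_vtable(NonCoherentRegion& r, std::size_t offset,
                    const VFn* cpu_private, const VFn* gpu_private, std::size_t n) {
    assert(offset + n <= kSlots);
    std::copy(cpu_private, cpu_private + n, r.cpu_view.begin() + offset);
    std::copy(gpu_private, gpu_private + n, r.gpu_view.begin() + offset);
}

// Example implementations of one virtual function for each processor.
int cpu_vfunc1() { return 1; }
int gpu_vfunc1() { return 10; }
```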

[0047] In one embodiment, if toolchain support is available, the CPU compiler 118 or the GPU compiler 188 may place the CPU and GPU virtual function table data in a special data section, and the loader 540 or 570 may load the special data section into the shared non-coherent region. In other embodiments, the CPU compiler 118 or the GPU compiler 188 may allow the special data section to be created in the shared non-coherent region, for example by using an API call such as myoChangeToNonCoherent. In one embodiment, the CPU compiler 118 and the GPU compiler 188 may ensure that the CPU virtual function table and the GPU virtual function table are located at the same offset within the special data section (or, failing that, are padded appropriately). In one embodiment, in the case of multiple inheritance, there may be multiple virtual function table pointers in the object layout. In one embodiment, the CPU compiler 118 and the GPU compiler 188 may also ensure that the CPU and GPU virtual function table pointers are located at the same offset in the object layout.
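One way a toolchain could guarantee the "same offset, or else padded" property from [0047] is to reserve, for each shareable class, a slot large enough for the larger of its two tables. The function below is a hypothetical illustration of that layout rule, not the compilers' actual algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct VtableSizes { std::size_t cpu_bytes, gpu_bytes; };

std::size_t align_up(std::size_t v, std::size_t align) {
    assert(align != 0);
    return (v + align - 1) / align * align;
}

// Returns the common start offset of each class's CPU/GPU vtable pair in the
// special data section; the shorter table of each pair is padded.
std::vector<std::size_t> assign_offsets(const std::vector<VtableSizes>& classes,
                                        std::size_t align) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const auto& c : classes) {
        cursor = align_up(cursor, align);
        offsets.push_back(cursor);
        cursor += std::max(c.cpu_bytes, c.gpu_bytes);  // pad to the larger table
    }
    return offsets;
}
```

For example, classes whose CPU/GPU tables occupy 24/16, 8/40 and 16/16 bytes, with 16-byte alignment, receive offsets 0, 32 and 80.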

[0048] In one embodiment, in the absence of toolchain support, the user may be allowed to copy the CPU virtual function table and the GPU virtual function table into the shared non-coherent region. In one embodiment, one or more macros may be generated to facilitate this manual copying of the CPU and GPU tables into the shared non-coherent memory region.

[0049] At runtime, after a shared object such as the shared object 131 is created, an object layout 801 may be created, which may contain multiple vptrs for multiple inheritance. In one embodiment, the virtual table pointer (vptr) of the shared object 131 in the object layout 801 may be updated (patched) to point to the new copy of the virtual function table in the shared non-coherent region. In one embodiment, the constructor of a class that contains virtual functions may be used to update the virtual table pointer of the shared object. In one embodiment, if a class does not contain any virtual functions, the data and functions of that class may be shared and need not be updated (patched) during runtime.

[0050] At block 780, the vptr (virtual function table pointer) may be modified to point to the shared non-coherent region while the shared object 131 is created. In one embodiment, the vptr, which by default points to a private virtual function table (the CPU virtual function table or the GPU virtual function table), may be modified (as indicated by solid line 802-C in FIG. 8) to point to the shared non-coherent region 860. In one embodiment, a virtual function may be called as follows:
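The constructor-based patching of [0049] and [0050] is sketched below with an explicit `vptr` member, because the real vptr is inserted and placed by the compiler. The constructor first installs the private table, as compiled code would by default, then redirects `vptr` to the copy in the shared region. `VTable`, `g_sharedFooVtable`, and `init_shared_vtables` are hypothetical names.

```cpp
#include <cassert>

// Toy virtual function table with a single slot.
struct VTable {
    int (*describe)(const void* self);
};

int foo_describe(const void* self);  // defined after Foo

const VTable kPrivateFooVtable{foo_describe};  // lives in CPU or GPU private space
VTable g_sharedFooVtable{};                    // copy in the shared non-coherent region

// Must run once at initialization, before any shared Foo is constructed.
void init_shared_vtables() { g_sharedFooVtable = kPrivateFooVtable; }

struct Foo {
    const VTable* vptr;  // explicit stand-in for the compiler-generated vptr
    int field1;

    explicit Foo(int f) : vptr(&kPrivateFooVtable), field1(f) {
        assert(g_sharedFooVtable.describe != nullptr && "init_shared_vtables() not called");
        vptr = &g_sharedFooVtable;  // patch: redirect vptr to the shared copy
    }
    int describe() const { return vptr->describe(this); }
};

int foo_describe(const void* self) { return static_cast<const Foo*>(self)->field1; }
```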

[0051] mov eax, [ecx]        # ecx holds the "this" pointer; eax receives the vptr

[0052] call [eax + vfunc]    # vfunc is the virtual function's index (as a byte offset) into the virtual function table

[0053] On the CPU side, the code above may call the CPU implementation of the virtual function; on the GPU side, the same code may call the GPU implementation of the virtual function. This approach may allow both the data and the virtual functions of a class to be shared.
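The two-instruction sequence in [0051] and [0052] can be emulated to show why the same code reaches different implementations: the object stores one vptr value, and each processor resolves that value through its own view of the non-coherent region. This is a simulation under stated assumptions (the vptr is stored as a slot offset, and `vfunc` is treated as a slot index rather than a byte offset); all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using VFn = int (*)(int);
enum Proc : std::size_t { CPU = 0, GPU = 1 };

// Toy non-coherent region: view[proc][slot]. The same slot number is valid on
// both processors, but each processor sees its own contents.
struct Region {
    std::array<std::array<VFn, 4>, 2> view{};
};

// Toy shared object: the vptr is stored as a slot offset into the region.
struct Obj {
    std::size_t vptr;
};

// Emulates: mov eax, [ecx]; call [eax + vfunc]
int call_virtual(const Region& r, Proc p, const Obj* ecx, std::size_t vfunc, int arg) {
    std::size_t eax = ecx->vptr;          // load vptr from the object
    VFn target = r.view[p][eax + vfunc];  // fetch the slot through p's own view
    assert(target != nullptr);
    return target(arg);
}

int cpu_square(int x) { return x * x; }
int gpu_square(int x) { return x * x + 1000; }  // offset so the two are distinguishable
```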

[0054] FIG. 8 shows an embodiment of a relationship diagram 800 for supporting virtual function sharing between heterogeneous processors using a virtual shared non-coherent region. In one embodiment, an object layout 801 may include a virtual table pointer (vptr) in a first slot 801-A and other fields, such as field 1 and field 2, in slots 801-B and 801-C. In one embodiment, after the CPU compiler 118 and the GPU compiler 188 process the virtual function table pointer (vptr) located in slot 801-A, the vptr refers to the generated CPU virtual function table (as indicated by dashed line 802-A) and the generated GPU virtual function table (as indicated by dashed line 802-B). The CPU virtual function table may be located at address 810 within the CPU private address space 115, and the GPU virtual function table may be located at address 840 within the GPU private address space 185. In one embodiment, the CPU virtual function table may contain function pointers such as vfunc1 and vfunc2, and the GPU virtual function table may contain function pointers such as vfunc1' and vfunc2'. In one embodiment, the function pointers (vfunc1 and vfunc2) and (vfunc1' and vfunc2') may differ, because they point to different implementations of the same functions.

[0055] In one embodiment, as a result of modifying the vptr (as shown in block 780), the vptr may point to the shared non-coherent region 860 within the shared virtual memory 130. In one embodiment, the CPU virtual function table may be located at address Address 870, and the GPU virtual function table may be located at the same address, Address 870. In one embodiment, the CPU virtual function table may contain function pointers such as vfunc1 and vfunc2, and the GPU virtual function table may contain function pointers such as vfunc1' and vfunc2'. In one embodiment, the function pointers (vfunc1 and vfunc2) and (vfunc1' and vfunc2') may differ. In one embodiment, keeping the CPU virtual function table and the GPU virtual function table in the shared non-coherent region 860 enables the CPU 110 and the GPU 180 to see the CPU virtual function table and the GPU virtual function table, respectively, at the same address location Address 870; however, the contents of the CPU virtual function table (vfunc1 and vfunc2) may differ from the contents of the GPU virtual function table (vfunc1' and vfunc2').

[0056] FIG. 9 shows an embodiment of a computer system 900 comprising heterogeneous processors that support bidirectional communication. Referring to FIG. 9, the computer system 900 may include a general-purpose processor (or CPU) 902 including a single instruction, multiple data (SIMD) processor, and a graphics processor unit (GPU) 905. In one embodiment, in addition to performing various other tasks, the CPU 902 may perform enhancement operations or store a sequence of instructions to provide enhancement operations in a machine-readable storage medium 925. However, the sequence of instructions may also be stored in the CPU private memory 920 or in any other suitable storage medium. In one embodiment, the CPU 902 may be associated with a CPU legacy compiler 903 and a CPU linker/loader 904. In one embodiment, the GPU 905 may be associated with a GPU proprietary compiler 906 and a GPU linker/loader 907.

[0057] Although a separate graphics processor unit GPU 905 is depicted in FIG. 9, in some embodiments the processor 902 may, as another example, be used to perform the enhancement operations. The processor 902 that operates the computer system 900 may be one or more processor cores coupled to logic 930. The logic 930 may be coupled to one or more I/O devices 960, which may provide an interface to the computer system 900. In one embodiment, the logic 930 may be, for example, chipset logic. The logic 930 is coupled to the memory 920, which may be any kind of storage, including optical, magnetic, or semiconductor storage devices. The graphics processor unit 905 is coupled to a display 940 through a frame buffer.

[0058] In one embodiment, the computing platform 900 may support one or more techniques that allow bidirectional communication (function calls) between the heterogeneous processors CPU 902 and GPU 905 through member functions, such as virtual functions, of shared objects, by partitioning the shared objects at a fine granularity. In one embodiment, the computer system 900 may allow bidirectional communication between the CPU 902 and the GPU 905 using a first technique, referred to as the "table-based" technique. In other embodiments, the computing platform may allow bidirectional communication between the CPU 902 and the GPU 905 using a second technique, referred to as the "non-coherent region" technique, in which a virtual shared non-coherent region may be created in virtual shared memory located in the private CPU memory 920, the private GPU memory 930, or the shared memory 950. In one embodiment, a separate shared memory such as the shared memory 950 may not be provided in the computer system 900; in that case, the shared memory may be provided within one of the private memories, such as the CPU memory 920 or the GPU memory 930.

[0059] In one embodiment, when the table-based technique is used, the CPU-side virtual function table pointer of a shared object, which may be used to access the shared object from the CPU 110 or the GPU 180, may be used to determine the GPU virtual function table, if a GPU-side table exists. In one embodiment, the GPU-side virtual function table may contain <"class name", CPU virtual function table address, GPU virtual function table address>. In one embodiment, the GPU-side virtual function table address may be obtained, and the GPU-side table generated, as described above.

[0060] In other embodiments, when the "non-coherent region" technique is used, a shared non-coherent region is created within the shared virtual memory. In one embodiment, the shared non-coherent region may not maintain data coherence. In one embodiment, CPU-side data and GPU-side data within the shared non-coherent region may have the same address as seen from the CPU side and the GPU side. However, the contents of the CPU-side data may differ from the contents of the GPU-side data, because the shared virtual memory may not maintain coherence during runtime. In one embodiment, the shared non-coherent region may be used to store a new copy of the virtual method table of each shared class. In one embodiment, this approach may keep the virtual tables at the same address.

[0061] The graphics processing techniques described herein may be implemented in various hardware architectures. For example, graphics functionality may be integrated within a chipset. Alternatively, a discrete graphics processor may be used. As still another embodiment, the graphics functions may be implemented by a general-purpose processor, including a multicore processor, or as a set of software instructions stored in a machine-readable medium.

Claims (11)

  1. A computing platform, comprising: a combination of a central processing unit (CPU) and a graphics processing unit (GPU); and a shared virtual memory accessible to both the GPU and the CPU, wherein the computing platform includes shared physical memory to support the shared virtual memory accessible to both the CPU and the GPU, and wherein the computing platform is adapted to: store a shared object including a plurality of virtual functions in the shared virtual memory; and share at least one of the plurality of virtual functions between the CPU and the GPU.
  2. The computing platform according to claim 1, wherein the computing platform sharing the plurality of virtual functions between the CPU and the GPU comprises bidirectional communication between the CPU and the GPU.
  3. The computing platform according to claim 1, wherein the shared object further comprises a non-virtual function.
  4. The computing platform according to claim 1, wherein the shared object comprises a virtual table pointer used to index a virtual function table.
  5. The computing platform according to claim 1, wherein the shared object comprises a CPU-side virtual table pointer.
  6. The computing platform according to claim 5, wherein the GPU uses the CPU-side virtual table pointer to determine a GPU-side virtual table.
  7. The computing platform according to claim 6, wherein the GPU-side virtual table contains a class name, a CPU-side virtual table address, and a GPU-side virtual table address.
  8. The computing platform according to claim 6, wherein the GPU-side virtual table supports at least one of a dynamic link library or a static link library.
  9. The computing platform according to claim 1, wherein the computing platform further creates a shared non-coherent region within the shared virtual memory and copies a CPU-side virtual table and a GPU-side virtual table into the shared virtual memory, wherein the CPU-side virtual table and the GPU-side virtual table have the same address in the shared virtual memory.
  10. The computing platform according to claim 9, wherein the computing platform further modifies a virtual table pointer to point to the same address, wherein the CPU-side virtual table includes CPU-side function pointers, and wherein the GPU-side virtual table includes GPU-side function pointers different from the CPU-side function pointers.
  11. The computing platform according to claim 10, wherein the computing platform further comprises a CPU private memory space, and wherein the computing platform copies the CPU-side virtual table from the CPU private memory space.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN 201080069225 CN103109286B (en) 2010-09-24 2010-09-24 Shared virtual memory virtual function shared between the heterogeneous processors in a computing platform
CN 201410790536 CN104536740B (en) 2010-09-24 2010-09-24 Shared virtual memory virtual function shared between the heterogeneous processors in a computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201410790536 CN104536740B (en) 2010-09-24 2010-09-24 Shared virtual memory virtual function shared between the heterogeneous processors in a computing platform

Publications (2)

Publication Number Publication Date
CN104536740A true CN104536740A (en) 2015-04-22
CN104536740B true CN104536740B (en) 2018-05-08

Family

ID=52852272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201410790536 CN104536740B (en) 2010-09-24 2010-09-24 Shared virtual memory virtual function shared between the heterogeneous processors in a computing platform

Country Status (1)

Country Link
CN (1) CN104536740B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1524228A (en) * 2001-04-24 2004-08-25 先进微装置公司 Multiprocessor system implementing virtual memory using a shared memory, and a page replacement method for maintaining paged memory coherence

Family Cites Families (5)

US7093080B2 (en) * 2003-10-09 2006-08-15 International Business Machines Corporation Method and apparatus for coherent memory structure of heterogeneous processor systems
US20070283336A1 (en) * 2006-06-01 2007-12-06 Michael Karl Gschwind System and method for just-in-time compilation in a heterogeneous processing environment
US8156307B2 (en) * 2007-08-20 2012-04-10 Convey Computer Multi-processor system having at least one processor that comprises a dynamically reconfigurable instruction set
US8397241B2 (en) * 2008-11-13 2013-03-12 Intel Corporation Language level support for shared virtual memory
US8307350B2 (en) * 2009-01-14 2012-11-06 Microsoft Corporation Multi level virtual function tables

Also Published As

Publication number Publication date Type
CN104536740A (en) 2015-04-22 application

Legal Events

Date Code Title Description
C10 Entry into substantive examination
GR01