CN104866295B - Design method and device for an OpenCL runtime system framework - Google Patents

Info

Publication number
CN104866295B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410065503.7A
Other languages
Chinese (zh)
Other versions
CN104866295A (en)
Inventor
刘颖
崔慧敏
冯晓兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS
Original Assignee
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS

Abstract

An embodiment of the invention discloses a design method and device for an OpenCL runtime system framework. It relates to the field of information technology and can reduce the platform development complexity of heterogeneous systems. The method comprises: first dividing the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation; then providing the platform-layer IR to the functional layer and the optimization layer, and providing the platform-layer implementation framework to the at least one platform implementation. The embodiment is suitable for porting heterogeneous systems across platforms.

Description

Design method and device for an OpenCL runtime system framework
Technical field
The present invention relates to the field of information technology, and in particular to a design method and device for an OpenCL runtime system framework.
Background
As heterogeneous hardware systems become mainstream, programming the various heterogeneous platforms with a common programming method is becoming increasingly important. The dominant heterogeneous hardware system is CPU (Central Processing Unit) + GPU (Graphics Processing Unit). In such a system, the OpenCL (Open Computing Language) parallel programming framework is used to write programs that can execute on the corresponding platform.
At present, an OpenCL runtime system supports multiple platforms by first having the OpenCL kernel compiler produce an IR (Intermediate Representation) and then, at run time, generating executable code for different products from that IR. For example, the CAL (Compute Abstraction Layer) IR produced by AMD's OpenCL kernel compiler can be turned into executable code for different AMD products; the LLVM (Low Level Virtual Machine) IR produced by Intel's OpenCL kernel compiler can be turned into executable code for different Intel products; and the PTX (Parallel Thread Execution) code produced by NVIDIA's OpenCL kernel compiler can be turned into executable code for different NVIDIA products.
However, because the OpenCL system frameworks of different companies differ, the IR produced by one company's OpenCL kernel compiler supports only that company's heterogeneous platforms. As a result, implementing the same optimization on a different platform, or introducing a new platform, requires fresh development, which makes platform development for heterogeneous systems complex.
Summary of the invention
Embodiments of the present invention provide a design method and device for an OpenCL runtime system framework that can reduce the platform development complexity of heterogeneous systems.
The technical solutions adopted by the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a design method for an OpenCL runtime system framework, comprising:
dividing the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation;
providing the platform-layer IR to the functional layer and the optimization layer, and providing the platform-layer implementation framework to the at least one platform implementation.
With reference to the first aspect, in a first possible implementation of the first aspect, the platform-layer IR includes a framework manager and an accelerator manager.
With reference to the first aspect or its first possible implementation, in a second possible implementation of the first aspect, the platform-layer IR includes methods of the platform-layer IR and descriptions of the platform-layer IR, and each method is either a mandatory platform-layer IR or an advisory platform-layer IR.
With reference to the first aspect or its first or second possible implementation, in a third possible implementation of the first aspect, the step of providing the platform-layer IR to the functional layer and the optimization layer comprises:
providing the mandatory platform-layer IR to the functional layer;
providing the advisory platform-layer IR to the optimization layer.
After the step of providing the advisory platform-layer IR to the optimization layer, the method further comprises:
configuring a corresponding priority for the advisory platform-layer IR.
With reference to the first aspect or any of its first to third possible implementations, in a fourth possible implementation of the first aspect, after the step of providing the platform-layer implementation framework to the at least one platform implementation, the method further comprises:
generating the platform-layer IR.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the step of generating the platform-layer IR comprises:
generating the description of the platform-layer IR;
judging whether the resources required by the method of the platform-layer IR are less than or equal to the available resources;
if the resources required by the method of the platform-layer IR are less than or equal to the available resources, generating the method of the platform-layer IR;
if the resources required by the method of the platform-layer IR exceed the available resources, releasing the resources occupied by the lowest-priority platform-layer IR, according to the priorities of the platform-layer IRs.
In a second aspect, an embodiment of the present invention provides a design device for an OpenCL runtime system framework, comprising:
a division unit, configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation;
a providing unit, configured to provide the platform-layer IR to the functional layer and the optimization layer divided by the division unit, and to provide the platform-layer implementation framework to the at least one platform implementation.
With reference to the second aspect, in a first possible implementation of the second aspect,
the platform-layer IR provided by the providing unit includes a framework manager and an accelerator manager.
With reference to the second aspect or its first possible implementation, in a second possible implementation of the second aspect,
the platform-layer IR provided by the providing unit includes methods of the platform-layer IR and descriptions of the platform-layer IR, and each method is either a mandatory platform-layer IR or an advisory platform-layer IR.
With reference to the second aspect or its first or second possible implementation, in a third possible implementation of the second aspect,
the providing unit is specifically configured to provide the mandatory platform-layer IR to the functional layer;
the providing unit is further specifically configured to provide the advisory platform-layer IR to the optimization layer;
the device further includes a configuration unit;
the configuration unit is configured to configure a corresponding priority for the advisory platform-layer IR.
With reference to the second aspect or any of its first to third possible implementations, in a fourth possible implementation of the second aspect, the device further includes a generation unit;
the generation unit is configured to generate the platform-layer IR.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the generation unit includes a generation module, a judging module and a release module;
the generation module is configured to generate the description of the platform-layer IR;
the judging module is configured to judge whether the resources required by the method of the platform-layer IR are less than or equal to the available resources;
the generation module is further configured to generate the method of the platform-layer IR when the judging module determines that the resources required by the method are less than or equal to the available resources;
the release module is configured to release the resources occupied by the lowest-priority platform-layer IR, according to the priorities of the platform-layer IRs, when the judging module determines that the resources required by the method exceed the available resources.
In the design method and device for an OpenCL runtime system framework provided by the embodiments of the present invention, the OpenCL runtime system framework is first divided into a functional layer, an optimization layer and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation; the platform-layer IR is then provided to the functional layer and the optimization layer, and the platform-layer implementation framework is provided to the at least one platform implementation. Compared with the current practice, in which each company's IR produces executable code only for that company's products, dividing the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer allows the same optimization to be applied directly on different platforms, and only the platform layer needs to be developed when a new platform is introduced, which reduces the platform development complexity of heterogeneous systems.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a design method for an OpenCL runtime system framework provided by Embodiment one of the present invention;
Fig. 2 shows an OpenCL runtime system framework provided by Embodiment one of the present invention;
Fig. 3 is a schematic structural diagram of a design device for an OpenCL runtime system framework provided by Embodiment one of the present invention;
Fig. 4 is a schematic structural diagram of an OpenCL runtime system provided by Embodiment one of the present invention;
Fig. 5 is a flowchart of a design method for an OpenCL runtime system framework provided by Embodiment two of the present invention;
Fig. 6 is a schematic structural diagram of a design device for an OpenCL runtime system framework provided by Embodiment two of the present invention;
Fig. 7 is a schematic structural diagram of an OpenCL runtime system provided by Embodiment two of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To make the advantages of the technical solutions of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment one
This embodiment of the present invention provides a design method for an OpenCL runtime system. As shown in Fig. 1, the method comprises:
101. The OpenCL runtime system divides the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation.
In this embodiment, as shown in Fig. 2, the OpenCL runtime system framework is divided into three layers: a functional layer, an optimization layer and a platform layer. The functional layer receives OpenCL runtime code and implements its basic functions; the optimization layer carries out whole-system optimization. Both the functional layer and the optimization layer contain a platform-independent part and a platform-dependent part. The platform-independent part is what is common to all the platforms in the heterogeneous system and is expressed in platform-independent IR; the platform-dependent part is what differs between the platforms and is expressed in platform-dependent IR.
102. The OpenCL runtime system provides the platform-layer IR to the functional layer and the optimization layer, and provides the platform-layer implementation framework to the at least one platform implementation.
The platform-layer IR that the OpenCL runtime system provides to the functional layer and the optimization layer includes a framework manager and an accelerator manager, each of which comprises corresponding methods and descriptions. In this embodiment, the platform-layer implementation framework can be used to quickly supply the concrete implementation of the platform-layer IR for a new platform of the OpenCL runtime system.
In this embodiment, the OpenCL runtime system is divided into a functional layer, an optimization layer and a platform layer, and the optimization layer uses the platform-layer IR interface, so optimizations are implemented uniformly in the optimization layer. When porting across platforms, the optimization layer can be ported directly, so a single optimization can be applied on every platform, which reduces the platform development complexity of heterogeneous systems. In addition, when a new platform is introduced, only the platform-layer IR of the new platform needs to be configured according to the platform-layer implementation framework; the functional layer and the optimization layer of the OpenCL runtime system remain unchanged. Cross-platform porting therefore reuses as much as possible of the existing platforms' OpenCL runtime system, which further reduces the platform development complexity of heterogeneous systems.
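The porting argument can be illustrated with a minimal C++ sketch. Everything in it is an assumption made for illustration: the interface `PlatformLayerIR`, the property `local_mem_bytes`, the two stand-in platforms `PlatformA` and `PlatformB` with made-up memory sizes, and the optimization `choose_tile_elems` are hypothetical names that do not appear in the patent. The point is only that optimization-layer code written once against the platform-layer IR runs unchanged on each platform implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical platform-layer IR: the only interface the upper layers see.
struct PlatformLayerIR {
    virtual ~PlatformLayerIR() = default;
    // Bytes of fast local memory per compute unit (an assumed property).
    virtual std::size_t local_mem_bytes() const = 0;
};

// Two stand-in platform implementations; the sizes are illustrative only.
struct PlatformA : PlatformLayerIR {
    std::size_t local_mem_bytes() const override { return 1024; }
};
struct PlatformB : PlatformLayerIR {
    std::size_t local_mem_bytes() const override { return 4096; }
};

// Optimization-layer code, written once against the IR:
// pick the largest tile (in elements) that fits in local memory.
std::size_t choose_tile_elems(const PlatformLayerIR& platform,
                              std::size_t elem_bytes) {
    assert(elem_bytes > 0);
    return platform.local_mem_bytes() / elem_bytes;
}
```

Under these assumptions, introducing a third platform would mean adding one more class that implements `PlatformLayerIR`; `choose_tile_elems` would not change.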
In this embodiment, the OpenCL runtime system framework is applicable to any heterogeneous computing system, that is, a system formed by interconnecting devices with different instruction sets, different microarchitectures or different computing capabilities. The different platform implementations may be, for example, a Tilera many-core platform or an ARM+x86 heterogeneous platform.
Further, as a concrete implementation of the method shown in Fig. 1, an embodiment of the present invention provides a design device for an OpenCL runtime system. As shown in Fig. 3, the device may be embodied as an OpenCL runtime system and includes a division unit 31 and a providing unit 32.
The division unit 31 is configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation.
The providing unit 32 is configured to provide the platform-layer IR to the functional layer and the optimization layer divided by the division unit 31, and to provide the platform-layer implementation framework to the at least one platform implementation.
It should be noted that, for further details of the functional units in the design device provided in this embodiment, reference may be made to the corresponding description of Fig. 1; they are not repeated here.
Still further, the design device may be embodied as an OpenCL runtime system which, as shown in Fig. 4, may include a processor 41, an input device 42, an output device 43 and a memory 44; the input device 42, the output device 43 and the memory 44 are each connected to the processor 41.
The processor 41 is configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation.
The processor 41 is further configured to provide the platform-layer IR to the functional layer and the optimization layer, and to provide the platform-layer implementation framework to the at least one platform implementation.
It should be noted that, for further details of the devices in the OpenCL runtime system provided in this embodiment, reference may be made to the corresponding description of Fig. 1; they are not repeated here.
In the design method and device for an OpenCL runtime system framework provided by this embodiment of the present invention, the OpenCL runtime system framework is first divided into a functional layer, an optimization layer and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation; the platform-layer IR is then provided to the functional layer and the optimization layer, and the platform-layer implementation framework is provided to the at least one platform implementation. Compared with the current practice, in which each company's IR produces executable code only for that company's products, dividing the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer allows the same optimization to be applied directly on different platforms, and only the platform layer needs to be developed when a new platform is introduced, which reduces the platform development complexity of heterogeneous systems.
Embodiment two
This embodiment of the present invention provides a design method for an OpenCL runtime system framework. As shown in Fig. 5, the method comprises:
501. The OpenCL runtime system divides the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation.
In this embodiment, as shown in Fig. 2, the OpenCL runtime system framework is divided into three layers: a functional layer, an optimization layer and a platform layer. The functional layer receives OpenCL runtime code and implements its basic functions; the optimization layer carries out whole-system optimization. Both the functional layer and the optimization layer contain a platform-independent part and a platform-dependent part. The platform-independent part is what is common to all the platforms in the heterogeneous system and is expressed in platform-independent IR; the platform-dependent part is what differs between the platforms and is expressed in platform-dependent IR.
The platform-layer IR includes a framework manager and an accelerator manager.
In this embodiment, an OpenCL run consists of the following steps: cross-device data transfer first, then kernel launch, and finally parallel execution of the kernel. For example, in a CPU+GPU heterogeneous system, the CPU first transfers the input data to the GPU and the GPU returns the computed results to the CPU; the CPU then sends the kernel to the GPU for execution; finally, the many parallel compute units on the GPU execute the kernel. Cross-device data transfer and kernel launch relate to how the devices on the platform are organized, and this part can be abstracted as the framework manager; parallel kernel execution relates to the internal structure of the accelerator devices on the platform, and this part can be abstracted as the accelerator manager.
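The split of responsibilities described above can be sketched in C++ as follows. This is a host-only simulation under stated assumptions, not the patent's implementation: the type names, `copy_to_device`, `copy_to_host`, `run_parallel` and `run_on_accelerator` are hypothetical, the "device" buffer is an ordinary `std::vector`, and the parallel compute units are replaced by a sequential loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A kernel is called once per work item with the item index and the device buffer.
using Kernel = std::function<void(std::size_t, std::vector<float>&)>;

// Accelerator manager: how a kernel executes inside one accelerator.
struct AcceleratorManager {
    // Sequential stand-in for the accelerator's parallel compute units.
    void run_parallel(const Kernel& kernel, std::vector<float>& dev_buf) const {
        for (std::size_t id = 0; id < dev_buf.size(); ++id) kernel(id, dev_buf);
    }
};

// Framework manager: how devices are organized (transfers and kernel launch).
struct FrameworkManager {
    AcceleratorManager accelerator;

    std::vector<float> copy_to_device(const std::vector<float>& host) const {
        return host;  // a real platform would cross an interconnect here
    }
    std::vector<float> copy_to_host(const std::vector<float>& dev) const {
        return dev;
    }
    void launch(const Kernel& kernel, std::vector<float>& dev_buf) const {
        assert(static_cast<bool>(kernel));  // the kernel must be callable
        accelerator.run_parallel(kernel, dev_buf);
    }
};

// One run: transfer input, launch the kernel, execute it, return the results.
std::vector<float> run_on_accelerator(const FrameworkManager& fm,
                                      const std::vector<float>& input,
                                      const Kernel& kernel) {
    std::vector<float> dev = fm.copy_to_device(input);  // cross-device transfer
    fm.launch(kernel, dev);                             // launch + parallel execution
    return fm.copy_to_host(dev);                        // results back to the host
}
```

In this sketch only `AcceleratorManager::run_parallel` would need to know about cores, threads or SIMD lanes; everything else depends on how the devices are connected.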
In this embodiment, the framework manager describes the relationships among the devices on the heterogeneous platform, including memory organization and control relationships; the accelerator manager describes the hardware characteristics of each accelerator device on the heterogeneous platform, including accelerator code generation, parallel structure and storage hierarchy. The OpenCL runtime system framework includes one framework manager and at least one accelerator manager. By abstracting the platform as a framework manager and accelerator managers, a unified platform-layer IR can be provided to the OpenCL runtime system, and thus a unified OpenCL runtime system framework can be built.
The platform-layer IR comprises methods and descriptions, and each method is either a mandatory platform-layer IR or an advisory platform-layer IR.
Specifically, the methods of the framework manager may include launch, malloc, mem_read/mem_write and so on; the descriptions of the framework manager may include the number of accelerators, the interconnect between the CPU and the accelerators, accelerator name/type, accelerator functions/features, accelerator state and so on. In this embodiment, the platform-layer IR can provide mandatory platform-layer IR to the functional layer and advisory platform-layer IR to the optimization layer.
Specifically, the methods of the accelerator manager may include code_gen, local_malloc, local_read/local_write, barrier and so on; the descriptions of the accelerator manager may include storage hierarchy, parallel structure, SIMD (Single Instruction Multiple Data) width and so on.
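One way to picture the two managers is as a pair of C++ interfaces, each with a description part and a method part. The method names below follow the lists above; the signatures, field types and the toy `HostOnlyFramework` implementation are assumptions made for this sketch, since the patent does not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Description part of the accelerator manager (field types are assumptions).
struct AcceleratorDescription {
    std::vector<std::string> storage_hierarchy;
    int parallel_units;
    int simd_width;
};

// Method part of the accelerator manager.
struct AcceleratorManager {
    virtual ~AcceleratorManager() = default;
    virtual AcceleratorDescription describe() const = 0;
    virtual std::string code_gen(const std::string& kernel_source) = 0;
    virtual void* local_malloc(std::size_t bytes) = 0;
    virtual void local_write(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void local_read(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void barrier() = 0;
};

// Description part of the framework manager.
struct FrameworkDescription {
    int accelerator_count;
    std::string interconnect;
    std::vector<std::string> accelerator_names;
};

// Method part of the framework manager.
struct FrameworkManager {
    virtual ~FrameworkManager() = default;
    virtual FrameworkDescription describe() const = 0;
    virtual void* malloc(std::size_t bytes) = 0;
    virtual void mem_write(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void mem_read(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void launch(const std::string& kernel_object) = 0;
};

// Toy platform implementation that keeps "device" memory in host vectors.
struct HostOnlyFramework : FrameworkManager {
    std::vector<std::vector<unsigned char>> device_memory;
    std::vector<std::string> launched;

    FrameworkDescription describe() const override {
        return {1, "none", {"host"}};
    }
    void* malloc(std::size_t bytes) override {
        assert(bytes > 0);
        device_memory.emplace_back(bytes);
        return device_memory.back().data();
    }
    void mem_write(void* dst, const void* src, std::size_t bytes) override {
        std::memcpy(dst, src, bytes);
    }
    void mem_read(void* dst, const void* src, std::size_t bytes) override {
        std::memcpy(dst, src, bytes);
    }
    void launch(const std::string& kernel_object) override {
        launched.push_back(kernel_object);  // record instead of executing
    }
};
```

A real platform implementation would replace the `std::memcpy` calls and the recorded launch with whatever its interconnect and devices require; the upper layers would still see only the two interfaces.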
502. The OpenCL runtime system provides mandatory platform-layer IR to the functional layer.
In this embodiment, the methods of the platform-layer IR provided to the functional layer have the attribute "mandatory", so that the functional layer can use the platform-layer IR directly to implement concrete functions. For example, when the mandatory platform-layer IR provided to the functional layer is the malloc/read/write for global/local storage on the accelerator device, the functional layer can call malloc/read/write on the accelerator device directly to implement the corresponding function.
By providing mandatory platform-layer IR to the functional layer, every function of the functional layer is guaranteed to be implemented, so the OpenCL runtime system can translate the user's program exactly as written.
503. The OpenCL runtime system provides advisory platform-layer IR to the optimization layer.
In this embodiment, the methods of the platform-layer IR provided to the optimization layer have the attribute "advisory". They are wrapped during implementation so that the optimization layer can be realized according to the hardware resources actually available. Specifically, when the hardware resources required by an advisory platform-layer IR are sufficient, that IR is implemented directly; when they are insufficient, lower-priority platform-layer IRs are released, according to their priorities, until the resources required by the higher-priority advisory platform-layer IR are sufficient. The platform-layer IRs are thus implemented in order of priority from high to low.
By providing advisory platform-layer IR to the optimization layer, the platform-layer IRs can be implemented in order of their respective priorities and according to the actual hardware resources, which avoids system errors caused by insufficient hardware resources.
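The difference between the two attributes can be sketched as a small selection routine. This is an illustrative sketch only: the patent does not define a resource unit or a data layout, so the `cost` and `budget` integers, the `IrMethod` struct and the function `realize` are hypothetical, and the sketch simply assumes that mandatory methods always fit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class Attr { Mandatory, Advisory };

struct IrMethod {
    std::string name;
    Attr attr;
    int priority;  // larger means more important; used for advisory methods only
    int cost;      // abstract hardware resource units (an assumption)
};

// Returns the names of the methods realized within `budget` resource units.
std::vector<std::string> realize(std::vector<IrMethod> methods, int budget) {
    // Mandatory methods first, then advisory methods by descending priority.
    std::stable_sort(methods.begin(), methods.end(),
                     [](const IrMethod& a, const IrMethod& b) {
                         if (a.attr != b.attr) return a.attr == Attr::Mandatory;
                         return a.attr == Attr::Advisory && a.priority > b.priority;
                     });
    std::vector<std::string> realized;
    for (const IrMethod& m : methods) {
        if (m.attr == Attr::Mandatory) {
            // Always realized; this sketch assumes mandatory methods fit.
            budget -= m.cost;
            realized.push_back(m.name);
        } else if (m.cost <= budget) {
            // Advisory methods are realized only while resources last.
            budget -= m.cost;
            realized.push_back(m.name);
        }
    }
    return realized;
}
```

With a budget of 8 units, a mandatory method costing 2, and two advisory methods costing 5 (priority 5) and 4 (priority 1), the routine realizes the mandatory method and the higher-priority advisory one and skips the other.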
504. The OpenCL runtime system configures a corresponding priority for the advisory platform-layer IR.
Configuring priorities for the advisory platform-layer IRs allows them to be implemented in order of their respective priorities.
505. The OpenCL runtime system generates the platform-layer IR.
Specifically, in step 505 the OpenCL runtime system first generates the descriptions of the platform-layer IR and then judges whether the resources required by a method of the platform-layer IR are less than or equal to the available resources. If so, it generates the method; if the required resources exceed the available resources, it releases the resources occupied by the lowest-priority platform-layer IR, according to the priorities of the platform-layer IRs.
In this embodiment, the OpenCL runtime system generates the description part of the platform-layer IR directly; for the method part, it first wraps the methods and then generates the platform-layer IR.
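Step 505 can be sketched as a small generator that tracks which methods currently hold resources. The `Generator` and `GeneratedIr` types and the integer cost model are hypothetical. The sketch also adds one assumption the text does not state: a new method may only displace methods of strictly lower priority, and it is rejected up front if releasing all of those would still not make room.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct GeneratedIr {
    std::string name;
    int priority;  // larger means more important
    int cost;      // abstract resource units (an assumption)
};

struct Generator {
    int capacity;                           // total available resource units
    std::vector<GeneratedIr> live;          // methods currently holding resources
    std::vector<std::string> descriptions;  // descriptions need no resource check

    int used() const {
        int sum = 0;
        for (const GeneratedIr& g : live) sum += g.cost;
        return sum;
    }

    // Returns true if the method part of `ir` was generated.
    bool generate(const GeneratedIr& ir) {
        descriptions.push_back(ir.name);  // the description is generated directly

        // Assumption: only strictly lower-priority methods may be released.
        int kept = 0;
        for (const GeneratedIr& g : live)
            if (g.priority >= ir.priority) kept += g.cost;
        if (kept + ir.cost > capacity) return false;

        // Release the lowest-priority method until the new one fits.
        while (used() + ir.cost > capacity) {
            auto lowest = std::min_element(
                live.begin(), live.end(),
                [](const GeneratedIr& a, const GeneratedIr& b) {
                    return a.priority < b.priority;
                });
            live.erase(lowest);
        }
        live.push_back(ir);
        return true;
    }
};
```

The up-front check keeps the generator from releasing a low-priority method for a request that could never be satisfied anyway; a literal reading of the text would release first and check afterwards.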
In this embodiment, the OpenCL runtime system framework is applicable to any heterogeneous computing system, that is, a system formed by interconnecting devices with different instruction sets, different microarchitectures or different computing capabilities. The different platform implementations may be, for example, a Tilera many-core platform or an ARM+x86 heterogeneous platform.
For example, when the heterogeneous system includes a Tilera many-core platform and an ARM+x86 heterogeneous platform, the IR interfaces received by the functional layer and the optimization layer include:
clCreateBuffer(name, size)
{
    // ptr is assumed to be the address returned by the platform-layer malloc
    ptr = arch_mgr->malloc(get_device(), name, size);
    buffer_list.push(name, size, ptr);
    return TRUE;
}
clBuildProgram()
{
    kernel_source.read();
    options.read();
    // kernel is assumed to be the object produced by code generation
    kernel = device_mgr->code_gen();
    kernel_obj_list.push(kernel);
    return TRUE;
}
clEnqueueNDRangeKernel(kernel, work_group_size)
{
    device_mgr->set_parallel(get_device(), work_group_size);
    arch_mgr->launch(get_device(), kernel_obj_list.pop());
    return TRUE;
}
Here, the platform-layer IR includes arch_mgr->malloc, device_mgr->code_gen, device_mgr->set_parallel and arch_mgr->launch; the platform-independent IR includes buffer_list.push, kernel_source.read, options.read, kernel_obj_list.push and kernel_obj_list.pop. In this case, the methods of the framework manager (arch_mgr) in the platform-layer IR include malloc and launch, and its descriptions include the interconnect and the accelerator name; the methods of the accelerator manager (device_mgr) in the platform-layer IR include code_gen and set_parallel, and its descriptions include the parallel structure and the storage hierarchy.
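The pseudocode above can be turned into a compilable C++ sketch by supplying stand-ins for the two managers. The member functions named `clCreateBuffer`, `clBuildProgram` and `clEnqueueNDRangeKernel` mirror the simplified pseudocode and are not the signatures of the real OpenCL API; `ArchMgr`, `DeviceMgr`, `Runtime`, the single device id 0 and the string returned by `code_gen` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the framework manager (arch_mgr): allocation and launch.
struct ArchMgr {
    std::vector<std::vector<unsigned char>> buffers;
    std::vector<std::string> launched;

    void* malloc(int device, const std::string& name, std::size_t size) {
        (void)device; (void)name;  // unused in this single-device toy
        buffers.emplace_back(size);
        return buffers.back().data();
    }
    void launch(int device, const std::string& kernel_obj) {
        (void)device;
        launched.push_back(kernel_obj);  // record instead of executing
    }
};

// Stand-in for the accelerator manager (device_mgr): codegen and parallel setup.
struct DeviceMgr {
    std::size_t work_group_size = 0;

    std::string code_gen(const std::string& source, const std::string& options) {
        return "obj(" + source + "," + options + ")";  // placeholder "binary"
    }
    void set_parallel(int device, std::size_t wgs) {
        (void)device;
        work_group_size = wgs;
    }
};

struct Runtime {
    struct Buffer { std::string name; std::size_t size; void* ptr; };

    ArchMgr arch_mgr;                          // platform-layer IR
    DeviceMgr device_mgr;                      // platform-layer IR
    std::vector<Buffer> buffer_list;           // platform-independent state
    std::vector<std::string> kernel_obj_list;  // platform-independent state

    int get_device() const { return 0; }

    bool clCreateBuffer(const std::string& name, std::size_t size) {
        void* ptr = arch_mgr.malloc(get_device(), name, size);
        buffer_list.push_back({name, size, ptr});
        return true;
    }
    bool clBuildProgram(const std::string& kernel_source, const std::string& options) {
        kernel_obj_list.push_back(device_mgr.code_gen(kernel_source, options));
        return true;
    }
    bool clEnqueueNDRangeKernel(std::size_t work_group_size) {
        if (kernel_obj_list.empty()) return false;  // nothing built yet
        device_mgr.set_parallel(get_device(), work_group_size);
        arch_mgr.launch(get_device(), kernel_obj_list.back());
        kernel_obj_list.pop_back();
        return true;
    }
};
```

Only `ArchMgr` and `DeviceMgr` would differ between platforms in this sketch; `Runtime` and its two lists correspond to the platform-independent part.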
Specifically, for the Tilera many-core platform and the ARM+x86 heterogeneous platform, the platform-specific implementations of the description part of the platform-layer IR are shown in the following table:
Platform-layer IR: description | Tilera many-core platform | ARM+x86 heterogeneous platform
Framework manager: interconnect | None | PCIe
Framework manager: accelerator name | tilera | x86
Accelerator manager: parallel structure | One level: 36 cores | One level: 8 cores x 2 threads
Accelerator manager: storage hierarchy | 36G Mem / L1-L2 cache | 16G Mem / L1-L3 cache
In this embodiment, the platform-specific implementations of the method part of the platform-layer IR are shown in the following table:
In this embodiment, the OpenCL runtime system is divided into a functional layer, an optimization layer and a platform layer, and the optimization layer uses the platform-layer IR interface, so optimizations are implemented uniformly in the optimization layer. When porting across platforms, the optimization layer can be ported directly, so a single optimization can be applied on every platform, which reduces the platform development complexity of heterogeneous systems. In addition, when a new platform is introduced, only the platform-layer IR of the new platform needs to be configured according to the platform-layer implementation framework; the functional layer and the optimization layer of the OpenCL runtime system remain unchanged. Cross-platform porting therefore reuses as much as possible of the existing platforms' OpenCL runtime system, which further reduces the platform development complexity of heterogeneous systems.
Further, as a specific implementation of the method shown in Fig. 5, an embodiment of the present invention provides a design apparatus for an OpenCL runtime system framework. As shown in Fig. 6, the entity of the apparatus may be an OpenCL runtime system, and the apparatus includes a division unit 61 and a providing unit 62.
The division unit 61 is configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation.
The providing unit 62 is configured to provide the platform-layer IR to the functional layer and the optimization layer obtained by the division unit 61, and to provide the platform-layer implementation framework to the at least one platform implementation.
The platform-layer IR provided by the providing unit 62 includes an architecture manager and an accelerator manager.
The platform-layer IR provided by the providing unit 62 includes methods of the platform-layer IR and descriptions of the platform-layer IR.
The methods of the platform-layer IR include mandatory platform-layer IR or suggested platform-layer IR.
The providing unit 62 is specifically configured to provide the mandatory platform-layer IR to the functional layer.
The providing unit 62 is further specifically configured to provide the suggested platform-layer IR to the optimization layer.
Optionally, the apparatus may further include a configuration unit 63.
The configuration unit 63 is configured to configure a corresponding priority for the suggested platform-layer IR.
Optionally, the apparatus may further include a generation unit 64.
The generation unit 64 is configured to generate the platform-layer IR.
The generation unit 64 includes a generation module 6401, a judgment module 6402 and a release module 6403.
The generation module 6401 is configured to generate the descriptions of the platform-layer IR.
The judgment module 6402 is configured to judge whether the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources.
The generation module 6401 is further configured to generate the methods of the platform-layer IR when the judgment module 6402 judges that the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources.
The release module 6403 is configured to release, according to the priorities corresponding to the platform-layer IR, the resources occupied by the platform-layer IR with the lowest priority when the judgment module 6402 judges that the resources occupied by the methods of the platform-layer IR are greater than the available resources.
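The judge-then-release behaviour of modules 6401 to 6403 can be sketched as a simple loop. The sketch rests on several assumptions that the text does not state: resources are modelled as a single integer cost, a larger priority value means higher importance, only non-mandatory (suggested) IR methods are eligible for release, and the release step is repeated until the remaining methods fit. The type IRMethod and the function generate_methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class IRMethod:
    name: str
    cost: int        # resource units occupied (hypothetical measure)
    priority: int    # larger value = more important (assumed convention)
    mandatory: bool = False


def generate_methods(methods, available):
    """Return the methods that remain after lowest-priority releases."""
    kept = list(methods)
    # Judgment step: do the occupied resources exceed what is available?
    while sum(m.cost for m in kept) > available:
        candidates = [m for m in kept if not m.mandatory]
        if not candidates:
            raise RuntimeError("mandatory IR methods exceed available resources")
        # Release step: drop the suggested method with the lowest priority.
        victim = min(candidates, key=lambda m: m.priority)
        kept.remove(victim)
    return kept


if __name__ == "__main__":
    methods = [
        IRMethod("required_a", 4, 0, mandatory=True),
        IRMethod("required_b", 4, 0, mandatory=True),
        IRMethod("hint_high", 3, 2),
        IRMethod("hint_low", 3, 1),
    ]
    print([m.name for m in generate_methods(methods, available=12)])
```

With 12 available units, the four methods need 14, so the lowest-priority suggested method is released and the remaining three (11 units) are generated.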
It should be noted that, for other descriptions of the functional units in the design apparatus for the OpenCL runtime system framework provided in this embodiment of the present invention, reference may be made to the corresponding description of Fig. 5; details are not repeated here.
Still further, the entity of the design apparatus for the OpenCL runtime system framework may be an OpenCL runtime system. As shown in Fig. 7, the OpenCL runtime system may include a processor 71, an input device 72, an output device 73 and a memory 74, where the input device 72, the output device 73 and the memory 74 are each connected to the processor 71.
The processor 71 is configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation.
The processor 71 is further configured to provide the platform-layer IR to the functional layer and the optimization layer, and to provide the platform-layer implementation framework to the at least one platform implementation.
The platform-layer IR provided by the processor 71 includes an architecture manager and an accelerator manager.
The platform-layer IR provided by the processor 71 includes methods of the platform-layer IR and descriptions of the platform-layer IR.
The methods of the platform-layer IR include mandatory platform-layer IR or suggested platform-layer IR.
The processor 71 is specifically configured to provide the mandatory platform-layer IR to the functional layer.
The processor 71 is further specifically configured to provide the suggested platform-layer IR to the optimization layer.
The processor 71 is further configured to configure a corresponding priority for the suggested platform-layer IR.
The processor 71 is further configured to generate the platform-layer IR.
The processor 71 is further configured to generate the descriptions of the platform-layer IR.
The processor 71 is further configured to judge whether the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources.
The processor 71 is further configured to generate the methods of the platform-layer IR when the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources.
The processor 71 is further configured to release, according to the priorities corresponding to the platform-layer IR, the resources occupied by the platform-layer IR with the lowest priority when the resources occupied by the methods of the platform-layer IR are greater than the available resources.
It should be noted that, for other descriptions of the devices in the OpenCL runtime system provided in this embodiment of the present invention, reference may be made to the corresponding description of Fig. 5; details are not repeated here.
In the design method and apparatus for an OpenCL runtime system framework provided in the embodiments of the present invention, the OpenCL runtime system framework is first divided into a functional layer, an optimization layer and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework and at least one platform implementation; the platform-layer IR is then provided to the functional layer and the optimization layer, and the platform-layer implementation framework is provided to the at least one platform implementation. Compared with the current practice in which each company uses its own IR to generate executable code for its own products, the embodiments of the present invention, by dividing the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, allow the same optimization to be applied directly on different platforms and require only the platform layer to be developed when a new platform is introduced, which reduces the platform development complexity of heterogeneous systems.
The design apparatus for the OpenCL runtime system framework provided in the embodiments of the present invention can implement the method embodiments provided above; for the specific functions, refer to the description in the method embodiments, which is not repeated here. The design method and apparatus for an OpenCL runtime system framework provided in the embodiments of the present invention are applicable to cross-platform porting in heterogeneous systems, but are not limited thereto.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

  1. A design method for an open computing language (OpenCL) runtime system framework, characterized by comprising:
    dividing the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, wherein the functional layer is used to receive OpenCL runtime code and implement its basic functions, and the optimization layer is used to implement global optimization and uses a platform-layer intermediate representation (IR) interface;
    wherein the platform layer includes a platform-layer IR, a platform-layer implementation framework and at least one platform implementation;
    providing mandatory platform-layer IR to the functional layer;
    providing suggested platform-layer IR to the optimization layer;
    configuring a corresponding priority for the suggested platform-layer IR; and
    providing the platform-layer implementation framework to the at least one platform implementation.
  2. The design method for an OpenCL runtime system framework according to claim 1, characterized in that the platform-layer IR includes an architecture manager and an accelerator manager.
  3. The design method for an OpenCL runtime system framework according to claim 2, characterized in that the platform-layer IR includes methods of the platform-layer IR and descriptions of the platform-layer IR, and the methods of the platform-layer IR include the mandatory platform-layer IR or the suggested platform-layer IR.
  4. The design method for an OpenCL runtime system framework according to any one of claims 1 to 3, characterized in that, after the step of providing the platform-layer implementation framework to the at least one platform implementation, the method further comprises:
    generating the platform-layer IR.
  5. The design method for an OpenCL runtime system framework according to claim 4, characterized in that the step of generating the platform-layer IR comprises:
    generating the descriptions of the platform-layer IR;
    judging whether the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources;
    if the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources, generating the methods of the platform-layer IR; and
    if the resources occupied by the methods of the platform-layer IR are greater than the available resources, releasing, according to the priorities corresponding to the platform-layer IR, the resources occupied by the platform-layer IR with the lowest priority.
  6. A design apparatus for an open computing language (OpenCL) runtime system framework, characterized by comprising:
    a division unit, configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, wherein the functional layer is used to receive OpenCL runtime code and implement its basic functions, and the optimization layer is used to implement global optimization and uses a platform-layer intermediate representation (IR) interface;
    wherein the platform layer includes a platform-layer IR, a platform-layer implementation framework and at least one platform implementation; and
    a providing unit, configured to provide mandatory platform-layer IR to the functional layer obtained by the division unit; provide suggested platform-layer IR to the optimization layer; configure a corresponding priority for the suggested platform-layer IR; and provide the platform-layer implementation framework to the at least one platform implementation.
  7. The design apparatus for an OpenCL runtime system framework according to claim 6, characterized in that
    the platform-layer IR provided by the providing unit includes an architecture manager and an accelerator manager.
  8. The design apparatus for an OpenCL runtime system framework according to claim 7, characterized in that
    the platform-layer IR provided by the providing unit includes methods of the platform-layer IR and descriptions of the platform-layer IR, and the methods of the platform-layer IR include the mandatory platform-layer IR or the suggested platform-layer IR.
  9. The design apparatus for an OpenCL runtime system framework according to any one of claims 6 to 8, characterized in that the apparatus further comprises a generation unit;
    the generation unit is configured to generate the platform-layer IR.
  10. The design apparatus for an OpenCL runtime system framework according to claim 9, characterized in that the generation unit comprises a generation module, a judgment module and a release module;
    the generation module is configured to generate the descriptions of the platform-layer IR;
    the judgment module is configured to judge whether the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources;
    the generation module is further configured to generate the methods of the platform-layer IR when the judgment module judges that the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources; and
    the release module is configured to release, according to the priorities corresponding to the platform-layer IR, the resources occupied by the platform-layer IR with the lowest priority when the judgment module judges that the resources occupied by the methods of the platform-layer IR are greater than the available resources.
CN201410065503.7A 2014-02-25 2014-02-25 The design method and device of OpenCL runtime system frameworks Active CN104866295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410065503.7A CN104866295B (en) 2014-02-25 2014-02-25 The design method and device of OpenCL runtime system frameworks


Publications (2)

Publication Number Publication Date
CN104866295A CN104866295A (en) 2015-08-26
CN104866295B true CN104866295B (en) 2018-03-06

Family

ID=53912148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410065503.7A Active CN104866295B (en) 2014-02-25 2014-02-25 The design method and device of OpenCL runtime system frameworks

Country Status (1)

Country Link
CN (1) CN104866295B (en)




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant