CN104866295B - Design method and device for an OpenCL runtime system framework
Publication number: CN104866295B
Application number: CN201410065503.7A
Authority: CN (China)
Legal status: Active (the listed legal status is an assumption and is not a legal conclusion)
Abstract
An embodiment of the invention discloses a design method and device for an OpenCL runtime system framework. It relates to the field of information technology and can reduce the platform development complexity of heterogeneous systems. The method includes: first dividing the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation; then providing the platform-layer IR to the functional layer and the optimization layer, and providing the platform-layer implementation framework to the at least one platform implementation. The embodiment is suitable for cross-platform porting of heterogeneous systems.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a design method and device for an OpenCL runtime system framework.
Background
As heterogeneous hardware systems become mainstream, programming the various heterogeneous platforms with a common programming method is becoming increasingly important. Heterogeneous hardware systems are mainly CPU (Central Processing Unit) + GPU (Graphics Processing Unit) systems. Specifically, in a heterogeneous system, the OpenCL (Open Computing Language) parallel programming framework is used to write programs that can execute on the corresponding platforms.
At present, an OpenCL runtime system supports multiple platforms by first having the OpenCL kernel compiler produce an IR (Intermediate Representation) and then, at run time, generating executable code for different products from that IR. For example, the CAL (Compute Abstraction Layer) IR produced by AMD's OpenCL kernel compiler can generate executable code for different AMD products; the LLVM (Low Level Virtual Machine) IR produced by Intel's OpenCL kernel compiler can generate executable code for different Intel products; and the PTX (Parallel Thread Execution) IR produced by NVIDIA's OpenCL kernel compiler can generate executable code for different NVIDIA products.
However, because the OpenCL system frameworks of different vendors differ, the IR produced by one vendor's OpenCL kernel compiler can support only that vendor's heterogeneous platforms. As a result, implementing the same optimization on a different platform, or introducing a new platform, requires redevelopment in each case, which makes platform development for heterogeneous systems highly complex.
Summary of the invention
Embodiments of the present invention provide a design method and device for an OpenCL runtime system framework that can reduce the platform development complexity of heterogeneous systems.
The technical solutions adopted by the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a design method for an OpenCL runtime system framework, including:
dividing the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation; and
providing the platform-layer IR to the functional layer and the optimization layer, and providing the platform-layer implementation framework to the at least one platform implementation.
With reference to the first aspect, in a first possible implementation of the first aspect, the platform-layer IR includes an architecture manager and an accelerator manager.
With reference to the first aspect or its first possible implementation, in a second possible implementation of the first aspect, the platform-layer IR includes methods of the platform-layer IR and descriptions of the platform-layer IR, and the methods of the platform-layer IR include mandatory platform-layer IR or advisory platform-layer IR.
With reference to the first aspect or its first or second possible implementation, in a third possible implementation of the first aspect, the step of providing the platform-layer IR to the functional layer and the optimization layer includes:
providing the mandatory platform-layer IR to the functional layer; and
providing the advisory platform-layer IR to the optimization layer.
After the step of providing the advisory platform-layer IR to the optimization layer, the method further includes:
configuring a corresponding priority for the advisory platform-layer IR.
With reference to the first aspect or its first, second, or third possible implementation, in a fourth possible implementation of the first aspect, after the step of providing the platform-layer implementation framework to the at least one platform implementation, the method further includes:
generating the platform-layer IR.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the step of generating the platform-layer IR includes:
generating the description of the platform-layer IR;
judging whether the resources occupied by the method of the platform-layer IR are less than or equal to the available resources;
if the resources occupied by the method of the platform-layer IR are less than or equal to the available resources, generating the method of the platform-layer IR; and
if the resources occupied by the method of the platform-layer IR are greater than the available resources, releasing, according to the priorities of the platform-layer IR, the resources occupied by the lowest-priority platform-layer IR.
In a second aspect, an embodiment of the present invention provides a design device for an OpenCL runtime system framework, including:
a division unit, configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation; and
a providing unit, configured to provide the platform-layer IR to the functional layer and the optimization layer divided by the division unit, and to provide the platform-layer implementation framework to the at least one platform implementation.
With reference to the second aspect, in a first possible implementation of the second aspect, the platform-layer IR provided by the providing unit includes an architecture manager and an accelerator manager.
With reference to the second aspect or its first possible implementation, in a second possible implementation of the second aspect, the platform-layer IR provided by the providing unit includes methods of the platform-layer IR and descriptions of the platform-layer IR, and the methods of the platform-layer IR include mandatory platform-layer IR or advisory platform-layer IR.
With reference to the second aspect or its first or second possible implementation, in a third possible implementation of the second aspect:
the providing unit is specifically configured to provide the mandatory platform-layer IR to the functional layer;
the providing unit is further specifically configured to provide the advisory platform-layer IR to the optimization layer;
the device further includes a configuration unit; and
the configuration unit is configured to configure a corresponding priority for the advisory platform-layer IR.
With reference to the second aspect or its first, second, or third possible implementation, in a fourth possible implementation of the second aspect, the device further includes a generation unit;
the generation unit is configured to generate the platform-layer IR.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the generation unit includes a generation module, a judgment module, and a release module;
the generation module is configured to generate the description of the platform-layer IR;
the judgment module is configured to judge whether the resources occupied by the method of the platform-layer IR are less than or equal to the available resources;
the generation module is further configured to generate the method of the platform-layer IR when the judgment module judges that the resources occupied by the method of the platform-layer IR are less than or equal to the available resources; and
the release module is configured to release, according to the priorities of the platform-layer IR, the resources occupied by the lowest-priority platform-layer IR when the judgment module judges that the resources occupied by the method of the platform-layer IR are greater than the available resources.
In the design method and device for an OpenCL runtime system framework provided by the embodiments of the present invention, the OpenCL runtime system framework is first divided into a functional layer, an optimization layer, and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation; the platform-layer IR is then provided to the functional layer and the optimization layer, and the platform-layer implementation framework is provided to the at least one platform implementation. Compared with the current approach, in which each vendor's IR can generate executable code only for that vendor's products, dividing the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer allows the same optimization to be applied directly on different platforms, and only the platform layer needs to be developed when a new platform is introduced, which reduces the platform development complexity of heterogeneous systems.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a design method for an OpenCL runtime system framework provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of an OpenCL runtime system framework provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic structural diagram of a design device for an OpenCL runtime system framework provided by Embodiment 1 of the present invention;
Fig. 4 is a schematic structural diagram of an OpenCL runtime system provided by Embodiment 1 of the present invention;
Fig. 5 is a flowchart of a design method for an OpenCL runtime system framework provided by Embodiment 2 of the present invention;
Fig. 6 is a schematic structural diagram of a design device for an OpenCL runtime system framework provided by Embodiment 2 of the present invention;
Fig. 7 is a schematic structural diagram of an OpenCL runtime system provided by Embodiment 2 of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To make the advantages of the technical solutions clearer, the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment one
This embodiment of the present invention provides a design method for an OpenCL runtime system framework. As shown in Fig. 1, the method includes:
101. The OpenCL runtime system divides the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation.
In this embodiment, as shown in Fig. 2, the OpenCL runtime system framework is divided into three layers: the functional layer, the optimization layer, and the platform layer. The functional layer receives OpenCL runtime code and implements its basic functions; the optimization layer carries out whole-system optimization. Both the functional layer and the optimization layer contain a platform-independent part and a platform-dependent part. The platform-independent part is what all platforms in the heterogeneous system have in common and is expressed in platform-independent IR; the platform-dependent part is what differs between the platforms in the heterogeneous system and is expressed in platform-dependent IR.
102. The OpenCL runtime system provides the platform-layer IR to the functional layer and the optimization layer, and provides the platform-layer implementation framework to the at least one platform implementation.
The platform-layer IR that the OpenCL runtime system provides to the functional layer and the optimization layer includes an architecture manager and an accelerator manager, each of which includes corresponding methods and descriptions. In this embodiment, the platform-layer implementation framework can be used to quickly supply the concrete implementation of the platform-layer IR for a new platform of the OpenCL runtime system.
In this embodiment, the OpenCL runtime system is divided into a functional layer, an optimization layer, and a platform layer, and the optimization layer uses the platform-layer IR interfaces. Optimizations can therefore be implemented uniformly in the optimization layer and, during cross-platform porting, carried over directly with the optimization layer, so that one optimization can be applied on every platform and the platform development complexity of heterogeneous systems is reduced. In addition, when a new platform is introduced, only the platform-layer IR needs to be configured according to the platform-layer implementation framework; the functional layer and the optimization layer of the OpenCL runtime system for the new platform do not change. Cross-platform porting can thus reuse as much of the existing platforms' OpenCL runtime technology as possible, which further reduces the platform development complexity of heterogeneous systems.
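The idea that a new platform only supplies a platform-layer implementation, while upper-layer code stays unchanged, can be sketched as follows. This is a minimal illustration, not the patent's implementation: `PlatformImpl`, `PlatformRegistry`, and `describe` are hypothetical names, and the registry is only an assumed stand-in for the platform-layer implementation framework.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface that every platform implementation must satisfy.
struct PlatformImpl {
    virtual ~PlatformImpl() = default;
    virtual std::string name() const = 0;
};

// Two illustrative platform implementations (names taken from the examples in the text).
struct TileraPlatform : PlatformImpl {
    std::string name() const override { return "tilera"; }
};

struct ArmX86Platform : PlatformImpl {
    std::string name() const override { return "arm+x86"; }
};

// Assumed stand-in for the platform-layer implementation framework:
// a registry that maps a platform key to a factory.
class PlatformRegistry {
public:
    using Factory = std::function<std::unique_ptr<PlatformImpl>()>;

    void add(const std::string& key, Factory factory) {
        factories_[key] = std::move(factory);
    }

    std::unique_ptr<PlatformImpl> create(const std::string& key) const {
        auto it = factories_.find(key);
        if (it == factories_.end()) {
            throw std::out_of_range("unknown platform: " + key);
        }
        return it->second();
    }

    std::size_t size() const { return factories_.size(); }

private:
    std::map<std::string, Factory> factories_;
};

// Upper-layer code depends only on PlatformImpl, never on a concrete platform.
inline std::string describe(const PlatformImpl& platform) {
    return "running on " + platform.name();
}
```

Under these assumptions, adding a platform means registering one more factory; `describe` and any other upper-layer function compile and run unchanged.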
In this embodiment, the OpenCL runtime system framework is applicable to any heterogeneous computing system. A heterogeneous computing system may be a system formed by interconnecting devices with different instruction sets, different microarchitectures, or different computing capabilities. In this embodiment, the different platform implementations may be, for example, the Tilera many-core platform or the ARM+x86 heterogeneous platform.
Further, as a specific implementation of the method shown in Fig. 1, an embodiment of the present invention provides a design device for an OpenCL runtime system framework. As shown in Fig. 3, the device may be embodied as an OpenCL runtime system and includes a division unit 31 and a providing unit 32.
The division unit 31 is configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation.
The providing unit 32 is configured to provide the platform-layer IR to the functional layer and the optimization layer divided by the division unit 31, and to provide the platform-layer implementation framework to the at least one platform implementation.
It should be noted that, for other descriptions of the functional units in the design device provided in this embodiment, reference may be made to the corresponding description of Fig. 1; details are not repeated here.
Still further, the design device may be embodied as an OpenCL runtime system. As shown in Fig. 4, the OpenCL runtime system may include a processor 41, an input device 42, an output device 43, and a memory 44; the input device 42, the output device 43, and the memory 44 are each connected to the processor 41.
The processor 41 is configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation.
The processor 41 is further configured to provide the platform-layer IR to the functional layer and the optimization layer, and to provide the platform-layer implementation framework to the at least one platform implementation.
It should be noted that, for other descriptions of the devices in the OpenCL runtime system provided in this embodiment, reference may be made to the corresponding description of Fig. 1; details are not repeated here.
In the design method and device for an OpenCL runtime system framework provided by this embodiment, the OpenCL runtime system framework is first divided into a functional layer, an optimization layer, and a platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation; the platform-layer IR is then provided to the functional layer and the optimization layer, and the platform-layer implementation framework is provided to the at least one platform implementation. Compared with the current approach, in which each vendor's IR can generate executable code only for that vendor's products, this division allows the same optimization to be applied directly on different platforms, and only the platform layer needs to be developed when a new platform is introduced, which reduces the platform development complexity of heterogeneous systems.
Embodiment two
This embodiment of the present invention provides a design method for an OpenCL runtime system framework. As shown in Fig. 5, the method includes:
501. The OpenCL runtime system divides the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation.
In this embodiment, as shown in Fig. 2, the OpenCL runtime system framework is divided into three layers: the functional layer, the optimization layer, and the platform layer. The functional layer receives OpenCL runtime code and implements its basic functions; the optimization layer carries out whole-system optimization. Both the functional layer and the optimization layer contain a platform-independent part and a platform-dependent part. The platform-independent part is what all platforms in the heterogeneous system have in common and is expressed in platform-independent IR; the platform-dependent part is what differs between the platforms in the heterogeneous system and is expressed in platform-dependent IR.
The platform-layer IR includes an architecture manager and an accelerator manager.
In this embodiment, running an OpenCL program involves three steps: first, data is transferred across devices; next, kernel execution is launched; finally, the kernel executes in parallel. For example, in a CPU+GPU heterogeneous system, the CPU first transfers the input data to the GPU (and the computed results are transferred back to the CPU), the CPU then dispatches the kernel to the GPU for execution, and finally the many parallel compute units on the GPU execute the kernel. Cross-device data transfer and kernel launch relate to how the devices on the platform are organized, and this part can be abstracted as the architecture manager; parallel kernel execution relates to the internal structure of the accelerator device on the platform, and this part can be abstracted as the accelerator manager.
In this embodiment, the architecture manager describes the organizational relationships among the devices on the heterogeneous platform, including memory organization and control relationships; the accelerator manager describes the hardware characteristics of each accelerator device on the heterogeneous platform, including accelerator code generation, parallel structure, and storage hierarchy. An OpenCL runtime system framework includes one architecture manager and at least one accelerator manager. By abstracting the platform into an architecture manager and accelerator managers, a unified platform-layer IR can be provided for the OpenCL runtime system, so that a unified OpenCL runtime system framework can be built.
The platform-layer IR includes methods of the platform-layer IR and descriptions of the platform-layer IR, and the methods of the platform-layer IR include mandatory platform-layer IR or advisory platform-layer IR.
Specifically, the methods of the architecture manager may include launch, malloc, mem_read/mem_write, and so on; its descriptions may include the number of accelerators, the connection architecture between the CPU and the accelerators, accelerator name/type, accelerator functions/features, accelerator state, and so on. In this embodiment, the platform-layer IR can provide mandatory platform-layer IR for the functional layer and advisory platform-layer IR for the optimization layer.
Specifically, the methods of the accelerator manager may include code_gen, local_malloc, local_read/local_write, barrier, and so on; its descriptions may include the storage hierarchy, the parallel structure, the SIMD (Single Instruction Multiple Data) width, and so on.
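The two managers can be pictured as abstract interfaces that each platform implements. The sketch below is an assumption-laden illustration: the method names follow the lists above, but every signature, the `HostArchManager` class, and the use of host memory as a stand-in for device memory are hypothetical and not specified by the source.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Architecture manager: cross-device organization (allocation and transfer).
// Signatures are assumptions; only the method names come from the text.
class ArchManager {
public:
    virtual ~ArchManager() = default;
    virtual bool malloc(const std::string& name, std::size_t size) = 0;
    virtual bool mem_write(const std::string& name, const void* src, std::size_t n) = 0;
    virtual bool mem_read(const std::string& name, void* dst, std::size_t n) const = 0;
};

// Accelerator manager: internal structure of one accelerator (interface only).
class AcceleratorManager {
public:
    virtual ~AcceleratorManager() = default;
    virtual std::string code_gen(const std::string& kernel_source) = 0;
    virtual bool local_malloc(const std::string& name, std::size_t size) = 0;
    virtual void barrier() = 0;
};

// Hypothetical implementation that uses host memory in place of a real device.
class HostArchManager : public ArchManager {
public:
    bool malloc(const std::string& name, std::size_t size) override {
        buffers_[name].assign(size, 0);  // zero-filled buffer keyed by name
        return true;
    }

    bool mem_write(const std::string& name, const void* src, std::size_t n) override {
        auto it = buffers_.find(name);
        if (it == buffers_.end() || n > it->second.size()) return false;
        std::memcpy(it->second.data(), src, n);
        return true;
    }

    bool mem_read(const std::string& name, void* dst, std::size_t n) const override {
        auto it = buffers_.find(name);
        if (it == buffers_.end() || n > it->second.size()) return false;
        std::memcpy(dst, it->second.data(), n);
        return true;
    }

private:
    std::map<std::string, std::vector<unsigned char>> buffers_;
};
```

A real platform would replace `HostArchManager` with code that talks to its own devices; callers written against `ArchManager` would not need to change.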
502. The OpenCL runtime system provides the mandatory platform-layer IR to the functional layer.
In this embodiment, the method attribute of the platform-layer IR provided for the functional layer is mandatory, so that the functional layer can use the platform-layer IR directly to implement a concrete function. For example, when the mandatory platform-layer IR provided to the functional layer is the malloc/read/write of global/local memory on the accelerator device, the malloc/read/write on the accelerator device can be used directly to implement the corresponding function.
In this embodiment, providing the mandatory platform-layer IR to the functional layer guarantees that the functions of the functional layer are implemented, so that the OpenCL runtime system fully translates the program as written by the user.
503. The OpenCL runtime system provides the advisory platform-layer IR to the optimization layer.
In this embodiment, the method attribute of the platform-layer IR provided for the optimization layer is advisory; such methods need to be wrapped and integrated during implementation so that the optimization layer can be realized according to the actual hardware resources. Specifically, when the hardware resources are sufficient for an advisory platform-layer IR, it is implemented directly; when they are insufficient, lower-priority platform-layer IR is released according to the configured priorities until the resources required by the higher-priority advisory platform-layer IR are available, and the platform-layer IR entries are then implemented in order of priority from high to low.
In this embodiment, providing the advisory platform-layer IR to the optimization layer allows the platform-layer IR entries to be implemented in order of their respective priorities, so that each is realized according to the actual hardware resources, which avoids system errors caused by insufficient hardware resources.
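Priority-ordered realization of advisory IR under a resource budget can be sketched as below. This is a simplified reading of the paragraph above, not the patent's algorithm: the `AdvisoryIr` type, the single scalar resource budget, the entry names in the usage note, and the convention that a larger number means higher priority are all assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One advisory platform-layer IR entry (fields are assumptions).
struct AdvisoryIr {
    std::string name;
    int priority;      // larger value = more important (assumed convention)
    std::size_t cost;  // hardware resource units the entry would occupy
};

// Realize entries from highest to lowest priority; an entry that no longer
// fits in the remaining budget is skipped rather than causing a failure.
std::vector<std::string> realize_by_priority(std::vector<AdvisoryIr> entries,
                                             std::size_t available) {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const AdvisoryIr& a, const AdvisoryIr& b) {
                         return a.priority > b.priority;
                     });
    std::vector<std::string> realized;
    for (const AdvisoryIr& entry : entries) {
        if (entry.cost <= available) {
            available -= entry.cost;
            realized.push_back(entry.name);
        }
    }
    return realized;
}
```

With a budget of 100 units and hypothetical entries prefetch (priority 1, cost 40), tiling (priority 3, cost 50), and vectorize (priority 2, cost 30), this sketch realizes tiling and vectorize and leaves prefetch out, because the two higher-priority entries already consume 80 units.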
504. The OpenCL runtime system configures a corresponding priority for the advisory platform-layer IR.
In this embodiment, configuring a priority for each advisory platform-layer IR allows the platform-layer IR entries to be implemented in order of their respective priorities.
505. The OpenCL runtime system generates the platform-layer IR.
Specifically, in step 505 the OpenCL runtime system first generates the description of the platform-layer IR and then judges whether the resources occupied by the method of the platform-layer IR are less than or equal to the available resources. If they are, the method of the platform-layer IR is generated; if they are greater than the available resources, the resources occupied by the lowest-priority platform-layer IR are released according to the priorities of the platform-layer IR.
In this embodiment, the OpenCL runtime system generates the description part of the platform-layer IR directly; for the method part, the OpenCL runtime system first wraps and integrates it and then generates the platform-layer IR.
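Step 505 (generate the description, judge the resources, then either generate the method or release the lowest-priority IR) might look like the following sketch. The source does not say whether a request may displace entries of equal or higher priority, nor whether generation is retried after a release; this sketch assumes that only strictly lower-priority entries are released and that generation proceeds once enough has been freed. `IrEntry` and `IrGenerator` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct IrEntry {
    std::string name;
    int priority;      // larger value = more important (assumed convention)
    std::size_t cost;  // resources occupied by the method part
};

class IrGenerator {
public:
    explicit IrGenerator(std::size_t capacity) : available_(capacity) {}

    // Returns true if the method part of `req` was generated.
    bool generate(const IrEntry& req) {
        descriptions_.push_back(req.name);  // the description is always generated

        // Judge: can the method fit, counting what lower-priority entries hold?
        std::size_t reclaimable = 0;
        for (const IrEntry& m : methods_) {
            if (m.priority < req.priority) reclaimable += m.cost;
        }
        if (req.cost > available_ + reclaimable) return false;

        // Release: free the lowest-priority entry until the method fits.
        while (req.cost > available_) {
            auto victim = std::min_element(
                methods_.begin(), methods_.end(),
                [](const IrEntry& a, const IrEntry& b) { return a.priority < b.priority; });
            available_ += victim->cost;
            methods_.erase(victim);
        }

        available_ -= req.cost;
        methods_.push_back(req);
        return true;
    }

    bool has_method(const std::string& name) const {
        return std::any_of(methods_.begin(), methods_.end(),
                           [&](const IrEntry& m) { return m.name == name; });
    }

    std::size_t available() const { return available_; }

private:
    std::size_t available_;
    std::vector<IrEntry> methods_;
    std::vector<std::string> descriptions_;
};
```

The up-front check is a design choice of this sketch: it avoids releasing some entries and then discovering that the request still cannot fit.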
In this embodiment, the OpenCL runtime system framework is applicable to any heterogeneous computing system. A heterogeneous computing system may be a system formed by interconnecting devices with different instruction sets, different microarchitectures, or different computing capabilities. In this embodiment, the different platform implementations may be, for example, the Tilera many-core platform or the ARM+x86 heterogeneous platform.
For example, when the heterogeneous system includes the Tilera many-core platform and the ARM+x86 heterogeneous platform, the IR interfaces received by the functional layer and the optimization layer include:
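clCreateBuffer(name, size)
{
    // platform-layer IR: allocate through the architecture manager
    arch_mgr->malloc(get_device(), name, size);
    // platform-independent IR: record the buffer
    buffer_list.push(name, size, ptr);
    return TRUE;
}
clBuildProgram
{
    kernel_source.read();
    options.read();
    // platform-layer IR: generate code through the accelerator manager
    device_mgr->code_gen();
    kernel_obj_list.push(kernel);
    return TRUE;
}
clEnqueueNDRangeKernel(kernel, work_group_size)
{
    // platform-layer IR: configure parallelism, then launch the kernel
    device_mgr->set_parallel(get_device(), work_group_size);
    arch_mgr->launch(get_device(), kernel_obj_list.pop());
    return TRUE;
}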
Here, the platform-layer IR includes arch_mgr->malloc, device_mgr->code_gen, device_mgr->set_parallel, and arch_mgr->launch; the platform-independent IR includes buffer_list.push, kernel_source.read, options.read, kernel_obj_list.push, and kernel_obj_list.pop. In this case, the methods of the architecture manager in the platform-layer IR include malloc and launch, and its descriptions include the connection architecture and the accelerator name; the methods of the accelerator manager in the platform-layer IR include code_gen and set_parallel, and its descriptions include the parallel structure and the storage hierarchy.
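To make the split between platform-layer IR calls and platform-independent bookkeeping in the listing above concrete, here is a runnable mock. It is only a sketch: the managers record their calls in a trace instead of touching hardware, the function names are deliberately not the real OpenCL entry points, the device argument is omitted, and popping the kernel list from the front is an assumption (the listing does not say which end `pop` takes).

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Records platform-layer IR calls so the routing can be inspected.
struct Trace {
    std::vector<std::string> calls;
};

// Mock architecture manager (hypothetical signatures).
struct ArchMgr {
    Trace* trace;
    bool malloc(const std::string& name, std::size_t /*size*/) {
        trace->calls.push_back("arch_mgr->malloc:" + name);
        return true;
    }
    bool launch(const std::string& kernel) {
        trace->calls.push_back("arch_mgr->launch:" + kernel);
        return true;
    }
};

// Mock accelerator manager (hypothetical signatures).
struct DeviceMgr {
    Trace* trace;
    std::string code_gen(const std::string& source) {
        trace->calls.push_back("device_mgr->code_gen");
        return "obj(" + source + ")";
    }
    void set_parallel(std::size_t work_group_size) {
        trace->calls.push_back("device_mgr->set_parallel:" + std::to_string(work_group_size));
    }
};

// Functional-layer entry points written only against the two managers.
class Runtime {
public:
    Runtime(ArchMgr* arch, DeviceMgr* device) : arch_mgr_(arch), device_mgr_(device) {}

    bool create_buffer(const std::string& name, std::size_t size) {
        if (!arch_mgr_->malloc(name, size)) return false;  // platform-layer IR
        buffer_list_.push_back(name);                       // platform-independent
        return true;
    }

    bool build_program(const std::string& kernel_source) {
        kernel_obj_list_.push_back(device_mgr_->code_gen(kernel_source));
        return true;
    }

    bool enqueue_nd_range_kernel(std::size_t work_group_size) {
        if (kernel_obj_list_.empty()) return false;
        device_mgr_->set_parallel(work_group_size);
        std::string kernel = kernel_obj_list_.front();
        kernel_obj_list_.pop_front();
        return arch_mgr_->launch(kernel);
    }

private:
    ArchMgr* arch_mgr_;
    DeviceMgr* device_mgr_;
    std::vector<std::string> buffer_list_;
    std::deque<std::string> kernel_obj_list_;
};
```

Swapping in managers for another platform would change what the four traced calls do, while `Runtime` itself stays the same, which is the portability property the text describes.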
Specifically, for the Tilera many-core platform and the ARM+x86 heterogeneous platform, the platform-specific implementations of the description part of the platform-layer IR are shown in the following table:
Platform-layer IR: description | Tilera many-core platform | ARM+x86 heterogeneous platform |
Architecture manager: connection architecture | None | PCIe |
Architecture manager: accelerator name | tilera | x86 |
Accelerator manager: parallel structure | One level: 36 cores | One level: 8 cores × 2 threads |
Accelerator manager: storage hierarchy | 36G Mem / L1-L2 cache | 16G Mem / L1-L3 cache |
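The description part is plain data, so one way to picture the table is as a record per platform. The struct and field names below are hypothetical; the values are copied from the table, except `threads_per_core = 1` for the Tilera platform, which is an assumption (the table only says 36 cores).

```cpp
#include <string>

// Hypothetical record holding the description part of the platform-layer IR.
struct PlatformDescription {
    std::string connection;         // architecture manager: connection architecture
    std::string accelerator_name;   // architecture manager: accelerator name
    int cores;                      // accelerator manager: parallel structure
    int threads_per_core;           // accelerator manager: parallel structure
    std::string storage_hierarchy;  // accelerator manager: storage hierarchy
};

inline PlatformDescription tilera_description() {
    // threads_per_core = 1 is an assumption; the table lists only 36 cores.
    return {"none", "tilera", 36, 1, "36G Mem / L1-L2 cache"};
}

inline PlatformDescription arm_x86_description() {
    return {"PCIe", "x86", 8, 2, "16G Mem / L1-L3 cache"};
}

// Example of upper-layer code reading a description without knowing the platform.
inline int hardware_threads(const PlatformDescription& d) {
    return d.cores * d.threads_per_core;
}
```

An optimization layer could, for instance, size its work groups from `hardware_threads` rather than from platform-specific constants.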
In this embodiment, the platform-specific implementations of the method part of the platform-layer IR are shown in the following table:
In this embodiment, the OpenCL runtime system is divided into a functional layer, an optimization layer, and a platform layer, and the optimization layer uses the platform-layer IR interfaces. Optimizations can therefore be implemented uniformly in the optimization layer and, during cross-platform porting, carried over directly with the optimization layer, so that one optimization can be applied on every platform and the platform development complexity of heterogeneous systems is reduced. In addition, when a new platform is introduced, only the platform-layer IR needs to be configured according to the platform-layer implementation framework; the functional layer and the optimization layer of the OpenCL runtime system for the new platform do not change. Cross-platform porting can thus reuse as much of the existing platforms' OpenCL runtime technology as possible, which further reduces the platform development complexity of heterogeneous systems.
Further, as a specific implementation of the method shown in Fig. 5, an embodiment of the present invention provides a design device for an OpenCL runtime system framework. As shown in Fig. 6, the device may be embodied as an OpenCL runtime system and includes a division unit 61 and a providing unit 62.
The division unit 61 is configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer, and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation framework, and at least one platform implementation.
The providing unit 62 is configured to provide the platform-layer IR to the functional layer and the optimization layer divided by the division unit 61, and to provide the platform-layer implementation framework to the at least one platform implementation.
The platform-layer IR provided by the providing unit 62 includes an architecture manager and an accelerator manager.
The platform-layer IR provided by the providing unit 62 includes methods of the platform-layer IR and descriptions of the platform-layer IR.
The methods of the platform-layer IR include mandatory platform-layer IR or advisory platform-layer IR.
The providing unit 62 is specifically configured to provide the mandatory platform-layer IR to the functional layer.
The providing unit 62 is further specifically configured to provide the advisory platform-layer IR to the optimization layer.
Optionally, the device may further include a configuration unit 63.
The configuration unit 63 is configured to configure a corresponding priority for the advisory platform-layer IR.
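The split between mandatory and advisory platform-layer IR, and the priority configured for advisory IR, can be sketched as a small data model. This is only an illustration under stated assumptions: the entry names, the `IRKind` enum, and the convention that a larger number means a higher priority are all hypothetical and are not specified in the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class IRKind(Enum):
    MANDATORY = "mandatory"  # provided to the functional layer
    ADVISORY = "advisory"    # provided to the optimization layer


@dataclass
class PlatformLayerIREntry:
    name: str          # hypothetical entry name
    kind: IRKind
    priority: int = 0  # only meaningful for advisory entries (assumed: larger = higher)


def provide_to_functional_layer(entries: List[PlatformLayerIREntry]) -> List[PlatformLayerIREntry]:
    """Return the mandatory entries, in their original order."""
    return [e for e in entries if e.kind is IRKind.MANDATORY]


def provide_to_optimization_layer(entries: List[PlatformLayerIREntry]) -> List[PlatformLayerIREntry]:
    """Return the advisory entries, highest configured priority first."""
    advisory = [e for e in entries if e.kind is IRKind.ADVISORY]
    return sorted(advisory, key=lambda e: e.priority, reverse=True)


if __name__ == "__main__":
    entries = [
        PlatformLayerIREntry("create_buffer", IRKind.MANDATORY),
        PlatformLayerIREntry("vectorize_hint", IRKind.ADVISORY, priority=2),
        PlatformLayerIREntry("enqueue_kernel", IRKind.MANDATORY),
        PlatformLayerIREntry("prefetch_hint", IRKind.ADVISORY, priority=5),
    ]
    print([e.name for e in provide_to_functional_layer(entries)])
    print([e.name for e in provide_to_optimization_layer(entries)])
```

Keeping the priority on the advisory entries only reflects the text, which configures priorities for advisory platform-layer IR and says nothing about priorities for mandatory IR.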
Optionally, the device may further include a generation unit 64.
The generation unit 64 is configured to generate the platform-layer IR.
The generation unit 64 includes a generation module 6401, a judgment module 6402 and a release module 6403.
The generation module 6401 is configured to generate the description of the platform-layer IR.
The judgment module 6402 is configured to judge whether the resources occupied by the methods of the platform-layer
IR are less than or equal to the available resources.
The generation module 6401 is further configured to generate the methods of the platform-layer IR when the judgment
module 6402 judges that the resources occupied by the methods of the platform-layer IR are less than or equal to
the available resources.
The release module 6403 is configured to, when the judgment module 6402 judges that the resources occupied by the
methods of the platform-layer IR are greater than the available resources, release the resources occupied by the
lowest-priority platform-layer IR according to the priorities corresponding to the platform-layer IR.
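The generation flow handled by the three modules can be sketched as follows. The sketch rests on assumptions the source does not state: resources are modelled as a single integer budget, a larger number means a higher priority, and generation is retried after each release until the method fits. All class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IRMethod:
    name: str
    cost: int      # resource units the method occupies (assumed scalar model)
    priority: int  # assumed convention: larger value = higher priority


class IRGenerator:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.descriptions: List[str] = []   # descriptions generated so far
        self.resident: List[IRMethod] = []  # methods currently holding resources

    def available(self) -> int:
        return self.capacity - sum(m.cost for m in self.resident)

    def generate(self, method: IRMethod) -> bool:
        # Generation module: the description is generated first.
        self.descriptions.append(method.name)
        # A method larger than the whole budget can never fit (assumed guard).
        if method.cost > self.capacity:
            return False
        # Judgment module: compare the occupied resources with the available ones.
        while method.cost > self.available():
            # Release module: free the lowest-priority resident method.
            victim = min(self.resident, key=lambda m: m.priority)
            self.resident.remove(victim)
        # Generation module: the method now fits, so it is generated.
        self.resident.append(method)
        return True


if __name__ == "__main__":
    gen = IRGenerator(capacity=10)
    gen.generate(IRMethod("a", cost=4, priority=1))
    gen.generate(IRMethod("b", cost=4, priority=5))
    gen.generate(IRMethod("c", cost=5, priority=3))
    print([m.name for m in gen.resident], gen.available())
```

Whether the incoming method's own priority should be compared with the victim's before releasing is left open by the text, so the sketch does not perform that comparison.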
It should be noted that, for other descriptions of the functional units in the design device for the OpenCL runtime
system framework provided in the embodiment of the present invention, reference may be made to the corresponding
description of Fig. 5; details are not repeated here.
Still further, the entity of the design device for the OpenCL runtime system framework may be an OpenCL runtime
system. As shown in Fig. 7, the OpenCL runtime system may include a processor 71, an input device 72, an output
device 73 and a memory 74; the input device 72, the output device 73 and the memory 74 are each connected to the
processor 71.
The processor 71 is configured to divide the OpenCL runtime system framework into a functional layer, an
optimization layer and a platform layer.
The platform layer includes a platform-layer intermediate representation (IR), a platform-layer implementation
framework and at least one platform implementation.
The processor 71 is further configured to provide the platform-layer IR to the functional layer and the
optimization layer, and to provide the platform-layer implementation framework to the at least one platform
implementation.
The platform-layer IR provided by the processor 71 includes a framework manager and an accelerator manager.
The platform-layer IR provided by the processor 71 includes methods of the platform-layer IR and descriptions of
the platform-layer IR.
The methods of the platform-layer IR include mandatory platform-layer IR or advisory platform-layer IR.
The processor 71 is specifically configured to provide the mandatory platform-layer IR to the functional layer.
The processor 71 is further specifically configured to provide the advisory platform-layer IR to the optimization
layer.
The processor 71 is further configured to configure a corresponding priority for the advisory platform-layer IR.
The processor 71 is further configured to generate the platform-layer IR.
The processor 71 is further configured to generate the description of the platform-layer IR.
The processor 71 is further configured to judge whether the resources occupied by the methods of the platform-layer
IR are less than or equal to the available resources.
The processor 71 is further configured to generate the methods of the platform-layer IR when the resources occupied
by the methods of the platform-layer IR are less than or equal to the available resources.
The processor 71 is further configured to, when the resources occupied by the methods of the platform-layer IR are
greater than the available resources, release the resources occupied by the lowest-priority platform-layer IR
according to the priorities corresponding to the platform-layer IR.
It should be noted that, for other descriptions of the devices in the OpenCL runtime system provided in the
embodiment of the present invention, reference may be made to the corresponding description of Fig. 5; details are
not repeated here.
The design method and device for an OpenCL runtime system framework provided in the embodiments of the present
invention first divide the OpenCL runtime system framework into a functional layer, an optimization layer and a
platform layer, where the platform layer includes a platform-layer intermediate representation (IR), a
platform-layer implementation framework and at least one platform implementation. The platform-layer IR is then
provided to the functional layer and the optimization layer, and the platform-layer implementation framework is
provided to the at least one platform implementation. Compared with the current practice, in which each company
generates executable code for its own products through its own IR, dividing the OpenCL runtime system framework
into a functional layer, an optimization layer and a platform layer allows the same optimization to be applied
directly on different platforms, and only the platform layer needs to be developed when a new platform is
introduced, which reduces the platform development complexity of the heterogeneous system.
The design device for an OpenCL runtime system framework provided in the embodiments of the present invention can
implement the method embodiments provided above; for the specific function implementation, refer to the
description in the method embodiments, which is not repeated here. The design method and device for an OpenCL
runtime system framework provided in the embodiments of the present invention are applicable to cross-platform
porting in heterogeneous systems, but are not limited thereto.
A person of ordinary skill in the art will understand that all or part of the processes in the above method
embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in
a computer-readable storage medium and, when executed, may include the processes of the above method embodiments.
The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM),
or the like.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present
invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive
within the technical scope disclosed by the present invention shall fall within the protection scope of the
present invention. Therefore, the protection scope of the present invention shall be subject to the protection
scope of the claims.
Claims (10)
- 1. A design method for an open computing language (OpenCL) runtime system framework, characterized by comprising:
  dividing the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, wherein the functional layer is used to receive OpenCL runtime code and implement its basic functions; the optimization layer is used to implement global optimization and uses a platform-layer intermediate representation (IR) interface; and the platform layer includes a platform-layer IR, a platform-layer implementation framework and at least one platform implementation;
  providing mandatory platform-layer IR to the functional layer;
  providing advisory platform-layer IR to the optimization layer;
  configuring a corresponding priority for the advisory platform-layer IR;
  providing the platform-layer implementation framework to the at least one platform implementation.
- 2. The design method for an OpenCL runtime system framework according to claim 1, characterized in that the platform-layer IR includes a framework manager and an accelerator manager.
- 3. The design method for an OpenCL runtime system framework according to claim 2, characterized in that the platform-layer IR includes methods of the platform-layer IR and descriptions of the platform-layer IR, and the methods of the platform-layer IR include mandatory platform-layer IR or advisory platform-layer IR.
- 4. The design method for an OpenCL runtime system framework according to any one of claims 1 to 3, characterized in that, after the step of providing the platform-layer implementation framework to the at least one platform implementation, the method further comprises: generating the platform-layer IR.
- 5. The design method for an OpenCL runtime system framework according to claim 4, characterized in that the step of generating the platform-layer IR comprises:
  generating the description of the platform-layer IR;
  judging whether the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources;
  if the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources, generating the methods of the platform-layer IR;
  if the resources occupied by the methods of the platform-layer IR are greater than the available resources, releasing the resources occupied by the lowest-priority platform-layer IR according to the priorities corresponding to the platform-layer IR.
- 6. A design device for an open computing language (OpenCL) runtime system framework, characterized by comprising:
  a division unit, configured to divide the OpenCL runtime system framework into a functional layer, an optimization layer and a platform layer, wherein the functional layer is used to receive OpenCL runtime code and implement its basic functions; the optimization layer is used to implement global optimization and uses a platform-layer intermediate representation (IR) interface; and the platform layer includes a platform-layer IR, a platform-layer implementation framework and at least one platform implementation;
  a providing unit, configured to provide mandatory platform-layer IR to the functional layer divided by the division unit; provide advisory platform-layer IR to the optimization layer; configure a corresponding priority for the advisory platform-layer IR; and provide the platform-layer implementation framework to the at least one platform implementation.
- 7. The design device for an OpenCL runtime system framework according to claim 6, characterized in that the platform-layer IR provided by the providing unit includes a framework manager and an accelerator manager.
- 8. The design device for an OpenCL runtime system framework according to claim 7, characterized in that the platform-layer IR provided by the providing unit includes methods of the platform-layer IR and descriptions of the platform-layer IR, and the methods of the platform-layer IR include mandatory platform-layer IR or advisory platform-layer IR.
- 9. The design device for an OpenCL runtime system framework according to any one of claims 6 to 8, characterized in that the device further includes a generation unit; the generation unit is configured to generate the platform-layer IR.
- 10. The design device for an OpenCL runtime system framework according to claim 9, characterized in that the generation unit includes a generation module, a judgment module and a release module;
  the generation module is configured to generate the description of the platform-layer IR;
  the judgment module is configured to judge whether the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources;
  the generation module is further configured to generate the methods of the platform-layer IR when the judgment module judges that the resources occupied by the methods of the platform-layer IR are less than or equal to the available resources;
  the release module is configured to, when the judgment module judges that the resources occupied by the methods of the platform-layer IR are greater than the available resources, release the resources occupied by the lowest-priority platform-layer IR according to the priorities corresponding to the platform-layer IR.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410065503.7A CN104866295B (en) | 2014-02-25 | 2014-02-25 | The design method and device of OpenCL runtime system frameworks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104866295A (en) | 2015-08-26 |
CN104866295B (en) | 2018-03-06 |
Family
ID=53912148
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
CN201410065503.7A Active CN104866295B (en) | 2014-02-25 | 2014-02-25 | The design method and device of OpenCL runtime system frameworks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104866295B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631866B (en) * | 2015-12-24 | 2019-04-05 | 武汉鸿瑞达信息技术有限公司 | A kind of extraction calculation optimization method of the foreground target method based on heterogeneous platform |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102959504A (en) * | 2011-03-29 | 2013-03-06 | 英特尔公司 | Method and apparatus to facilitate shared pointers in a heterogeneous platform |
CN103064657A (en) * | 2012-12-26 | 2013-04-24 | 深圳中微电科技有限公司 | Method and device for achieving multi-application parallel processing on single processors |
EP2677424A2 (en) * | 2012-06-22 | 2013-12-25 | Altera Corporation | OpenCL compilation |
CN103593220A (en) * | 2012-06-22 | 2014-02-19 | 阿尔特拉公司 | OPENCL compilation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130141443A1 (en) * | 2011-12-01 | 2013-06-06 | Michael L. Schmit | Software libraries for heterogeneous parallel processing platforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by SIPO to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||