CN109933327A - OpenCL compiler method and system based on code fusion compiler framework - Google Patents
- Publication number: CN109933327A (application number CN201910106880.3A)
- Authority: CN (China)
- Prior art keywords: code, kernel, thread, compiler, syntax tree
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Landscapes
- Devices For Executing Special Programs (AREA)
Abstract
The present invention relates to an OpenCL compilation method and system based on a code-fusion compiler framework. It comprises four parts. First, a shared-memory-based host-kernel code-fusion compiler framework fuses the code of the different ends in the compiler's intermediate representation, at the AST level. Second, a WII-CFG graph characterizes the instruction execution behavior between threads after the kernel code is instantiated into many threads; that is, it analyzes the platform-feature-sensitive execution behavior of the program within a work-group. Third, a joint host-kernel data-flow analysis mines data-flow relations that span the host side and the kernel side, as well as data-flow relations between threads, in order to analyze the data dependences between the code of the two ends. Fourth, targeted code optimization is performed based on these analyses, after which assembly code is generated and compilation ends. The invention can analyze host-side code and kernel code simultaneously and fully exploits cross-thread optimization opportunities, so that OpenCL programs achieve good performance portability across different acceleration devices.
Description
Technical field
The present invention relates to the field of compiler and optimization technology research and development. More particularly, it relates to a compiler framework design and a compilation method and system for the OpenCL language and heterogeneous platforms.
Background art
Heterogeneous architectures have become mainstream in recent years. On the global TOP500 supercomputer list, the top three systems are all heterogeneous platforms, and more than 100 systems on the list are heterogeneous. Architectures that combine processor cores with acceleration devices are also widely used in servers, PCs and terminal devices. A heterogeneous computing system usually consists of a CPU and one or more acceleration devices interconnected on chip or on the motherboard. The CPU handles complex control, scheduling and similar work, while the acceleration devices handle large-scale parallel computation or domain-specific computing tasks. Among heterogeneous parallel programming models, CUDA (released by NVIDIA) and OpenCL (published by the Khronos Group) are the two current mainstream models. OpenCL is a cross-platform parallel programming model that applies to many kinds of acceleration devices, so it has a wider range of application than CUDA.
Under heterogeneous computing frameworks, the heterogeneity of target code is a significant challenge for compilation tool design. Heterogeneous program code is divided into two parts, host code and device code, which run on the CPU and on the acceleration device respectively. Functionally, the host code handles data initialization, data exchange and accelerator control, while the device code executes the core computation in parallel. The two parts therefore have different compilation target platforms and different optimization goals. Current heterogeneous parallel programs are compiled separately: the code that runs on each device is compiled and optimized independently. Under separate compilation, the compilation tools for different devices are independent of each other and can generate well-optimized code for each device. Most successful commercial compilation systems are designed around this separate compilation approach, for example the NVIDIA CUDA compiler nvcc (NVIDIA CUDA Compiler) and AMD's OpenCL compilation/runtime framework.
However, separate compilation ignores the relationships between the heterogeneous pieces of code. In an OpenCL program, for example, the host code and the kernel code are compiled completely independently and share no compile-time information. In reality, the host code controls the execution of the kernel code by calling OpenCL APIs, and this is how it interacts with the acceleration device. Because the kernel code is compiled in complete isolation, the compiler cannot see relevant information in the host code, such as the arguments passed in, the layout of arrays, and the work-group configuration. This limits the optimization opportunities for the kernel code and makes it harder to improve the quality of the generated code.

When compiling and optimizing code for heterogeneous parallel computing frameworks, the compiler must always weigh "separation" against "fusion". On the one hand, the final code runs on a heterogeneous platform, so the code for the different ends must be compiled separately, and additional complex mechanisms (linking, runtime mechanisms, etc.) must be added. On the other hand, the code of the different ends is correlated, and this information must be known before the code can be deeply optimized. From the standpoint of deep code optimization, fused compilation is necessary.
To achieve deep optimization of heterogeneous code and improve OpenCL performance portability, this method provides an OpenCL compilation method based on host-kernel fused compilation. It produces the optimized program as source code through source-to-source transformation. The method compiles host code and kernel code together in order to:
- enable whole-program analysis and optimization;
- exploit optimization opportunities within and between threads;
- give programs good cross-platform performance portability, since OpenCL performance is otherwise poorly portable.

It differs from previous work in proposing a host-kernel code-fusion compiler framework and its construction method. On this basis it proposes two compiler infrastructures that guide targeted optimization of the kernel code: a platform-feature-dependent WII-CFG graph that models the execution order of work-items, and a joint host-kernel data-flow analysis.

The compiler design involved in this method has four main parts:
(1) a shared-memory-based host-kernel code-fusion compiler framework, which fuses the code of different ends in the compiler's intermediate representation, at the AST level;
(2) the WII-CFG graph (Work-Item Interleaving CFG), which characterizes the instruction execution behavior between threads after the kernel code is instantiated into many threads, i.e., it analyzes the platform-feature-sensitive execution behavior of the program within a work-group;
(3) a joint host-kernel data-flow analysis, which mines data-flow relations that span the host side and the kernel side, as well as data-flow relations between threads, in order to analyze the data dependences between the host code and the kernel code;
(4) targeted code optimization based on these analyses, after which assembly code is generated and compilation ends.
Summary of the invention
Poor performance portability is a widely recognized problem for OpenCL programs. To address it, we propose a compilation method based on a host-kernel code-fusion compiler framework. The method includes two compiler infrastructures, the WII-CFG graph and the joint host-kernel data-flow analysis, which aim to give OpenCL programs a basis for deep optimization and good performance portability. To analyze optimization opportunities between threads (work-items), the method targets the analysis and optimization of thread scheduling within a single work-group.
Specifically, the invention discloses an OpenCL compilation method based on a code-fusion compiler framework, comprising:
Step 1: obtain an OpenCL source program. Compile the host-side code of the source program into a host abstract syntax tree. Obtain the kernel code file from the kernel-loading function in that abstract syntax tree, compile the kernel code file into a kernel abstract syntax tree, and store it in shared memory. Fetch and reconstruct all kernel abstract syntax trees from the shared memory, yielding a fused abstract syntax tree that combines the host abstract syntax tree and the kernel abstract syntax trees;
Step 2: based on the fused abstract syntax tree, obtain the control flow graphs of the host abstract syntax tree and of the kernel abstract syntax tree. Add function-call edges and function-return edges connecting the two, yielding an inlined control flow graph. Using a WII function derived from the target platform's features, obtain the execution order of the instructions in the kernel's work-items on the corresponding target platform, and depict this execution order in the inlined control flow graph, yielding a WII-CFG graph;
Step 3: analyze the parameter passing of the kernel code's functions and the parameters of the OpenCL API calls that transfer data between host and device. This yields the correspondence between host-side variables and kernel variables as a first analysis result. Then perform data-flow analysis on the WII-CFG graph to obtain a second analysis result;
Step 4: optimize the kernel code in the fused abstract syntax tree according to the first and second analysis results, yielding an optimized abstract syntax tree;
Step 5: translate the optimized abstract syntax tree with the compiler, and output the optimized host code and kernel code as the compilation result.
In the above OpenCL compilation method based on a code-fusion compiler framework, step 2 includes obtaining the WII function of the target platform according to how threads execute on the target platform of the kernel code. The WII function computes the execution order of the instructions in the kernel's work-items on that target platform.
In the above OpenCL compilation method based on a code-fusion compiler framework, step 3 specifically includes the following. Analyze the correspondence between the actual argument variables and the formal parameter variables of the kernel function, together with the parameters of the OpenCL API calls that transfer data between host and device. This yields the correspondence between host-side variables and kernel variables as the first analysis result. Then perform data-flow analysis on the WII-CFG graph to obtain the second analysis result, which includes the def-use chains and live ranges of the different variables of the host code and the kernel code.
In the above OpenCL compilation method based on a code-fusion compiler framework, the optimization in step 4 specifically includes:
a thread-merging step: identify redundant cross-thread operations from the def-use chains in the second analysis result, and merge the multiple threads that perform these redundant operations into one coarse-grained thread, reducing cross-thread code redundancy;
a data-layout step: based on the first analysis result, the def-use chains in the second analysis result, and the thread organization and execution mode of the target platform, choose one of two layouts (contiguous within a thread or contiguous across threads), and apply the corresponding code transformation;
a vectorization step: vectorize cross-thread and intra-thread code according to the live ranges and def-use chains in the second analysis result.
The above OpenCL compilation method based on a code-fusion compiler framework further includes step 6: following the standard OpenCL compilation flow, invoke the native compiler to compile the compilation result, and then run it.
The invention also discloses an OpenCL compilation system based on a code-fusion compiler framework, comprising:
Module 1, which obtains an OpenCL source program and compiles its host-side code into a host abstract syntax tree. It obtains the kernel code file from the kernel-loading function in that abstract syntax tree, compiles the kernel code file into a kernel abstract syntax tree, and stores it in shared memory. It then fetches and reconstructs all kernel abstract syntax trees from the shared memory, yielding a fused abstract syntax tree that combines the host abstract syntax tree and the kernel abstract syntax trees;
Module 2, which, based on the fused abstract syntax tree, obtains the control flow graphs of the host abstract syntax tree and of the kernel abstract syntax tree. It adds function-call edges and function-return edges connecting the two, yielding an inlined control flow graph. Using a WII function derived from the target platform's features, it obtains the execution order of the instructions in the kernel's work-items on the corresponding target platform and depicts this order in the inlined control flow graph, yielding a WII-CFG graph;
Module 3, which analyzes the parameter passing of the kernel code's functions and the parameters of the OpenCL API calls that transfer data between host and device. This yields the correspondence between host-side variables and kernel variables as a first analysis result. The module also performs data-flow analysis on the WII-CFG graph to obtain a second analysis result;
Module 4, which optimizes the kernel code in the fused abstract syntax tree according to the first and second analysis results, yielding an optimized abstract syntax tree;
Module 5, which translates the optimized abstract syntax tree with the compiler and outputs the optimized host code and kernel code as the compilation result.
In the above OpenCL compilation system based on a code-fusion compiler framework, module 2 obtains the WII function of the target platform according to how threads execute on the target platform of the kernel code. The WII function computes the execution order of the instructions in the kernel's work-items on that target platform.
In the above OpenCL compilation system based on a code-fusion compiler framework, module 3 specifically does the following. It analyzes the correspondence between the actual argument variables and the formal parameter variables of the kernel function, together with the parameters of the OpenCL API calls that transfer data between host and device. This yields the correspondence between host-side variables and kernel variables as the first analysis result. It then performs data-flow analysis on the WII-CFG graph to obtain the second analysis result, which includes the def-use chains and live ranges of the different variables of the host code and the kernel code.
In the above OpenCL compilation system based on a code-fusion compiler framework, the optimization in module 4 specifically includes:
a thread-merging module, which identifies redundant cross-thread operations from the def-use chains in the second analysis result and merges the multiple threads that perform these redundant operations into one coarse-grained thread, reducing cross-thread code redundancy;
a data-layout module, which, based on the first analysis result, the def-use chains in the second analysis result, and the thread organization and execution mode of the target platform, chooses one of two layouts (contiguous within a thread or contiguous across threads) and applies the corresponding code transformation;
a vectorization module, which vectorizes cross-thread and intra-thread code according to the live ranges and def-use chains in the second analysis result.
The above OpenCL compilation system based on a code-fusion compiler framework further includes module 6, which, following the standard OpenCL compilation flow, invokes the native compiler to compile the compilation result and then runs it.
The technical effects of the invention include:
The OpenCL compilation method of the invention covers an improved compiler framework, extended analysis techniques and targeted optimizations. For different acceleration devices, it analyzes host-side code and kernel code simultaneously and fully exploits cross-thread optimization opportunities, so that OpenCL programs achieve good performance portability.
Description of the drawings
Fig. 1 is a chart of the WII function of each platform;
Fig. 2 shows WII-CFG graphs;
Fig. 3 is a chart of the variable correspondences between host-side code and kernel code;
Fig. 4 is a flowchart of the compilation process.
Specific embodiment
To solve the above technical problem, embodiments of the present invention include:
A. Host-kernel code fusion: first, the compiler compiles the host-side code into an intermediate representation, the abstract syntax tree AST (HostAST). The AST is then traversed. When a kernel-loading function (such as clCreateProgramWithSource) is encountered, the kernel code file name is determined and a child process is started. The child process invokes the compiler to compile the kernel code file into a KernelAST, stores it in shared memory, and exits. Finally, the ASTs of all kernel code are fetched from the shared-memory buffer and reconstructed, which fuses HostAST and KernelAST.
B. Control-flow analysis based on the WII-CFG graph: the goal is to build the WII-CFG graph of the fused code for a specific target platform (the platform the kernel code runs on). This graph is the basis for subsequent data-flow analysis and code optimization.

First, an inlined control flow graph (CFG) is built from the fused AST. It contains the respective CFGs of HostAST (host abstract syntax tree) and KernelAST (kernel abstract syntax tree), each built with the conventional CFG construction method. Call edges (at function calls) and return edges (at function returns) are added to connect the two CFGs.

Next, the platform's WII (Work-Item Interleaving) function is obtained from how threads execute on the kernel code's target platform: either the threads in a work-group execute one after another in serialized fashion, or several threads execute in parallel. The WII function computes the execution order, on the corresponding target platform, of a given instruction in a given work-item of the kernel.

Finally, the CFG is refined using the WII function so that it depicts the execution order of the kernel instructions, which yields the WII-CFG graph.
C. Joint data-flow analysis: first, the data dependences (or data correspondences) between the host code and the kernel code are analyzed. Two things are analyzed: the correspondence between the actual argument variables and the formal parameter variables of the kernel function, and the parameters of the calls that transfer the data related to these argument variables (the relevant OpenCL API calls, such as clEnqueueWriteBuffer and clEnqueueReadBuffer). Together they give the correspondence between host-side variables and kernel variables as the first analysis result; the present invention treats this correspondence as an alias relation.

Second, conventional data-flow analysis methods are applied to the WII-CFG graph to perform a joint host-device data-flow analysis. This covers alias relations between variables of the host code and the kernel code, def-use chains between different variables and between different threads, live-range analysis, and so on.
D. Code optimization: the results of the above analyses are used to optimize the code and improve kernel performance.

First, thread-merging optimization merges several threads into one coarse-grained thread to reduce cross-thread code redundancy. The def-use chains of variables between different threads, obtained from the preceding data-flow analysis, identify redundant cross-thread operations, and these operations are exactly the targets of thread merging.

Second, data-layout optimization chooses one of two data layouts (contiguous within a thread, or contiguous across threads) according to the thread organization and execution mode of the target platform, and applies the corresponding code transformation. The alias relations and def-use chains between variables of the host code and the kernel code, obtained from the preceding data-flow analysis, guide legal data-layout code transformations.

Third, aggressive vectorization optimization vectorizes cross-thread and intra-thread code. Its code transformation changes the definition and use statements of the related variables, so it also depends on accurate def-use chain and live-range results from the data-flow analysis.
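The text does not detail the vectorization transformation itself. The Python sketch below illustrates only its cross-thread half, under stated assumptions. It checks whether the loads performed by `width` adjacent work-items from an affine index `coef*tid + offset` touch consecutive elements. If they do, the loads could be packed into one vector load; otherwise the sketch falls back to a gather. The liveness and def-use checks the method relies on are not modeled, and `vectorize_load` is a hypothetical name.

```python
def vectorize_load(coef_tid, offset, base_tid, width):
    """Cross-thread vectorization of a load n[coef_tid*tid + offset] for
    `width` adjacent work-items starting at base_tid.

    Returns ("vload", first_address, width) when the lanes touch consecutive
    elements, otherwise ("gather", addresses)."""
    addrs = [coef_tid * (base_tid + lane) + offset for lane in range(width)]
    if all(b - a == 1 for a, b in zip(addrs, addrs[1:])):
        return ("vload", addrs[0], width)  # one contiguous vector load
    return ("gather", tuple(addrs))        # strided lanes need a gather
```

For example, with j = 1 and A = 8, an index of the form tid + j*A gives `vectorize_load(1, 8, 0, 4) == ("vload", 8, 4)`. An intra-thread index tid*4 + j instead gives a strided gather.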
E. Code generation and post-compilation: the optimized fused AST is split back into host code and kernel code. Our compiler translates them and outputs the optimized host code and kernel code, that is, the optimized OpenCL program source. This code then goes through the conventional OpenCL compilation flow: the native compiler is invoked to compile it into a binary, which is then run.
To make the above features and effects of the invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The overall flow of the invention, shown in Fig. 4, includes:
Step 1: generate the fused AST. The OpenCL source program is input, and fused compilation produces an AST in which host code and kernel code are combined.

First, the compiler compiles the host-side code into an intermediate representation, the abstract syntax tree AST (HostAST). The AST is then traversed. When a kernel-loading function (such as clCreateProgramWithSource) is encountered, the kernel code file name is determined, and a shared memory space is set up for communication between this process and its child process. A child process is then started. It invokes the compiler to compile the kernel code file into a KernelAST, stores the result in shared memory, and exits. Finally, the ASTs of all kernel code are fetched from the shared memory space. At this point this process can access HostAST and KernelAST at the same time, which fuses the two ASTs.

Shared memory is one form of inter-process communication. It lets two or more processes share a given region of memory, which each process can map into its own address space. Information that one process writes into shared memory can be read by the other processes using that shared memory through ordinary memory reads, which achieves communication between processes.
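The following Python sketch illustrates the fork-and-shared-memory handoff of Step 1. It is a minimal sketch, not the patent's implementation, and rests on several assumptions:
- it requires a POSIX system (`os.fork`);
- an anonymous shared `mmap` region plays the role of the shared memory space, and `pickle` serializes the AST;
- the real kernel compiler is replaced by a hypothetical toy `compile_kernel` that only records each `__kernel` function's name and parameter names;
- `kernel_files` is a hypothetical stand-in for reading the kernel `.cl` files from disk.

```python
import mmap
import os
import pickle
import re
import struct

SHM_SIZE = 1 << 16  # assumed upper bound on one serialized KernelAST

def compile_kernel(source):
    """Hypothetical toy kernel 'compiler': the KernelAST is just a list of
    {kernel name, parameter names} records, one per __kernel function."""
    asts = []
    for name, params in re.findall(r"__kernel\s+void\s+(\w+)\s*\(([^)]*)\)", source):
        names = [p.split()[-1].lstrip("*") for p in params.split(",") if p.strip()]
        asts.append({"kernel": name, "params": names})
    return asts

def fuse(host_ast, kernel_files):
    """Walk HostAST; at each clCreateProgramWithSource node, compile the kernel
    file in a child process that hands its AST back through shared memory."""
    shm = mmap.mmap(-1, SHM_SIZE)  # anonymous shared mapping, still shared after fork
    kernel_asts = {}
    for node in host_ast:
        if node.get("call") != "clCreateProgramWithSource":
            continue
        fname = node["file"]
        pid = os.fork()
        if pid == 0:  # child: compile, write a length-prefixed blob, exit
            try:
                blob = pickle.dumps(compile_kernel(kernel_files[fname]))
                shm.seek(0)
                shm.write(struct.pack("<I", len(blob)) + blob)
                os._exit(0)
            except BaseException:
                os._exit(1)
        _, status = os.waitpid(pid, 0)  # parent: wait for the child to finish
        if status != 0:
            raise RuntimeError(f"kernel compilation failed for {fname}")
        shm.seek(0)
        (size,) = struct.unpack("<I", shm.read(4))
        kernel_asts[fname] = pickle.loads(shm.read(size))  # rebuild KernelAST
    shm.close()
    return {"host": host_ast, "kernels": kernel_asts}
```

The child exits with `os._exit` so that it never returns into the parent's code path. The parent reads the buffer only after `waitpid` confirms that the child has finished writing.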
Step 2: control-flow analysis based on the WII-CFG graph. The goal is to build the WII-CFG graph of the fused code for the specific target platform (the platform the kernel code runs on). This graph is the basis for subsequent data-flow analysis and code optimization. This step includes:
2.1) Build the inlined CFG from the fused AST. An inlined control flow graph (inlined CFG) is built from the fused AST. It contains the respective CFGs of HostAST and KernelAST, each built with the conventional CFG construction method. The kernel code is launched (or called) by the host code, so the two stand in a caller-callee relationship. Call edges (at function calls) and return edges (at function returns) are therefore added to connect Host-CFG and Kernel-CFG.
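A minimal Python sketch of 2.1 follows. It assumes that each CFG is given as a list of (source, target) edges over distinct node names. It also assumes that the host launch site (for example, the node for clEnqueueNDRangeKernel) and the host statement after it are already known. The sketch keeps the host fall-through edge at the launch site, as a conventional interprocedural supergraph does; the text does not say whether the method keeps that edge. The function names are hypothetical.

```python
from collections import defaultdict

def inline_cfg(host_edges, kernel_edges, call_site, return_site, kernel_entry, kernel_exit):
    """Merge host and kernel CFGs into one list of typed edges, linked by a
    call edge (launch site -> kernel entry) and a return edge
    (kernel exit -> host statement after the launch)."""
    edges = [(a, b, "host") for a, b in host_edges]
    edges += [(a, b, "kernel") for a, b in kernel_edges]
    edges.append((call_site, kernel_entry, "call"))
    edges.append((kernel_exit, return_site, "return"))
    return edges

def reachable(edges, start):
    """Nodes reachable from start, by iterative DFS over the inlined CFG."""
    succ = defaultdict(list)
    for a, b, _ in edges:
        succ[a].append(b)
    seen, stack = {start}, [start]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

With host nodes h0 → h1 → h2 (h1 being the launch) and kernel nodes k0 → k1, the kernel body becomes reachable from the host entry h0 through the call edge.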
2.2) Obtain the WII (Work-Item Interleaving) function from the features of the target platform. The function computes the execution order, on the corresponding target platform, of a given instruction in a given work-item of the kernel. The execution order of the work-items within one work-group depends on the platform. There are two common kinds:
- Serialized execution: work-items execute one after another (Work-Item0 finishes before Work-Item1 starts, and so on). Representative platforms are AMD CPUs, Tilera's TILE-Gx series many-core chips, and China's Sunway many-core chip (SW26010).
- Data-parallel execution: instructions of several adjacent work-items execute in parallel. The insn0 of Work-Item0 through Work-Itemi execute in parallel, then the insn1 of each, then the insn2 of each, and so on. Examples are NVIDIA GPUs in SIMT mode, and Intel CPUs and Xeon Phi chips, on which the instructions of adjacent threads are automatically turned into vector instructions at run time.

The details are shown in Fig. 1, where tid denotes the thread number. According to the OpenCL specification, a thread id has at most three dimensions, tid(0), tid(1) and tid(2), and tid is the flattened thread id computed from them.
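Fig. 1 is not reproduced here, so the following Python sketch encodes only the two execution modes as the text describes them. It makes several assumptions:
- the kernel body is straight-line code of `body_len` instructions;
- WII(tid, i) is modeled as a logical time step, and instruction instances that share a step run together;
- `width` is an assumed SIMD/SIMT group size (Fig. 2(c) uses a degree of parallelism of 2).

The 3-D id flattening follows OpenCL's `get_local_linear_id`. The other names are hypothetical.

```python
def linear_local_id(lid, local_size):
    """Flatten a 3-D local id (tid(0), tid(1), tid(2)) the way OpenCL's
    get_local_linear_id does: dimension 0 varies fastest."""
    return lid[2] * local_size[1] * local_size[0] + lid[1] * local_size[0] + lid[0]

def make_wii(mode, body_len, width=1):
    """Return WII(tid, i): the logical time step at which instruction i of
    work-item tid runs. Instances with equal steps execute together."""
    if mode == "serial":
        # Work-Item0 runs all body_len instructions, then Work-Item1, ...
        return lambda tid, i: tid * body_len + i
    if mode == "data-parallel":
        # each group of `width` adjacent work-items runs insn0, insn1, ... in lockstep
        return lambda tid, i: (tid // width) * body_len + i
    raise ValueError(f"unknown execution mode: {mode}")
```

On a serialized platform every instance gets its own step. With `width=2`, work-items 0 and 1 share the step of each instruction, matching the paired instructions described for data-parallel platforms.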
2.3) Refine the CFG using the WII function. A simple instantiation expansion is performed on the kernel CFG: each static kernel instruction is instantiated into per-thread instructions tied to thread tid, and the execution order of these per-thread instructions is expressed according to the WII function. The result is the WII-CFG graph.

Fig. 2 shows WII-CFG graphs: (a) is the inlined CFG; (b) is the WII-CFG on a serialized kernel target platform; (c) is the WII-CFG on a data-parallel kernel target platform with a degree of parallelism of 2. Refining the inlined CFG according to the instruction execution order given by the WII function yields WII-CFG graphs for both serialized-execution platforms and data-parallel-execution platforms.
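The following Python sketch illustrates the refinement in 2.3. It assumes a straight-line kernel body. The WII function is passed in as a callable that maps (tid, instruction index) to a logical time step; this encoding is hypothetical, since Fig. 1 is not reproduced. Instances that share a step are grouped into one node, mirroring the side-by-side instructions of Fig. 2(c), and edges chain consecutive groups.

```python
from collections import defaultdict

def build_wii_cfg(body, num_threads, wii):
    """Instantiate each static kernel instruction per work-item and order the
    instances by WII step. Returns (nodes, edges): each node is a tuple of
    (tid, insn) instances that run together, and edges link consecutive nodes."""
    groups = defaultdict(list)
    for tid in range(num_threads):
        for i, insn in enumerate(body):
            groups[wii(tid, i)].append((tid, insn))
    nodes = [tuple(groups[step]) for step in sorted(groups)]
    edges = list(zip(nodes, nodes[1:]))
    return nodes, edges
```

With a serialized WII, the two work-items form a single chain of four nodes. With a data-parallel WII of width 2, both work-items' insn0 form one node, followed by one node for both insn1.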
Step 3: joint data-flow analysis. First, the correspondence between host-side variables and kernel variables is obtained by analysis. Conventional data-flow analysis is then performed on the WII-CFG graph. Specifically:
3.1) Obtain the correspondence between host-side variables (including array variables and array pointers) and kernel variables, also called alias relations here. The correspondence is derived from two sources: the correspondence between the actual arguments and the formal parameters of the kernel function, and the parameters of the data-transfer OpenCL API calls, such as clEnqueueWriteBuffer(), clEnqueueReadBuffer(), clEnqueueMapBuffer() and clSetKernelArg(). The analysis mainly targets the argument variables passed into the kernel code, and it identifies the host-side variable that corresponds to each of them.
For example, analyzing the source code shown in Fig. 3(a) gives the following:
(1) Analyzing the actual arguments and formal parameters of the kernel function yields the following correspondences:
d_f <-> ker(0th) (= f); d_p <-> ker(1st) (= p);
d_n <-> ker(2nd) (= n); NN <-> ker(3rd) (= N);
NA <-> ker(4th) (= A);
(2) Analyzing the parameters of the data-transfer API functions yields the following correspondences:
d_n <-> h_n; d_p <-> h_p; h_f <-> d_f;
Together these give the variable correspondences between the host side and the kernel code, as shown in Fig. 3(b).
Here a correspondence such as d_f <-> ker(0th) (= f) means that the actual argument d_f is equivalent to the formal parameter ker(0th) (= f); the symbol <-> means "is equivalent to".
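Because the correspondences in this example form equivalence classes, a union-find structure is one natural way to compute the alias relation. The following Python sketch is an assumption, not the algorithm stated in the text. It prefixes kernel formal parameters with `ker.` (a hypothetical naming convention) so that they cannot collide with host variable names.

```python
def alias_classes(pairs):
    """Group variables linked by <-> correspondences into alias classes
    using union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# Correspondences (1) and (2) from the Fig. 3 example.
PAIRS = [("d_f", "ker.f"), ("d_p", "ker.p"), ("d_n", "ker.n"), ("NN", "ker.N"),
         ("NA", "ker.A"), ("d_n", "h_n"), ("d_p", "h_p"), ("h_f", "d_f")]
```

Applied to these pairs, the sketch puts ker.n, d_n and h_n into one class, which is the relation the data-flow analysis in 3.2 relies on.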
3.2) Conventional data-flow analysis methods are applied to the WII-CFG graph to perform a joint host-device data-flow analysis. The analysis covers alias relations between host-side variables and device-side variables, def-use chains between variables and between different threads, live-range analysis, and so on. These results support subsequent optimizations such as data-layout transformation.
Again taking Fig. 3 as an example, the result of 3.1) shows that n (in the kernel code) corresponds to h_n and d_n. Data-flow analysis therefore shows that the actual definition point of n is the assignment to h_n in the host code. Data-flow results like this support subsequent optimization analysis and code transformation.
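The following Python sketch shows how alias classes let a use in the kernel find its definition in the host code. It makes three assumptions:
- it follows a single straight-line path through the WII-CFG; a full analysis would iterate reaching definitions to a fixpoint over all paths;
- the buffer transfer itself (e.g., clEnqueueWriteBuffer) is treated as part of the alias relation rather than as a new definition, which matches the example's conclusion that n is defined by the host assignment to h_n;
- statement labels and names are hypothetical.

```python
def cross_end_def_use(trace, alias_of):
    """trace: statements on one WII-CFG path in execution order, each given as
    (label, defs, uses). alias_of maps a variable to its alias-class
    representative; unmapped variables form their own class.
    Returns {(use_label, var): label of the reaching definition, or None}."""
    last_def = {}
    chains = {}
    for label, defs, uses in trace:
        for v in uses:  # uses read values defined before this statement
            chains[(label, v)] = last_def.get(alias_of.get(v, v))
        for v in defs:
            last_def[alias_of.get(v, v)] = label
    return chains
```

For a trace of a host assignment to h_n (h1), then clEnqueueWriteBuffer reading h_n (h2), then the kernel statement idx = n[tid + j*A] (k1), the use of n in k1 resolves to h1.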
Step 4: code optimization. The results of the above analyses are used to optimize the code and improve kernel performance. The method mainly adds three new targeted optimizations that improve performance portability:
4.1) Thread-merging optimization. The cross-thread variable definition-use chains obtained by the preceding data-flow analysis identify redundant operations across threads. For such partial code redundancy across threads, cf adjacent threads along some dimension can be selectively merged (assuming merging along dimension j, where a Work-Group contains localsize(0)*localsize(1)*localsize(2) Work-Items, then 1 ≤ cf ≤ localsize(j)). This removes redundant computation, memory accesses or synchronization without hurting parallelism-related performance, and so improves code performance. The host-side code and the Kernel code are modified accordingly.
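The effect of merging cf adjacent threads can be shown on a toy 1-D NDRange. In the Python sketch below (a hypothetical emulation, not the patent's transformation), every work-item needs `row_weight(gid // row_len)`, which is identical for adjacent work-items in the same row; the merged kernel computes it once per coarse-grained work-item and loops over its cf original items. The kernel bodies, `launch`, and the extra legality condition that cf divides both the local size and the row length are assumptions made so the example stays correct.

```python
CALLS = {"row_weight": 0}


def row_weight(row):
    """Hypothetical 'expensive' computation that depends only on the row."""
    CALLS["row_weight"] += 1
    return 2 * row + 1


def original_kernel(gid, row_len, out):
    # Every work-item recomputes row_weight, although adjacent work-items share a row.
    out[gid] = row_weight(gid // row_len) * gid


def merged_kernel(mid, cf, row_len, out):
    # One coarse-grained work-item covers cf adjacent original work-items.
    base = mid * cf
    w = row_weight(base // row_len)  # redundant computation done once
    for k in range(cf):
        gid = base + k
        out[gid] = w * gid


def launch(global_size, local_size, row_len, cf=1):
    """Emulate a 1-D NDRange; cf > 1 applies thread merging along dimension 0."""
    assert global_size % local_size == 0
    # Assumed legality: 1 <= cf <= localsize(0), and cf divides both the local size
    # and the row length so that merged work-items always share a row.
    assert 1 <= cf <= local_size and local_size % cf == 0 and row_len % cf == 0
    out = [0] * global_size
    CALLS["row_weight"] = 0
    if cf == 1:
        for gid in range(global_size):
            original_kernel(gid, row_len, out)
    else:
        # Host-side change: the global (and local) size shrinks by cf in dimension 0.
        for mid in range(global_size // cf):
            merged_kernel(mid, cf, row_len, out)
    return out, CALLS["row_weight"]


OUT_ORIG, CALLS_ORIG = launch(64, 16, 8)
OUT_MERGED, CALLS_MERGED = launch(64, 16, 8, cf=4)
# Same results; row_weight runs 16 times instead of 64.
```

The trade-off named in 4.1) is visible here: merging removes 48 redundant calls but also divides the number of work-items by cf, which is why cf is chosen so that parallelism is not harmed.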
4.2) Data layout optimization. As described in 2.2), acceleration devices fall broadly into two kinds: those that execute serially and those that execute data-parallel. Accordingly, one of two data layouts is selected to match the device: intra-thread contiguous (suited to serialized execution) or inter-thread contiguous (suited to data-parallel execution). The definitions and uses of the related arrays or variables in the host-side code and the Kernel code are then modified accordingly, using information obtained from data-flow analysis, including the alias relationships and definition-use chains among host-side and Kernel code variables.
Again taking the code in Fig. 3 as an example: for an acceleration device with data-parallel execution, the Kernel code should use an inter-thread contiguous layout. The use of n (in the Kernel code) in the source code (the statement idx=n[tid+j*A]) is already inter-thread contiguous, so the original data layout need not change. For an acceleration device with serialized execution, the Kernel code should use an intra-thread contiguous layout, so n (in the Kernel code) must undergo data layout optimization: its use statement is changed to idx=n[tid*N+j]. For program correctness, the actual defining statement is modified accordingly as well: the host-side statement h_n[i+j*nA]=neighborIter[i][j] is changed to h_n[i*nN+j]=neighborIter[i][j].
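The paired rewrite of the host definition and the Kernel use can be checked on a small instance. The Python sketch below is illustrative only: it builds h_n in either layout exactly as the two host statements above do, indexes it as the two Kernel statements do (with A = nA and N = nN, following the correspondences in 3.1)), and lets the reader confirm that each work-item still reads its own neighbours. The helper names and the `choose_layout` rule are hypothetical.

```python
def build_h_n(neighbor_iter, layout):
    """Host-side definition of h_n for the chosen layout (nA threads, nN neighbours each)."""
    nA, nN = len(neighbor_iter), len(neighbor_iter[0])
    h_n = [None] * (nA * nN)
    for i in range(nA):
        for j in range(nN):
            if layout == "inter":  # original: h_n[i+j*nA] = neighborIter[i][j]
                h_n[i + j * nA] = neighbor_iter[i][j]
            else:                  # "intra":  h_n[i*nN+j] = neighborIter[i][j]
                h_n[i * nN + j] = neighbor_iter[i][j]
    return h_n


def kernel_index(tid, j, A, N, layout):
    """Index used by the Kernel statement idx = n[...] under each layout."""
    return tid + j * A if layout == "inter" else tid * N + j


def choose_layout(device_kind):
    """Hypothetical selection rule following 4.2)."""
    return "inter" if device_kind == "data_parallel" else "intra"


NEIGHBOR_ITER = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]  # nA = 4, nN = 3
```

With the inter-thread layout, consecutive work-items touch consecutive addresses for a fixed j (good for coalesced access); with the intra-thread layout, one work-item walks consecutive addresses over j (good for a serial core's cache and prefetcher).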
4.3) Aggressive vectorization. When Kernel code actually executes, it is instantiated as many threads that run concurrently, so from the standpoint of vectorization there are opportunities both across threads and within a thread. According to the SIMD instruction width of the particular hardware, the Kernel code is first automatically vectorized across threads and then automatically vectorized within threads. The code transformation changes the definition and use statements of the related variables, and it relies on the precise definition-use chains and live-range analysis results obtained from data-flow analysis.
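Cross-thread vectorization can be pictured as packing consecutive work-items into the lanes of one SIMD vector. The Python sketch below models this with plain lists standing in for vector registers; it is a hypothetical illustration of the cross-thread step only (not of intra-thread vectorization), and `SIMD_WIDTH`, `scalar_body`, `vector_body` and `run_vectorized` are assumed names. Leftover work-items that do not fill a whole vector fall back to the scalar body.

```python
SIMD_WIDTH = 4  # hypothetical hardware SIMD width, in lanes


def scalar_body(tid, a, b, out):
    # Original per-work-item statement.
    out[tid] = a[tid] * 2 + b[tid]


def vector_body(tids, a, b, out):
    # The same statement executed once for a vector of work-items, one lane per tid.
    va = [a[t] for t in tids]                 # models a vector load (contiguous for consecutive tids)
    vb = [b[t] for t in tids]
    vr = [x * 2 + y for x, y in zip(va, vb)]  # models one SIMD multiply-add across lanes
    for t, r in zip(tids, vr):                # models a vector store
        out[t] = r


def run_vectorized(n, a, b, width=SIMD_WIDTH):
    out = [0] * n
    main = n - n % width
    for base in range(0, main, width):  # full vectors of `width` work-items
        vector_body(range(base, base + width), a, b, out)
    for tid in range(main, n):          # leftover work-items stay scalar
        scalar_body(tid, a, b, out)
    return out


A = list(range(10))
B = [10 * x for x in range(10)]
OUT_SCALAR = [0] * 10
for tid in range(10):
    scalar_body(tid, A, B, OUT_SCALAR)
OUT_VECTOR = run_vectorized(10, A, B)
```

Whether a[t] loads are contiguous across lanes depends on the data layout chosen in 4.2), which is one reason the vectorization step consumes the same definition-use information.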
Step 5, code generation and post-compilation. The host code and the kernel code are separated out of the optimized fusion AST, and the optimized host code and kernel code are output (that is, the optimized OpenCL program source produced by the translation of our compiler). These codes then follow the conventional OpenCL compilation flow: the native compiler is invoked to compile them, and the result is run.
The following is a system embodiment corresponding to the above method embodiment; the two embodiments can be implemented in cooperation with each other. The relevant technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the above embodiment.
The invention also discloses an OpenCL compiler system based on a code fusion compiler framework, comprising:
Module 1: obtains an OpenCL source program; compiles the host-side code in the source program into a host abstract syntax tree; obtains the kernel code files of the kernel execution functions in the abstract syntax tree; compiles the kernel code files into kernel abstract syntax trees and stores them in shared memory; fetches and reconstructs all kernel abstract syntax trees from the shared memory; and obtains a fusion abstract syntax tree that fuses the host abstract syntax tree and the kernel abstract syntax trees;
Module 2: based on the fusion abstract syntax tree, obtains the control flow graphs of the host abstract syntax tree and the kernel abstract syntax trees respectively, and adds function-call edges and function-return edges connecting the two to obtain an inlined control flow graph; according to the WII function determined by the characteristics of the target platform, obtains the execution order of instructions in the kernel work items on the corresponding target platform, and depicts this execution order inside the inlined control flow graph to obtain the WII-CFG;
Module 3: by analyzing the function parameter passing of the kernel code and the parameters of the OpenCL API function calls that transfer data between the host side and the device side, obtains the correspondence between host-side variables and kernel variables as a first analysis result, and performs data-flow analysis on the WII-CFG to obtain a second analysis result;
Module 4: according to the first analysis result and the second analysis result, optimizes the kernel code in the fusion abstract syntax tree to obtain an optimized abstract syntax tree;
Module 5: inputs the optimized abstract syntax tree to the compiler and, after translation, outputs the optimized host code and kernel code as the compilation result.
In the OpenCL compiler system based on the code fusion compiler framework, module 2 includes: obtaining the WII function of the target platform according to the thread execution mode of the kernel code on the target platform, the WII function being used to compute the execution order of the instructions of kernel work items on the target platform.
In the OpenCL compiler system based on the code fusion compiler framework, module 3 specifically includes:
analyzing the correspondence between the actual argument variables passed to the kernel function and its formal parameter variables, and the parameters of the OpenCL API function calls that transfer data between the host side and the device side, to obtain the correspondence between host-side variables and kernel variables as the first analysis result; and performing data-flow analysis on the WII-CFG to obtain the second analysis result, which includes the definition-use chains and live ranges of the different variables in the host-side code and the kernel code.
In the OpenCL compiler system based on the code fusion compiler framework, the optimization in module 4 specifically includes:
a thread merging module, which identifies redundant operations across threads according to the definition-use chains in the second analysis result and merges the multiple threads executing those redundant operations into one coarse-grained thread, to reduce code redundancy across threads;
a data layout module, which selects one layout from intra-thread contiguous and inter-thread contiguous according to the first analysis result, the definition-use chains in the second analysis result and the thread organization and execution mode of the target platform, and implements the corresponding code transformation;
a vectorization module, which vectorizes code across threads and within threads according to the live ranges and definition-use chains in the second analysis result.
The OpenCL compiler system based on the code fusion compiler framework further includes: module 6, which, following the OpenCL compilation flow, invokes the native compiler to compile the compilation result and then runs it.
Technical effects of the invention include:
1. Host-kernel code fusion compiler framework. In OpenCL programs, the definition and use of an array or variable often extend beyond the kernel code, and the host code additionally specifies the organization parameters of the work-group (Work-Group), that is, how many Work-Items it contains. In-depth analysis and optimization of OpenCL programs therefore requires a fusion compiler framework built on both host-side code and kernel code.
Technical effect: in the analysis phase of the compiler, the intermediate representations of both the host-side code and the Kernel code are available at the same time, and analysis can be carried out on both together.
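The benefit of having both representations in one tree can be illustrated with a toy fusion step. In the Python sketch below, a dict stands in for the shared memory holding separately compiled kernel ASTs (an assumption of this sketch), and each kernel AST is attached under the host node that launches it; `AstNode`, `walk`, `fuse` and the kernel name `ker` are hypothetical. A single traversal of the result then sees host API calls and kernel parameters together.

```python
from dataclasses import dataclass, field


@dataclass
class AstNode:
    kind: str
    name: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Pre-order traversal."""
    yield node
    for child in node.children:
        yield from walk(child)


def fuse(host_ast, kernel_store):
    """Attach each kernel AST under the host node that launches it.

    kernel_store stands in for the shared memory holding separately compiled
    kernel ASTs, keyed by kernel name (an assumption of this sketch).
    """
    for node in list(walk(host_ast)):  # snapshot before mutating the tree
        if node.kind == "KernelLaunch" and node.name in kernel_store:
            node.children.append(kernel_store[node.name])
    return AstNode("FusedUnit", children=[host_ast])


HOST = AstNode("Function", "main", [
    AstNode("Call", "clSetKernelArg"),
    AstNode("KernelLaunch", "ker"),
])
SHARED_STORE = {
    "ker": AstNode("KernelFunction", "ker", [AstNode("ParamDecl", p) for p in ["f", "p", "n", "N", "A"]]),
}
FUSED = fuse(HOST, SHARED_STORE)
# One traversal of FUSED now reaches both the host call and the kernel parameters.
```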
2. Fused control flow graph WII-CFG. Hardware architectures and runtime thread organization and execution modes differ across acceleration devices, so the order in which instructions from different threads (i.e., Work-Items) execute differs from device to device. For the target acceleration device, we obtain the corresponding WII function to compute the execution order of instructions across threads, and then use the WII-CFG to represent the host-side CFG and the Kernel CFG while also expressing the instruction execution order of different threads.
Technical effect: the WII-CFG serves as infrastructure for analyzing the execution behavior of cross-thread code on different acceleration devices. By extending the traditional CFG, it can simultaneously represent the host-side CFG, the Kernel CFG and the instruction execution order of the different thread instances of the Kernel, capturing the runtime characteristics of the acceleration device and exposing cross-thread optimization opportunities.
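A WII function can be thought of as a key that orders (work-item, instruction) pairs on a given device. The Python sketch below is a hypothetical illustration assuming a straight-line kernel body of `num_instr` instructions with no divergence: `wii_serial` models a device that runs each work-item to completion, `wii_lockstep` models one where `width` adjacent work-items issue each instruction together, and `wii_edges` links consecutive pairs into the extra ordering edges a WII-CFG would carry. Real WII functions depend on the actual device scheduler.

```python
def wii_serial(tid, k, num_instr, width=1):
    """Serialized device: a work-item runs all its instructions before the next one."""
    return tid * num_instr + k


def wii_lockstep(tid, k, num_instr, width):
    """Lock-step device: `width` adjacent work-items issue each instruction together."""
    group, lane = divmod(tid, width)
    return (group * num_instr + k) * width + lane


def wii_edges(num_threads, num_instr, wii, width=1):
    """Order (work-item, instruction) pairs by the WII function and link neighbours."""
    order = sorted(
        ((t, k) for t in range(num_threads) for k in range(num_instr)),
        key=lambda tk: wii(tk[0], tk[1], num_instr, width),
    )
    return list(zip(order, order[1:]))


SERIAL = wii_edges(2, 2, wii_serial)
LOCKSTEP = wii_edges(2, 2, wii_lockstep, width=2)
```

For two work-items with two instructions each, the serial order visits (0,0), (0,1), (1,0), (1,1), while the lock-step order visits (0,0), (1,0), (0,1), (1,1); the differing edges are exactly what lets later analyses reason about cross-thread behavior per device.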
3. Joint host-kernel data-flow analysis. This extends traditional data-flow analysis techniques in two respects: 1) analyzing OpenCL parameter passing and the parameters of data-transfer APIs to obtain the correspondence between host code variables and device code variables; 2) performing WII-CFG-based joint host-device data-flow analysis, including alias relationships between variables in the code on different sides and between different threads, definition-use chains, live-range analysis and so on. This facilitates cross-thread optimizations.
Technical effect: data-flow analysis can extend beyond the scope of the host code or the Kernel code alone, and variable definition-use analysis can be performed for multi-threaded code, facilitating optimizations that involve inter-thread data and computation.
Although the present invention is disclosed by the above embodiments, the specific examples serve only to explain the invention and are not intended to limit it. Any person skilled in the art may make changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the claims and their equivalents.
Claims (10)
1. An OpenCL compiler method based on a code fusion compiler framework, characterized by comprising:
Step 1: obtaining an OpenCL source program; compiling the host-side code in the source program into a host abstract syntax tree; obtaining the kernel code files of the kernel execution functions in the abstract syntax tree; compiling the kernel code files into kernel abstract syntax trees and storing them in shared memory; fetching and reconstructing all kernel abstract syntax trees from the shared memory; and obtaining a fusion abstract syntax tree that fuses the host abstract syntax tree and the kernel abstract syntax trees;
Step 2: based on the fusion abstract syntax tree, obtaining the respective control flow graphs of the host abstract syntax tree and the kernel abstract syntax trees, and adding function-call and function-return edges connecting the two control flow graphs to obtain an inlined control flow graph; obtaining, according to the WII function determined by the characteristics of the target platform, the execution order of instructions in the kernel work items on the corresponding target platform, and depicting this execution order in the inlined control flow graph to obtain the WII-CFG;
Step 3: analyzing the function parameter passing of the kernel code and the parameters of the OpenCL API function calls that transfer data between the host side and the device side, to obtain the correspondence between host-side variables and kernel variables as a first analysis result, and performing data-flow analysis on the WII-CFG to obtain a second analysis result;
Step 4: optimizing the kernel code in the fusion abstract syntax tree according to the first analysis result and the second analysis result, to obtain an optimized abstract syntax tree;
Step 5: translating the optimized abstract syntax tree with the compiler and outputting the optimized host code and kernel code as the compilation result.
2. The OpenCL compiler method based on a code fusion compiler framework according to claim 1, characterized in that step 2 comprises: obtaining the WII function of the target platform according to the thread execution mode of the kernel code on the target platform, the WII function being used to compute the execution order of the instructions of work items in the kernel on the target platform.
3. The OpenCL compiler method based on a code fusion compiler framework according to claim 1 or 2, characterized in that step 3 specifically comprises:
analyzing the correspondence between the actual argument variables passed to the kernel function and its formal parameter variables, and the parameters of the OpenCL API function calls that transfer data between the host side and the device side, to obtain the correspondence between host-side variables and kernel variables as the first analysis result; and performing data-flow analysis on the WII-CFG to obtain the second analysis result, which includes the definition-use chains and live ranges of the different variables in the host-side code and the kernel code.
4. The OpenCL compiler method based on a code fusion compiler framework according to claim 3, characterized in that the optimization in step 4 specifically comprises:
a thread-merging step: identifying redundant operations across threads according to the definition-use chains in the second analysis result, and merging the multiple threads that execute the redundant operations into one coarse-grained thread, to reduce code redundancy across threads;
a data layout step: selecting one layout from intra-thread contiguous and inter-thread contiguous according to the first analysis result, the definition-use chains in the second analysis result and the thread organization and execution mode of the target platform, and implementing the corresponding code transformation;
a vectorization step: vectorizing code across threads and within threads according to the live ranges and definition-use chains in the second analysis result.
5. The OpenCL compiler method based on a code fusion compiler framework according to claim 1, characterized by further comprising: step 6: following the OpenCL compilation flow, invoking the native compiler to compile the compilation result and then running it.
6. An OpenCL compiler system based on a code fusion compiler framework, characterized by comprising:
Module 1: obtains an OpenCL source program; compiles the host-side code in the source program into a host abstract syntax tree; obtains the kernel code files of the kernel execution functions in the abstract syntax tree; compiles the kernel code files into kernel abstract syntax trees and stores them in shared memory; fetches and reconstructs all kernel abstract syntax trees from the shared memory; and obtains a fusion abstract syntax tree that fuses the host abstract syntax tree and the kernel abstract syntax trees;
Module 2: based on the fusion abstract syntax tree, obtains the respective control flow graphs of the host abstract syntax tree and the kernel abstract syntax trees, and adds function-call edges and function-return edges connecting the two control flow graphs to obtain an inlined control flow graph; according to the WII function determined by the characteristics of the target platform, obtains the execution order of instructions in the kernel work items on the corresponding target platform, and depicts this execution order in the inlined control flow graph to obtain the WII-CFG;
Module 3: by analyzing the function parameter passing of the kernel code and the parameters of the OpenCL API function calls that transfer data between the host side and the device side, obtains the correspondence between host-side variables and kernel variables as a first analysis result, and performs data-flow analysis on the WII-CFG to obtain a second analysis result;
Module 4: according to the first analysis result and the second analysis result, optimizes the kernel code in the fusion abstract syntax tree to obtain an optimized abstract syntax tree;
Module 5: inputs the optimized abstract syntax tree to the compiler and, after translation, outputs the optimized host code and kernel code as the compilation result.
7. The OpenCL compiler system based on a code fusion compiler framework according to claim 6, characterized in that module 2 includes: obtaining the WII function of the target platform according to the thread execution mode of the kernel code on the target platform, the WII function being used to compute the execution order of the instructions of work items in the kernel on the target platform.
8. The OpenCL compiler system based on a code fusion compiler framework according to claim 6 or 7, characterized in that module 3 specifically includes:
analyzing the correspondence between the actual argument variables passed to the kernel function and its formal parameter variables, and the parameters of the OpenCL API function calls that transfer data between the host side and the device side, to obtain the correspondence between host-side variables and kernel variables as the first analysis result; and performing data-flow analysis on the WII-CFG to obtain the second analysis result, which includes the definition-use chains and live ranges of the different variables in the host-side code and the kernel code.
9. The OpenCL compiler system based on a code fusion compiler framework according to claim 8, characterized in that the optimization in module 4 specifically includes:
a thread merging module, which identifies redundant operations across threads according to the definition-use chains in the second analysis result and merges the multiple threads that execute the redundant operations into one coarse-grained thread, to reduce code redundancy across threads;
a data layout module, which selects one layout from intra-thread contiguous and inter-thread contiguous according to the first analysis result, the definition-use chains and the thread organization and execution mode of the target platform, and implements the corresponding code transformation;
a vectorization module, which vectorizes code across threads and within threads according to the live ranges and definition-use chains in the second analysis result.
10. The OpenCL compiler system based on a code fusion compiler framework according to claim 6, characterized by further including: module 6, which, following the OpenCL compilation flow, invokes the native compiler to compile the compilation result and then runs it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910106880.3A CN109933327B (en) | 2019-02-02 | 2019-02-02 | OpenCL compiler design method and system based on code fusion compiling framework |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109933327A true CN109933327A (en) | 2019-06-25 |
CN109933327B CN109933327B (en) | 2021-01-08 |
Family
ID=66985577
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910106880.3A Active CN109933327B (en) | 2019-02-02 | 2019-02-02 | OpenCL compiler design method and system based on code fusion compiling framework |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20231226; Address after: Room 1305, 13th Floor, No.1 Zhongguancun Street, Haidian District, Beijing, 100086; Patentee after: Zhongke Jiahe (Beijing) Technology Co.,Ltd.; Address before: 100080 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District; Patentee before: Institute of Computing Technology, Chinese Academy of Sciences |