CN112579063A - Acceleration method for exploring optimization space in deep learning compiler - Google Patents

Acceleration method for exploring optimization space in deep learning compiler

Info

Publication number
CN112579063A
CN112579063A (application CN202110223874.3A)
Authority
CN
China
Prior art keywords
operator
optimization
space
graph
operators
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110223874.3A
Other languages
Chinese (zh)
Other versions
CN112579063B (en)
Inventor
潘秋红 (Pan Qiuhong)
何水兵 (He Shuibing)
陈刚 (Chen Gang)
杨弢 (Yang Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202110223874.3A priority Critical patent/CN112579063B/en
Publication of CN112579063A publication Critical patent/CN112579063A/en
Application granted granted Critical
Publication of CN112579063B publication Critical patent/CN112579063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/30: Creation or generation of source code
    • G06F 8/37: Compiler construction; Parser generation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an acceleration method for exploring the optimization space in a deep learning compiler, with the aim of optimizing neural networks through compilation techniques while greatly reducing the time the compiler spends exploring the operator optimization space. The method first abstracts the neural network into the form of a computational graph. It then performs graph-level optimization on the computational graph and defines an optimization space for each operator in the optimized graph. Next, based on the operators and their optimization-space information, it provides a method for computing optimization-space similarity. Finally, it provides a similarity-based operator state-space exploration method: operators are clustered by similarity, the full space of the core operator in each cluster is explored, the remaining operators of the same cluster are explored within the core operator's best schemes, and the best scheme for every operator of the whole neural network is determined.

Description

Acceleration method for exploring optimization space in deep learning compiler
Technical Field
The invention relates to the intersection of deep learning, compilation technology, and high-performance computing, and in particular to an acceleration method for exploring the optimization space in a deep learning compiler.
Background
Today, Deep Neural Networks (DNNs) are widely applied in image classification, natural language processing, autonomous driving, augmented reality, and other AI fields. With the rapid development of computing devices such as GPUs, FPGAs, and specially designed neural network accelerators, the computing power available to DNNs keeps growing, and the demand for efficient DNNs in artificial intelligence grows with it, so improving the running efficiency of DNNs has become an important research problem in recent years.
There are now many deep learning frameworks, such as TensorFlow, PyTorch, Caffe, and MXNet, which represent neural networks as computational graphs, perform graph-level optimization on the graphs, and then map the operators of the DNN to third-party acceleration libraries such as cuDNN and MKL-DNN to run the DNN efficiently. However, graph-level optimization is generally hardware-independent and cannot exploit hardware characteristics for finer-grained optimization. Furthermore, the third-party acceleration libraries relied upon are generally closed-source, which prevents programmers from exercising effective control and from easily porting DNNs across hardware devices. In addition, operators not supported by a third-party library either cannot be optimized at all or require a great deal of manual tuning effort from the programmer.
In research on DNN acceleration, mapping neural networks from different frameworks onto various hardware platforms through compilation, accelerating the networks during the mapping, and generating optimized target-platform code has achieved remarkable results. A typical efficient neural network compiler has the following execution flow: the neural network from one of several deep learning frameworks is first expressed as a computational graph in a high-level intermediate language, and graph-level optimization is applied; the optimized computational graph is then lowered to a low-level intermediate language representation and optimized at the operator level; finally, optimized code is generated for the target hardware platform.
When the neural network is optimized at the operator level, the optimization space of each operator is defined in advance, and a machine learning method is then used to explore that space for the best optimization scheme. The optimization space of each operator is very large (a Conv operator, for example, may have hundreds of millions of candidate schemes), so exploring the operator optimization spaces is time-consuming; a Yolo network, for instance, needs more than a day to explore its optimization schemes.
Disclosure of Invention
To remedy the deficiencies of the prior art and greatly reduce the time the compiler spends exploring the operator optimization space, at the cost of an acceptable increase in deep learning network inference time, the invention adopts the following technical scheme:
An acceleration method for exploring the optimization space in a deep learning compiler, comprising the following steps:
s1, abstracting the neural network and representing the neural network in the form of a calculation graph;
s2, carrying out graph optimization on the calculation graph;
s3, defining an optimization space for each operator in the optimized calculation graph, and performing optimization space similarity calculation based on the operator containing optimization space information;
s4, searching operator state space based on similarity, clustering operators based on similarity, searching full space of core operators in each cluster, searching other operators of the same class in the optimal scheme of the core operators, determining the optimal scheme of each operator of the whole neural network, and generating target platform codes according to a hardware platform, comprising the following steps:
s41, calculating a similarity matrix of the operators;
s42, the similarity matrix is used as input to execute AP clustering, the AP clustering algorithm does not need to determine the clustering number in advance, the center of each cluster after clustering is an input operator, and an operator does not need to be selected for each cluster again to serve as a core;
s43, for each clustered core operator, searching the complete optimization space of the operator, and storing n optimization schemes with shortest inference time in the searching process;
s44, for each non-core operator of each cluster, only n optimal schemes searched by traversing the core operators depended on by the non-core operators are needed;
and S45, generating a target platform code for each operator according to the optimization scheme, and deploying the operator codes to hardware to run a neural network according to the sequence in the calculation diagram.
Therefore, the time consumption of the compiler for exploring an operator optimization space is greatly reduced under the sacrifice of the increase of the acceptable deep learning network reasoning time.
Further, the neural network computational graph representation of step S1 comprises the following steps:
S11, mapping the neural network constructed in a deep learning framework onto a well-defined high-level intermediate language (HIR);
and S12, analyzing the attributes of each operator based on the high-level intermediate language and constructing a computational graph from the data dependencies among the operators, wherein the constructed graph is a directed acyclic graph in which each node represents one operator of the neural network and each edge represents a data dependency between operators. The HIR is a domain-specific language (DSL) that can represent neural network computation and control flow; neural networks in TensorFlow, PyTorch, or ONNX format are mapped onto the HIR and represented by it.
Further, the graph optimization of step S2 comprises the following steps:
S21, performing operator fusion according to the computation type of each operator, wherein operator fusion combines several basic operators into one composite operator without storing intermediate results, thereby reducing unnecessary memory reads and writes and improving cache locality;
S22, performing data layout optimization on the computational graph after operator fusion according to the hardware characteristics;
and S23, merging parallel operators in the computational graph after data layout optimization.
Further, step S21 specifically comprises: first constructing a dominator tree, then traversing the nodes of the dominator tree, and, if the nodes on the path from a node to its dominator satisfy a predefined fusion rule, fusing those operators into a new composite operator.
Further, step S22 specifically comprises: for the computational graph after operator fusion, judging whether a data layout scheme was specified at graph input; if so, applying the specified scheme directly; if not, selecting the best data layout scheme according to the hardware characteristics, the layout schemes including row-major and column-major storage.
Further, step S23 specifically comprises: merging several operators that share the same input into one larger operator, which generates a larger kernel for the GPU, reduces the GPU kernel launch overhead, and improves GPU utilization.
Further, in step S3, the graph-optimized computational graph is mapped onto a low-level intermediate language (LIR), represented in LIR, and an optimization space is defined for each operator. The LIR is a finer-grained intermediate language form on which operator-level optimization and target-platform code generation can be performed.
Further, the operator undergoes multi-dimensional tiling and several loop-unrolling optimizations: when a dimension of the operator of original length l is tiled into m dimensions, there are k tiling schemes in total; by analogy, each optimization operation has some number k of candidate schemes, and the optimization space of the whole operator is the product of the scheme counts of all its optimization operations.
Further, in step S3, the optimization-space similarity is calculated by the following steps:
S31, defining a hash method for each optimization operation according to the operator's optimization-space attributes, abstracting the optimization operation into hash values;
S32, vectorizing each pair of operators: appending the hash values in sequence as vector values following the order of the optimization operations, making the two operators' vectors equal in length by zero padding, and concatenating the vector values of every optimization operation in the space in turn to form the vector value of the whole space;
and S33, computing the similarity of the pair of operators from the concatenated vector values.
Further, the similarity calculation takes the cosine of the pair of concatenated vectors as the similarity of the pair of operators; using the cosine value as the similarity measure gives the best classification result.
The invention has the advantages and beneficial effects that:
the neural network generated by various deep learning frames is mapped to a uniform intermediate language, codes of various hardware platforms can be generated, the expenditure of programmers caused by model conversion due to different development frames is saved, and the capability of deploying the neural network across hardware equipment is realized; the front end optimizes the neural network at a graph level, the rear end optimizes the neural network at an operator level, the whole optimization process is automatically carried out, efficient optimized codes can be generated for a hardware platform, and a programmer does not need to spend a large amount of time and energy to carry out manual optimization; and when the back end carries out operator optimization, an optimization space exploration scheme based on clustering is executed, so that the time consumption generated by exploring the optimization space can be greatly reduced.
Drawings
FIG. 1 is a flow chart of an acceleration method for exploring an optimization space in a deep learning compiler according to the present invention.
FIG. 2 is a schematic diagram of the computational graph of a Conv-BN-Relu module in the present invention.
FIG. 3 is a schematic diagram of a computational graph before operator fusion in the present invention.
FIG. 4 is a schematic diagram of the computational graph after operator fusion in the present invention.
FIG. 5 is a schematic diagram of the computational graph after parallel Conv operator merging in the present invention.
FIG. 6 is a schematic diagram of an operator optimization space in the present invention.
FIG. 7 is a schematic diagram of operator optimization-space vectorization in the present invention.
FIG. 8 is a flow chart of neural network operator-level optimization in the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only and do not limit the invention.
As shown in FIG. 1, the acceleration method for exploring the optimization space in a deep learning compiler aims to greatly reduce the time the compiler spends exploring the operator optimization space, at the cost of an acceptable increase in deep learning network inference time. The method first abstracts the neural network into the form of a computational graph. It then performs graph-level optimization on the computational graph and defines an optimization space for each operator in the optimized graph. Next, based on the operators and their optimization-space information, it provides a method for computing optimization-space similarity. Finally, it provides a similarity-based operator state-space exploration method: operators are clustered by similarity, the full space of the core operator in each cluster is explored, the remaining operators of the same cluster are explored within the core operator's best schemes, the best scheme for every operator of the whole neural network is determined, and target-platform code is generated for the hardware platform.
The method comprises a front end and a back end. The front end takes a model generated by a deep learning framework as input, abstracts it into a computational graph expressed in a high-level intermediate language, and performs graph optimization. The back end takes the front-end-optimized computational graph as input, expresses it in a low-level intermediate language, performs operator optimization while accelerating the exploration process, and finally generates target-platform code for the hardware platform.
The specific implementation of the invention is as follows:
1) The model generated by the deep learning framework is represented as a computational graph.
1.1) The method implements a high-level intermediate language (HIR), a domain-specific language (DSL) that can represent neural network computation and control flow.
1.2) Neural networks in TensorFlow, PyTorch, or ONNX format are mapped onto the HIR and expressed in HIR.
1.3) A computational graph is constructed from the converted HIR. The graph is a directed acyclic graph in which each node represents one operator of the neural network and each edge represents a data dependency between operators. The computational graph captures the control flow and the dependencies between operators and data, and provides an interface for graph-level optimization. Fig. 2 shows the computational graph generated by a simple Conv-BN-Relu module of a neural network: each rounded rectangle represents an operator node (three in this example), and each edge represents a data dependency between operators; for example, the Conv operator depends on the input data and the weight data W1, and the BN operator depends on the result of the preceding Conv operator.
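For illustration only, the following minimal Python sketch (the OpNode class and all names are assumptions for this example, not the patent's HIR) builds the Fig. 2 graph as operator nodes whose input lists are the data-dependency edges:

    class OpNode:
        def __init__(self, name, op_type, inputs):
            self.name = name          # e.g. "conv1"
            self.op_type = op_type    # e.g. "Conv"
            self.inputs = inputs      # upstream nodes or named tensors (the edges)

    data, w1 = "input", "W1"                    # external tensors
    conv = OpNode("conv1", "Conv", [data, w1])  # Conv depends on input and weight W1
    bn   = OpNode("bn1",   "BN",   [conv])      # BN depends on the Conv result
    relu = OpNode("relu1", "Relu", [bn])        # Relu depends on the BN result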
2) The neural network is optimized at the graph level based on the computational graph.
2.1) Operator fusion optimization is applied to the computational graph. Operator fusion is an optimization technique that combines several basic operators into one composite operator, avoiding the storage of intermediate results, reducing unnecessary memory reads and writes, and improving cache locality. For a given computational graph, a dominator tree is constructed first; the nodes of the dominator tree are then traversed, and if the nodes on the path from a node to its dominator satisfy a predefined fusion rule, those operators are fused into a new composite operator. Fig. 3 shows a computational graph before operator fusion optimization. Its dominator tree is computed first; for example, the dominator of node 2 is node 1, and the dominator of node 3 is node 2. The dominator tree is then traversed and the rules are matched: node 2 satisfies the fusion condition with its dominator node 1, so they are fused into a new node 1; the dominator of node 3 then becomes the fused node 1, with which it satisfies a new fusion condition. On this basis, the Conv-BN-Relu triple formed by nodes 1, 2, and 3 is fused into a new node, denoted CBR; applying the rule to the whole computational graph yields the form shown in Fig. 4.
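A simplified sketch of such a fusion pass, reusing the OpNode class from the sketch above: the immediate dominators (idom) are assumed to be precomputed, and the two fusion rules stand in for the patent's predefined rules.

    FUSABLE = {("Conv", "BN"), ("CBR", "Relu")}   # assumed illustrative rules

    def fuse_pass(nodes, idom):
        for node in list(nodes):
            dom = idom.get(node)
            if dom is not None and (dom.op_type, node.op_type) in FUSABLE:
                dom.op_type = "CBR"               # the dominator absorbs the node
                dom.name += "+" + node.name       # e.g. "conv1+bn1+relu1"
                # nodes previously dominated by the absorbed node now point at dom
                idom.update({k: dom for k, v in idom.items() if v is node})
                nodes.remove(node)
        return nodes

    # For the Conv-BN-Relu chain of Fig. 3: fuse_pass([conv, bn, relu],
    # {bn: conv, relu: bn}) collapses the three nodes into one CBR node (Fig. 4).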
2.2) Data layout optimization is applied to the computational graph after operator fusion optimization. Each operator in the computational graph can be stored on the physical device in a number of ways. For the fused computational graph, it is first determined whether a data layout scheme was specified at graph input; if so, that scheme is applied directly. When no layout is specified, the best data layout scheme is selected according to the hardware characteristics, such as row-major or column-major storage. The most basic choice is whether the data should be stored in NHWC or NCHW format.
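As a toy illustration of the two candidate layouts (the tensor shape is an assumption for this example), the following Python snippet stores the same 4-D activation tensor in NCHW and NHWC order:

    import numpy as np

    x_nchw = np.random.rand(1, 32, 56, 56)   # batch N, channels C, height H, width W
    x_nhwc = x_nchw.transpose(0, 2, 3, 1)    # reorder axes to N, H, W, C
    assert x_nhwc.shape == (1, 56, 56, 32)   # same data, different memory layout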
2.3) Parallel Conv operator merging is applied to the computational graph after data layout optimization. If the graph contains several Conv operators that share the same input, they are merged into one larger Conv operator; this generates a larger kernel for the GPU, reduces the GPU kernel launch overhead, and improves GPU utilization. For example, the three 1x1 CBR operators in Fig. 4 accept the same input and perform the same kind of computation, so they can be merged into one larger 1x1 CBR operator, as shown in Fig. 5.
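A minimal sketch of such a merge, with assumed channel counts: the filter banks of parallel 1x1 convolutions that share one input are concatenated along the output-channel axis, so a single larger convolution (one kernel launch) replaces three small ones, and each consumer later slices out its own channels.

    import numpy as np

    c_in = 32
    w1 = np.random.rand(16, c_in, 1, 1)               # three parallel 1x1 conv filter banks
    w2 = np.random.rand(24, c_in, 1, 1)
    w3 = np.random.rand(8,  c_in, 1, 1)
    w_merged = np.concatenate([w1, w2, w3], axis=0)   # one conv with 16+24+8 output channels
    assert w_merged.shape == (48, c_in, 1, 1)
    # after the merged conv produces y, consumers slice their channels back out:
    # y1 = y[:, 0:16], y2 = y[:, 16:40], y3 = y[:, 40:48]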
3) The optimization-space similarity of the neural network operators is calculated.
3.1) The method implements a low-level intermediate language (LIR), a finer-grained intermediate form on which operator-level optimization and target-platform code generation can be performed.
3.2) The graph-optimized computational graph is mapped onto the LIR, represented in LIR, and an optimization space is defined for each operator. As shown in Fig. 6, the Conv operator in NCHW layout on the left of the figure has the optimization space shown on the right. The Conv operator admits tiling in 6 dimensions and two loop-unrolling optimizations. Take the first optimization operation, "tile_f", as an example: it tiles the operator's "f" dimension, of original length 32, into 4 dimensions, giving 56 tiling schemes in total. By analogy, the number of candidate schemes for each optimization operation is the value in the rightmost column, and the optimization space of the whole Conv operator is the product of all these scheme counts, about 130 million.
The tiling operation divides one dimension of the original space into m dimensions, for example tiling the dimension x of the original space into [x1, x2, x3, x4]. In the example above, the dimension f of length 32 is selected and tiled into 4 dimensions. Once it is fixed that the dimension is tiled into 4, the length 32 must be split into 4 nested loops; the possible splits, up to ordering, are (1, 1, 1, 32), (1, 1, 2, 16), (1, 1, 4, 8), (1, 2, 2, 8), (1, 2, 4, 4), and (2, 2, 2, 4), which contribute 4, 12, 12, 12, 12, and 4 ordered tiling schemes respectively, 56 in total.
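The count of 56 can be checked directly: a tiling scheme is an ordered choice of 4 factors whose product is 32, and the short Python sketch below enumerates them.

    from itertools import product
    from math import prod

    def tilings(length, parts):
        # all ordered tuples of `parts` divisors whose product is `length`
        divisors = [d for d in range(1, length + 1) if length % d == 0]
        return [t for t in product(divisors, repeat=parts) if prod(t) == length]

    print(len(tilings(32, 4)))   # prints 56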
3.3) The operator optimization-space similarity is computed. First, a hash method is defined for each optimization operation according to the operator's optimization-space attributes, abstracting the optimization operation into hash values. Each pair of operators is then vectorized: the hash values are appended in sequence as vector values following the order of the optimization operations, and the two operators' vectors are made equal in length by zero padding. Fig. 7 gives an example of computing the optimization-space vector values for a pair of operators. For two spaces, space 1 and space 2, with the same sequence of optimization operations, the corresponding optimization operations tile_rc of space 1 and tile_rc of space 2 are selected in turn, and their hash values are used for vectorization. After vectorization, the vector of space 2's tile_rc is shorter than that of space 1's tile_rc, so zeros are appended to the end of space 2's vector until the lengths match; the resulting vector values are vec1 and vec2. By analogy, the vector values of every optimization operation in the space are concatenated in turn to form the vector value of the whole space. Finally, the cosine of the pair of vectors is taken as the similarity of the pair of operators. Measures such as Jaccard similarity were also tried for the search-space similarity, but their classification results were inferior to the cosine value, so the cosine value was finally adopted as the similarity measure.
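The following Python sketch mirrors this procedure under stated assumptions: op_hash, the example operation sequences, and the scaling of hash values are illustrative stand-ins, not the patent's concrete hash method. Python's built-in hash is only consistent within one process, which suffices for comparing operators inside a single compilation run.

    import numpy as np

    def op_hash(option):
        # abstract one optimization option into a small numeric value (assumed hash)
        return hash(option) % 1000 / 1000.0

    def space_vector(op_sequences, max_lens):
        # hash each operation's options, zero-pad to the pairwise maximum length,
        # then concatenate all operations into one vector for the whole space
        vec = []
        for options, max_len in zip(op_sequences, max_lens):
            h = [op_hash(o) for o in options]
            vec.extend(h + [0.0] * (max_len - len(h)))
        return np.array(vec)

    def cosine(v1, v2):
        return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

    space1 = [["tile_f:1x2x16", "tile_f:1x4x8"], ["unroll:2", "unroll:4"]]
    space2 = [["tile_f:1x2x16"], ["unroll:2", "unroll:4"]]
    lens = [max(len(a), len(b)) for a, b in zip(space1, space2)]
    print(cosine(space_vector(space1, lens), space_vector(space2, lens)))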
4) Operator-level optimization is performed on the neural network, and target-platform code is generated. The execution flow is shown in Fig. 8.
4.1) The similarity matrix of the operators is calculated.
4.2) AP clustering (Affinity Propagation clustering) is executed with the similarity matrix as input. The AP clustering algorithm does not require the number of clusters to be fixed in advance, and the center of each resulting cluster is one of the input operators, so no core operator needs to be computed separately for each cluster.
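For illustration, scikit-learn's Affinity Propagation accepts exactly such a precomputed similarity matrix; the 4x4 matrix below is a made-up example standing in for the operator cosine similarities, and the exemplars it returns play the role of the core operators.

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    S = np.array([[1.00, 0.95, 0.10, 0.15],    # assumed operator similarity matrix
                  [0.95, 1.00, 0.12, 0.11],
                  [0.10, 0.12, 1.00, 0.90],
                  [0.15, 0.11, 0.90, 1.00]])
    ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
    print(ap.labels_)                   # cluster id of each operator
    print(ap.cluster_centers_indices_)  # the exemplar (core) operators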
4.3) For the core operator of each cluster, the operator's complete optimization space is explored, and the n optimization schemes with the shortest inference time found during the exploration are saved. For the Yolo v3-tiny model, for example, the clustering algorithm divides the original 20 operators into 8 classes; the complete optimization-space exploration then only needs to be run for the 8 cluster core operators, saving for each one the 10 schemes with the shortest inference time found during the exploration.
4.4) Each non-core operator of a cluster only needs to explore the n best schemes of the core operator it depends on. For the Yolo v3-tiny model, for example, each of the 12 operators that are not cluster centers only needs to search among the 10 best optimization schemes of the core operator of its class.
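A compact sketch of steps 4.3 and 4.4, under assumed helper names: full_space(op) enumerates an operator's optimization schemes and measure(op, scheme) returns its measured inference time on the hardware; neither helper is part of the patent text.

    import heapq

    def explore_core(op, n=10):
        # full-space search for a cluster core: keep the n fastest schemes
        return heapq.nsmallest(n, full_space(op), key=lambda s: measure(op, s))

    def explore_member(op, core_top_n):
        # a non-core operator only re-measures its core's n candidate schemes
        return min(core_top_n, key=lambda s: measure(op, s))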
4.5) Target-platform code is generated for each operator according to its optimization scheme, and the operator code is deployed to the hardware to run the neural network in the order given by the computational graph. For code to be deployed on a CPU, the third-party tool LLVM is invoked to generate the corresponding CPU code; for an Nvidia GPU, the corresponding CUDA code is generated and then deployed to the GPU to run.
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not remove the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the invention.

Claims (10)

1. An acceleration method for exploring the optimization space in a deep learning compiler, comprising the following steps:
S1, abstracting the neural network and representing it in the form of a computational graph;
S2, performing graph-level optimization on the computational graph;
S3, defining an optimization space for each operator in the optimized computational graph, and computing optimization-space similarity based on the operators' optimization-space information;
S4, exploring the operator state space based on similarity: clustering the operators by similarity, exploring the full space of the core operator in each cluster, exploring the remaining operators of each cluster within the best schemes of its core operator, determining the best scheme for every operator of the whole neural network, and generating target-platform code for the hardware platform, comprising the following steps:
S41, calculating the similarity matrix of the operators;
S42, executing AP clustering with the similarity matrix as input, wherein the center of each resulting cluster is one of the input operators;
S43, for the core operator of each cluster, exploring the operator's complete optimization space and saving the n optimization schemes with the shortest inference time found during the exploration;
S44, for each non-core operator of a cluster, exploring only the n best schemes saved for the core operator it depends on;
and S45, generating target-platform code for each operator according to its optimization scheme, and deploying the operator code to hardware to run the neural network in the order given by the computational graph.
2. The acceleration method for exploring the optimization space in a deep learning compiler according to claim 1, wherein the neural network computational graph representation of step S1 comprises the following steps:
S11, mapping the neural network constructed in a deep learning framework onto a well-defined high-level intermediate language (HIR);
and S12, analyzing the attributes of each operator based on the high-level intermediate language and constructing a computational graph from the data dependencies among the operators, wherein the constructed graph is a directed acyclic graph in which each node represents one operator of the neural network and each edge represents a data dependency between operators.
3. The acceleration method for exploring the optimization space in a deep learning compiler according to claim 1, wherein the graph optimization of step S2 comprises the following steps:
S21, performing operator fusion according to the computation type of each operator, wherein operator fusion combines several basic operators into one composite operator without storing intermediate results;
S22, performing data layout optimization on the computational graph after operator fusion according to the hardware characteristics;
and S23, merging parallel operators in the computational graph after data layout optimization.
4. The acceleration method for exploring the optimization space in a deep learning compiler according to claim 3, wherein step S21 specifically comprises: first constructing a dominator tree, then traversing the nodes of the dominator tree, and, if the nodes on the path from a node to its dominator satisfy a predefined fusion rule, fusing those operators into a new composite operator.
5. The acceleration method for exploring the optimization space in a deep learning compiler according to claim 3, wherein step S22 specifically comprises: for the computational graph after operator fusion, judging whether a data layout scheme was specified at graph input; if so, applying the specified scheme directly; if not, selecting the best data layout scheme according to the hardware characteristics, the layout schemes including row-major and column-major storage.
6. The acceleration method for exploring the optimization space in a deep learning compiler according to claim 3, wherein step S23 specifically comprises: merging several operators that share the same input into one larger operator.
7. The method of claim 1, wherein in step S3 the graph-optimized computational graph is mapped onto a low-level intermediate language (LIR), represented in LIR, and an optimization space is defined for each operator.
8. The acceleration method for exploring the optimization space in a deep learning compiler according to claim 7, wherein the operator undergoes multi-dimensional tiling and several loop-unrolling optimizations: when a dimension of the operator of original length l is tiled into m dimensions, there are k tiling schemes in total; by analogy, each optimization operation has some number k of candidate schemes, and the optimization space of the whole operator is the product of the scheme counts of all its optimization operations.
9. The acceleration method for exploring the optimization space in a deep learning compiler according to claim 7, wherein in step S3 the optimization-space similarity is calculated by the following steps:
S31, defining a hash method for each optimization operation according to the operator's optimization-space attributes, abstracting the optimization operation into hash values;
S32, vectorizing each pair of operators: appending the hash values in sequence as vector values following the order of the optimization operations, making the two operators' vectors equal in length by zero padding, and concatenating the vector values of every optimization operation in the space in turn to form the vector value of the whole space;
and S33, computing the similarity of the pair of operators from the concatenated vector values.
10. The method of claim 9, wherein the similarity calculation takes the cosine of the pair of concatenated vectors as the similarity of the pair of operators.
CN202110223874.3A 2021-03-01 2021-03-01 Acceleration method for exploring optimization space in deep learning compiler Active CN112579063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110223874.3A CN112579063B (en) 2021-03-01 2021-03-01 Acceleration method for exploring optimization space in deep learning compiler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110223874.3A CN112579063B (en) 2021-03-01 2021-03-01 Acceleration method for exploring optimization space in deep learning compiler

Publications (2)

Publication Number Publication Date
CN112579063A (en) 2021-03-30
CN112579063B (en) 2021-06-08

Family

ID=75114093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110223874.3A Active CN112579063B (en) 2021-03-01 2021-03-01 Acceleration method for exploring optimization space in deep learning compiler

Country Status (1)

Country Link
CN (1) CN112579063B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031966A (en) * 2021-05-20 2021-06-25 之江实验室 Deep learning compilation optimization method for intelligently selecting compilation acceleration library
CN113656563A (en) * 2021-07-15 2021-11-16 华为技术有限公司 Neural network searching method and related equipment
CN113703741A (en) * 2021-10-29 2021-11-26 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium
CN113961267A (en) * 2021-10-15 2022-01-21 杭州海康威视数字技术股份有限公司 Service processing method, device and equipment
CN114115834A (en) * 2022-01-25 2022-03-01 之江实验室 Software and hardware co-compiling processing method and system
CN114428616A (en) * 2022-04-01 2022-05-03 北京清微智能信息技术有限公司 Method for optimizing replacement cost in neural network compiling stage
WO2023284770A1 (en) * 2021-07-13 2023-01-19 清华大学 Tensor program optimization method and apparatus
CN115659281A (en) * 2022-11-16 2023-01-31 之江实验室 Method and device for fusing self-adaptive acceleration operators
CN116301904A (en) * 2023-05-18 2023-06-23 之江实验室 Operator optimization acceleration method and device for deep learning compiler
WO2024021192A1 (en) * 2022-07-25 2024-02-01 之江实验室 Graph optimization method and apparatus for neural network calculation
WO2024051377A1 (en) * 2022-09-07 2024-03-14 华为云计算技术有限公司 Model optimization method and apparatus and computing device
WO2024082551A1 (en) * 2022-10-17 2024-04-25 上海壁仞科技股份有限公司 Operator fusion method, computing apparatus, computing device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170293474A1 (en) * 2015-03-26 2017-10-12 IfWizard Corporation Automatically optimizing analytics database server
CN110490309A (en) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 A kind of Operator Fusion method and its Related product for neural network
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170293474A1 (en) * 2015-03-26 2017-10-12 IfWizard Corporation Automatically optimizing analytics database server
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
CN110490309A (en) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 A kind of Operator Fusion method and its Related product for neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴林阳 (Wu Linyang): "一种运算和数据协同优化的深度学习编译框架" [A deep learning compilation framework with coordinated optimization of computation and data], 《高技术通讯》 (High Technology Letters) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031966A (en) * 2021-05-20 2021-06-25 之江实验室 Deep learning compilation optimization method for intelligently selecting compilation acceleration library
WO2023284770A1 (en) * 2021-07-13 2023-01-19 清华大学 Tensor program optimization method and apparatus
CN113656563A (en) * 2021-07-15 2021-11-16 华为技术有限公司 Neural network searching method and related equipment
CN113961267B (en) * 2021-10-15 2023-08-25 杭州海康威视数字技术股份有限公司 Service processing method, device and equipment
CN113961267A (en) * 2021-10-15 2022-01-21 杭州海康威视数字技术股份有限公司 Service processing method, device and equipment
CN113703741A (en) * 2021-10-29 2021-11-26 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium
CN113703741B (en) * 2021-10-29 2022-02-22 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium
CN114115834A (en) * 2022-01-25 2022-03-01 之江实验室 Software and hardware co-compiling processing method and system
CN114115834B (en) * 2022-01-25 2022-04-26 之江实验室 Software and hardware co-compiling processing method and system
US11977865B2 (en) 2022-01-25 2024-05-07 Zhejiang Lab Software and hardware collaborative compilation processing system and method
CN114428616A (en) * 2022-04-01 2022-05-03 北京清微智能信息技术有限公司 Method for optimizing replacement cost in neural network compiling stage
WO2024021192A1 (en) * 2022-07-25 2024-02-01 之江实验室 Graph optimization method and apparatus for neural network calculation
WO2024051377A1 (en) * 2022-09-07 2024-03-14 华为云计算技术有限公司 Model optimization method and apparatus and computing device
WO2024082551A1 (en) * 2022-10-17 2024-04-25 上海壁仞科技股份有限公司 Operator fusion method, computing apparatus, computing device and readable storage medium
CN115659281B (en) * 2022-11-16 2023-10-27 之江实验室 Method and device for fusing adaptive acceleration operators
CN115659281A (en) * 2022-11-16 2023-01-31 之江实验室 Method and device for fusing self-adaptive acceleration operators
CN116301904B (en) * 2023-05-18 2023-08-22 之江实验室 Operator optimization acceleration method and device for deep learning compiler
CN116301904A (en) * 2023-05-18 2023-06-23 之江实验室 Operator optimization acceleration method and device for deep learning compiler

Also Published As

Publication number Publication date
CN112579063B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN112579063B (en) Acceleration method for exploring optimization space in deep learning compiler
US11803404B2 (en) Deep learning algorithm compiling method, device, and related product
CN112465108B (en) Neural network compiling method for storage and calculation integrated platform
CN110764744B (en) Intermediate representation generation method and device for neural network calculation
US10901715B1 (en) Lazy compilation and kernel fusion in dynamic computation graphs
Bik et al. Compiler support for sparse tensor computations in MLIR
JP3299611B2 (en) Resource allocation device
CN113031966B (en) Deep learning compilation optimization method for intelligently selecting compilation acceleration library
Le et al. Tflms: Large model support in tensorflow by graph rewriting
US20210049231A1 (en) Multiple Output Fusion For Operations Performed In A Multi-Dimensional Array of Processing Units
WO2021000971A1 (en) Method and device for generating operation data and related product
CN111104120A (en) Neural network compiling method and system and corresponding heterogeneous computing platform
WO2022087788A1 (en) Neural network compiling optimization method and related apparatus
CN116401502B (en) Method and device for optimizing Winograd convolution based on NUMA system characteristics
CN114691148A (en) Model reasoning acceleration method and device, electronic equipment and storage medium
CN116868202A (en) Data processing method, device, equipment and medium
CN115423082A (en) Automatic optimization method for depth model calculation graph related to hardware characteristics
Makrynioti et al. Declarative data analytics: A survey
CN115809063A (en) Storage process compiling method, system, electronic equipment and storage medium
Liu et al. swTVM: exploring the automated compilation for deep learning on sunway architecture
Wen et al. A swap dominated tensor re-generation strategy for training deep learning models
WO2023160290A1 (en) Neural network inference acceleration method, target detection method, device, and storage medium
CN111667060B (en) Deep learning algorithm compiling method and device and related products
CN112800425B (en) Code analysis method and device based on graph calculation
CN115860061A (en) Graph neural network optimization method and graph neural network inference system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant