WO2024082551A1 - Operator fusion method, computing apparatus, computing device and readable storage medium - Google Patents
- Publication number
- WO2024082551A1 (application PCT/CN2023/083784)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- operators
- template
- fusion
- operator
- neural network
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/25—Fusion techniques
Definitions
- Embodiments of the present disclosure relate to an operator fusion method, a computing device, a computing apparatus, and a readable storage medium applied to a neural network including multiple operators.
- In order to improve the computational efficiency of neural networks, multiple operators that meet certain conditions or rules in the neural network are usually fused before the computation process to form a fused operator.
- An operator obtained by fusion, or a single operator that cannot be fused, can be called a fused operator or a fusion layer.
- The computation process of the neural network is performed layer by layer in units of fusion layers, so operator fusion is an important process in neural network graph optimization.
- The operator fusion process involves two steps: first, defining the fusion mode, that is, defining the target to be fused; second, determining the fusion algorithm, which matches the occurrences of the fusion mode in the neural network and fuses them without the matches interfering with each other.
- the operator fusion scheme in the related art adopts a fixed fusion mode, and the fusion mode and the fusion algorithm are one-to-one corresponding, that is, one fusion mode corresponds to one fusion algorithm. This means that if the fusion mode changes, the fusion algorithm also needs to be changed to support it, so the expansion and customization of the fusion mode is limited.
- Some embodiments of the present disclosure provide an operator fusion method, a computing apparatus, a computing device, and a readable storage medium for providing an efficient and scalable operator fusion solution for various types of neural network structures.
- An operator fusion method is provided, which is applied to a neural network including multiple operators.
- The operator fusion method comprises: setting a template fusion mode of at least one linear structure based on the category label of the operator; and, according to the respective category labels of the multiple operators in the neural network, performing operator fusion on the multiple operators according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network that conform to that template fusion mode.
- the operator fusion method also includes: classifying operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assigning category labels to operators of different categories.
- category labels include non-template categories and template categories, wherein the template categories include one or more of the following: matrix categories, normalization categories, pooling categories, data rearrangement categories, data reduction categories, regression function categories, and loss function categories.
- operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
- the template fusion mode of at least one linear structure is composed of one or more template categories that are linearly connected, wherein setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
- the operator fusion method also includes: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between operators, wherein, fusing multiple operators in the neural network according to at least one linear structure template fusion mode includes: according to the respective category labels of the multiple operators in the neural network, according to one or more template categories linearly connected in the template fusion mode of at least one linear structure, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding linear structure template fusion mode, respectively, and fusing the multiple operators in the neural network.
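- The traversal and matching described above can be illustrated with a minimal Python sketch. It is not the disclosed algorithm: the graph encoding (`succs`), the `NonTemplate` label, the function names and the rule that a chain link must have exactly one producer and one consumer are all assumptions made for illustration. Non-template operators met along a chain are carried into the fused group without being matched against the template.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

NON_TEMPLATE = "NonTemplate"  # hypothetical label for non-template operators

def match_chain(succs, preds, label, start, template, fused):
    """Follow the linear chain from `start`; return the matched nodes or None."""
    chain, node, idx = [start], start, 1
    while idx < len(template):
        nxt = succs[node]
        # Assumed linearity rule: exactly one consumer and one producer.
        if len(nxt) != 1 or len(preds[nxt[0]]) != 1 or nxt[0] in fused:
            return None
        node = nxt[0]
        if label[node] == NON_TEMPLATE:
            chain.append(node)          # carried along, not matched
        elif label[node] == template[idx]:
            chain.append(node)          # next template category matched
            idx += 1
        else:
            return None
    return chain

def fuse_linear(succs, label, templates):
    """Return the groups of operators fused under the given templates."""
    preds = {n: [] for n in succs}
    for n, outs in succs.items():
        for s in outs:
            preds[s].append(n)
    # graphlib expects a mapping from each node to its predecessors.
    order = list(TopologicalSorter(preds).static_order())
    fused, groups = set(), []
    for template in templates:
        for start in order:
            if start in fused or label[start] != template[0]:
                continue
            chain = match_chain(succs, preds, label, start, template, fused)
            if chain:
                groups.append(chain)
                fused.update(chain)
    return groups

# Hypothetical network: conv -> relu -> bn -> pool, and matmul -> ln.
succs = {"conv": ["relu"], "relu": ["bn"], "bn": ["pool"], "pool": [],
         "matmul": ["ln"], "ln": []}
label = {"conv": "Matrix", "relu": NON_TEMPLATE, "bn": "Normalize",
         "pool": "Pooling", "matmul": "Matrix", "ln": "Normalize"}
groups = fuse_linear(succs, label, [("Matrix", "Normalize")])
```

- In this sketch the single template Matrix-Normalize yields two fused groups, and the unmatched `pool` operator is left as its own fusion layer.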
- setting a template fusion mode of at least one linear structure based on the category label of an operator includes: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator, wherein, when the first template fusion mode and the second template fusion mode have the same part of template categories and the first template fusion mode also includes other template categories in addition to the same part of template categories, firstly perform operator fusion on multiple operators in the neural network according to the first template fusion mode, and then perform operator fusion on multiple operators in the neural network according to the second template fusion mode.
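- The ordering rule above can be sketched in Python as follows. The sketch assumes that each template fusion mode is a tuple of category names and that the shared part is a contiguous run; under that assumption a template that contains another is always longer, so sorting by length (longest first) applies the containing template before the contained one. The function names are hypothetical.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n, m = len(longer), len(shorter)
    return any(tuple(longer[i:i + m]) == tuple(shorter)
               for i in range(n - m + 1))

def fusion_order(templates):
    """Order templates so a template is applied before any template it contains."""
    # sorted() is stable, so templates of equal length keep their given order.
    return sorted(templates, key=len, reverse=True)

# Hypothetical template fusion modes.
templates = [
    ("Matrix", "Normalize"),
    ("Matrix", "Normalize", "Pooling"),
    ("Reduce",),
]
ordered = fusion_order(templates)
```

- Here ("Matrix", "Normalize", "Pooling") plays the role of the first template fusion mode and ("Matrix", "Normalize") the second, so the three-category template is applied first.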
- Setting the template fusion mode of at least one linear structure based on the category label of the operator includes: using a configurable file to set the template fusion mode of at least one linear structure.
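- As an illustration of setting template fusion modes through a configurable file, the following Python sketch parses a JSON document. The JSON format, the key names (`template_fusion_modes`, `name`, `categories`) and the validation step are assumptions; the disclosure does not specify the file format.

```python
import json

# Hypothetical configuration; in practice this text would be read from a file.
CONFIG_TEXT = """
{
  "template_fusion_modes": [
    {"name": "matrix_norm_pool", "categories": ["Matrix", "Normalize", "Pooling"]},
    {"name": "matrix_norm", "categories": ["Matrix", "Normalize"]}
  ]
}
"""

# Template categories named in the description.
KNOWN_TEMPLATE_CATEGORIES = {
    "Matrix", "Normalize", "Pooling", "Reorder", "Reduce", "Softmax", "Loss",
}

def load_templates(text):
    """Parse the configuration and return {name: tuple_of_categories}."""
    config = json.loads(text)
    templates = {}
    for entry in config["template_fusion_modes"]:
        cats = tuple(entry["categories"])
        unknown = set(cats) - KNOWN_TEMPLATE_CATEGORIES
        if unknown:
            raise ValueError(f"unknown template categories: {sorted(unknown)}")
        templates[entry["name"]] = cats
    return templates

templates = load_templates(CONFIG_TEXT)
```

- With such a file, adding or changing a template fusion mode only requires editing the configuration, while the matching code stays unchanged.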
- the operator fusion method also includes: setting a fusion mode of a subgraph structure based on operators; and generating a directed acyclic graph based on the network structure of a neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between operators.
- Before performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure, the method also includes: performing operator fusion on operators in a directed acyclic graph of the neural network according to a fusion mode of a subgraph structure to obtain a graph expression after operator fusion of the subgraph structure.
- performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure includes: traversing the operators in a graph expression after fusion of sub-graph structure operators according to the respective category labels of the multiple operators in the neural network and according to the template fusion mode of at least one linear structure, respectively matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
- operators belonging to the non-template category include: activation function, linear rectification function, absolute value function, addition function; operators belonging to the matrix category include: forward convolution operator, reverse data convolution operator, reverse filter convolution operator, matrix multiplication operator; operators belonging to the normalization category include: batch normalization operator, layer normalization operator; operators belonging to the pooling category include: maximum pooling layer operator, average pooling layer operator, global average pooling layer operator; operators belonging to the data rearrangement category include: splicing operator, transformation operator; operators belonging to the data reduction category include: maximum function, minimum function, average function; operators belonging to the regression function category include: regression function for sample points, regression function for channels; and operators belonging to the loss function category include: mean square error function, cross entropy function.
- a computing device for performing operator fusion on a neural network, wherein the neural network includes multiple operators.
- the computing device includes: a fusion mode configuration unit configured to: set at least one linear structure template fusion mode based on the category label of the operator; and a fusion unit configured to perform operator fusion on the multiple operators in the neural network according to the template fusion mode of at least one linear structure according to the category labels of the multiple operators in the neural network, so as to fuse one or more operators in the neural network that conform to the template fusion mode of at least one linear structure.
- the computing device further includes a classification unit configured to: classify operators according to the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assign category labels to operators of different categories, wherein the category labels include non-template categories and template categories, and the template categories include one or more of the following: matrix categories, normalization categories, pooling categories, data rearrangement categories, data reduction categories, regression function categories, and loss function categories.
- operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
- the template fusion mode of at least one linear structure is composed of one or more template categories that are linearly connected, and setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
- the computing device also includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between the operators.
- a fusion unit performs operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure, including: traversing the operators in a directed acyclic graph of the neural network according to the template fusion mode of at least one linear structure according to the respective category labels of the multiple operators in the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the multiple operators in the neural network.
- a fusion mode configuration unit sets a template fusion mode of at least one linear structure based on the category label of the operator, including: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator, wherein, when the first template fusion mode and the second template fusion mode have the same part of template categories and the first template fusion mode also includes other template categories in addition to the same part of template categories, the fusion unit first performs operator fusion on multiple operators in the neural network according to the first template fusion mode, and then performs operator fusion on multiple operators in the neural network according to the second template fusion mode.
- the fusion mode configuration unit sets the template fusion mode of at least one linear structure based on the category label of the operator, including: setting the template fusion mode of at least one linear structure using a configurable file.
- the fusion mode configuration unit is further configured to: set the fusion mode of the subgraph structure in units of operators.
- the computing device further includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and the connections between operators, and the connections between operators represent the data dependencies and data flows between operators.
- Before performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure, the fusion unit is further configured to: perform operator fusion on operators in a directed acyclic graph of the neural network according to a fusion mode of a subgraph structure to obtain a graph expression after fusion of operators of the subgraph structure, wherein the fusion unit performs operator fusion on multiple operators in the neural network according to a template fusion mode of at least one linear structure, including: traversing the operators in the graph expression after fusion of operators of the subgraph structure according to the template fusion mode of at least one linear structure and the category labels of each of the multiple operators in the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
- the operators belonging to the non-template category include: activation function, linear rectification function, absolute value function, addition function;
- the operators belonging to the matrix category include: forward convolution operator, reverse data convolution operator, reverse filter convolution operator, matrix multiplication operator;
- the operators belonging to the normalization category include: batch normalization operator, layer normalization operator;
- the operators belonging to the pooling category include: maximum pooling layer operator, average pooling layer operator, global average pooling layer operator;
- the operators belonging to the data rearrangement category include: splicing operator, transformation operator;
- the operators belonging to the data reduction category include: maximum function, minimum function, average function;
- the operators belonging to the regression function category include: regression function for sample points, regression function for channels; and the operators belonging to the loss function category include: mean square error function, cross entropy function.
- a computing device including: a processor; and a memory, wherein the memory stores a computer-readable code, and when the computer-readable code is executed by the processor, the operator fusion method as described above is executed.
- a non-transitory computer-readable storage medium on which instructions are stored.
- When the instructions are executed by a processor, the operator fusion method described above is implemented.
- With the operator fusion method, computing apparatus, computing device and storage medium provided by some embodiments of the present disclosure, it is possible to set a template fusion mode of a linear structure based on the category label of the operator, and provide an operator fusion algorithm that is generally applicable to the template fusion mode of the linear structure, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process for the neural network, which is beneficial to the graph optimization of the neural network and improves the computing efficiency of the neural network on the hardware platform.
- FIG1A shows a schematic diagram of a neural network calculation process
- FIG1B shows a schematic diagram of neural network operator fusion in the related art
- FIG2 shows a schematic flow chart of an operator fusion method according to some embodiments of the present disclosure
- FIG3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure
- FIG4 shows a schematic diagram of a template fusion mode according to some embodiments of the present disclosure
- FIG5A shows a schematic diagram of a directed acyclic graph of a neural network
- FIG5B shows a schematic diagram of operator fusion performed on the directed acyclic graph shown in FIG5A
- FIG5C shows the directed acyclic graph after fusion
- FIG6 shows a schematic diagram of fusion priority of a template fusion mode according to some embodiments of the present disclosure
- FIG7 shows a schematic block diagram of an operator fusion solution according to some embodiments of the present disclosure
- FIGS. 8A-8B show a schematic diagram of an operator fusion solution process according to some embodiments of the present disclosure
- FIG9 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure
- FIG10 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure
- FIG11 shows a schematic diagram of the architecture of an exemplary computing device according to some embodiments of the present disclosure
- FIG12 shows a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure
- An artificial neural network (ANN), referred to below as a neural network, is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. This network relies on the complexity of the system and adjusts the interconnections among a large number of internal nodes to process information.
- Regardless of their type, neural networks share the characteristics of large-scale parallel processing, distributed storage, elastic topology, high redundancy and nonlinear operations, and have capabilities in terms of computing speed, associative ability, adaptability, fault tolerance and self-organization. These characteristics and capabilities constitute the technical basis for neural networks to simulate intelligent activities and have been used in various technical fields. For example, neural networks can be used in data compression, image processing, video coding, signal processing and other application fields.
- FIG1A shows a schematic diagram of the neural network computational process.
- A computational unit used for a neural network is generally subdivided into a matrix multiplication unit, a vector operation unit, and a scalar operation unit, which perform different computational tasks.
- Data is synchronized through a shared memory to ensure the execution order between fusion layers with data dependencies.
- The shared memory can be based on an internal memory or an external memory, which is not limited here.
- the efficient and flexible implementation of the operator fusion process performed before the calculation process shown in FIG. 1A above has an important impact on the computational efficiency of the neural network.
- it is necessary to quickly adjust and define the fusion mode and adapt the corresponding fusion algorithm to realize the operator fusion in the neural network.
- FIG1B shows a schematic diagram of neural network operator fusion in the related art, in which four fixed fusion modes (fusion mode 1 to fusion mode 4) are schematically shown. Based on the four fusion modes defined in FIG1B, four corresponding fusion algorithms (fusion algorithm 1 to fusion algorithm 4) must be adapted to find the operators in the network structure of the neural network that match fusion mode 1 to fusion mode 4 and fuse them, yielding a fused network structure diagram on which the subsequent calculation process is performed.
- Taking fusion mode 1 as an example, a linear fusion target (including operator 1, operator 2 and operator 3) is defined.
- In the process of performing operator fusion based on fusion mode 1, it is necessary to design a fusion algorithm 1 that finds the operator 1-operator 2-operator 3 structure of fusion mode 1 among the many operators included in the neural network and fuses the matched operators.
- a subgraph matching method is usually used to implement the fusion algorithm, that is, a subgraph corresponding to the fusion mode is designed, and the operator structure diagram of the neural network is traversed to find the corresponding subgraph structure. It can be understood that the fusion modes shown in FIG1B are all performed in units of operators.
- the defined fusion mode is static relative to the fusion algorithm, and the fusion algorithm and the fusion mode itself are directly related.
- One fusion mode corresponds to one fusion algorithm. If the fusion mode changes, the fusion algorithm must also be changed to support it. The expansion and customization of the fusion mode are therefore limited, and the scheme is not universally applicable.
- An efficient fusion mode also needs to be determined by considering the architectural characteristics of the hardware platform running the neural network. Once the hardware platform is switched, the fusion mode changes accordingly, and the corresponding fusion algorithm must then be re-formulated and compiled again.
- The present invention is applied to the field of efficient computing for neural network inference or training. It addresses the operator fusion problem in the graph optimization process of neural networks, takes the universality and scalability of the fusion mode as its goal, and provides an operator fusion strategy with a variable fusion mode and a fixed fusion algorithm.
- Some embodiments of the present disclosure provide an operator fusion method for performing operator fusion on various types of neural network structures, and design a template fusion mode of a linear structure, set based on the category labels of the operators, together with the corresponding operator fusion algorithm, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process of the neural network, which is beneficial to the graph optimization of the neural network and improves the computing efficiency of the neural network on the hardware platform.
- FIG2 shows a flow chart of an operator fusion method according to some embodiments of the present disclosure.
- the operator fusion method includes steps S101 and S102.
- In step S101, a template fusion mode of at least one linear structure is set based on the category label of the operator.
- That is, a template-type fusion mode is provided, and it is defined in terms of the category labels of operators rather than specific operators.
- It therefore has higher adaptability: any operator that conforms to the category can be fused, which can significantly reduce the number of fusion modes that need to be defined.
- In step S102, according to the respective category labels of the multiple operators in the neural network, the multiple operators in the neural network are fused according to the template fusion mode of at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the above-mentioned at least one linear structure.
- the template fusion mode in step S101 will be described first, and then the corresponding fusion algorithm, that is, the implementation process of step S102 will be described.
- the operator fusion method also includes: classifying operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assigning category labels to operators of different categories.
- FIG3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure. It can be understood that FIG3 only gives some examples of operator classification, and the operators of the neural network may also include operators not shown in FIG3 and other categories, which are not limited here. In other disclosed embodiments, other classification methods may be defined for operators in a neural network.
- the above category labels may include non-template categories and template categories, wherein the operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
- the operators of this non-template category are set separately so as not to include operators of this category in template matching.
- the operators belonging to the non-template category may be, for example, Element Wise operators, which may include: activation functions (Sigmoid, Swish), linear rectification functions (Relu), absolute value functions (Abs), addition functions (Add), etc., which are not listed one by one here.
- the above operators belonging to the non-template category are ubiquitous in the structure of the neural network.
- the operators belonging to the non-template category may also be other operators besides Element Wise operators, which are not given as examples here.
- the template fusion mode of the above linear structure is composed of one or more template categories connected linearly, that is, the template fusion mode according to the embodiment of the present disclosure only includes operators of the template category and does not include the above operators belonging to the non-template category. Therefore, in the template fusion mode designed according to the embodiment of the present disclosure, the influence of such operators that do not occupy additional computing resources on the fusion mode is eliminated, which is conducive to further reducing the complexity of the template fusion mode and has more universal applicability.
- the template category may include one or more of the following: matrix category (Matrix), normalization category (Normalize), pooling category (Pooling), data rearrangement category (Reorder), data reduction category (Reduce), regression function category (Softmax) and loss function category (Loss), etc., which are not listed one by one here.
- operators belonging to the matrix category may include: forward convolution operator (Forward conv), backward data convolution operator (Backward data conv), backward filter convolution operator (Backward filter conv), matrix multiplication operator (Matrix Multiplication, MatMul), etc.
- Operators belonging to the normalization category may include: batch normalization operator (Batch norm), layer normalization operator (Layer norm), etc.
- Operators belonging to the pooling category may include: maximum pooling layer operator (Max pooling), average pooling layer operator (Average pooling), global average pooling layer operator (Global average pooling), etc.
- Operators belonging to the data rearrangement category may include: concatenation operator (Concate), transformation operator (Permute), deformation operator (Reshape), cutting operator (Slice), etc.
- Operators belonging to the data reduction category (Reduce) may include: maximum function (Max), minimum function (Min), average function (Average) and sum function (Sum), etc.
- Operators belonging to the regression function category (Softmax) may include: regression function for sample points (Softmax on sample), regression function for channel (Softmax on channel), etc.
- Operators belonging to the loss function category (Loss) may include: mean square error function (MSE), cross entropy function (Cross-entropy), etc.
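- As a minimal sketch of the classification listed above, the categories can be held in a lookup table that assigns a category label to each operator. The operator spellings, the `NonTemplate` label and the `category_of` helper are illustrative assumptions, not names taken from the disclosure.

```python
# Hypothetical lookup table; operator spellings are illustrative only.
TEMPLATE_CATEGORIES = {
    "Matrix": ["ForwardConv", "BackwardDataConv", "BackwardFilterConv", "MatMul"],
    "Normalize": ["BatchNorm", "LayerNorm"],
    "Pooling": ["MaxPooling", "AveragePooling", "GlobalAveragePooling"],
    "Reorder": ["Concat", "Permute", "Reshape", "Slice"],
    "Reduce": ["Max", "Min", "Average", "Sum"],
    "Softmax": ["SoftmaxOnSample", "SoftmaxOnChannel"],
    "Loss": ["MSE", "CrossEntropy"],
}

# Element-wise operators named in the disclosure as non-template examples.
NON_TEMPLATE_OPS = {"Sigmoid", "Swish", "Relu", "Abs", "Add"}
NON_TEMPLATE = "NonTemplate"  # assumed label name

# Invert the table so each operator name maps to its category label.
OP_TO_CATEGORY = {op: cat for cat, ops in TEMPLATE_CATEGORIES.items() for op in ops}


def category_of(op_name):
    """Return the category label of an operator."""
    if op_name in NON_TEMPLATE_OPS:
        return NON_TEMPLATE
    return OP_TO_CATEGORY[op_name]  # KeyError for operators in neither table
```

- For example, `category_of("MatMul")` yields `"Matrix"`, while `category_of("Relu")` yields the non-template label.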
- the operators of the neural network are first classified, and then the template fusion mode is defined in units of categories, rather than fusing the operators themselves, which is conducive to reducing the number of fusion modes required for the neural network (this will be reflected in the description below).
- in the template fusion mode defined above in units of categories, the influence on the fusion mode of non-template-category operators, which do not occupy additional computing resources, is also eliminated. That is, such operators are not included in the defined template fusion mode, which further reduces the complexity of the template fusion mode and makes it more generally applicable.
- FIG4 shows a schematic diagram of a template fusion mode according to some embodiments of the present disclosure.
- four groups of example operator graphs are schematically shown on the left.
- these four groups of example operator graphs can all be characterized by the template fusion mode shown on the right side of FIG4, that is, a matrix category (Matrix) operator connected to a normalization category (Normalize) operator. In other words, once the template fusion mode shown on the right side is defined, the four example operator graphs shown on the left side, when they appear in the neural network, all conform to the template fusion mode and are therefore fused.
- the template fusion mode can also cover forms of operator connection other than those shown on the left side of FIG4; that is, in the operator fusion process, any operators that satisfy the template fusion mode are fused to form a fused operator, that is, a fusion layer.
- This fully reflects the advantages of the template fusion mode provided by the embodiment of the present disclosure in terms of universal applicability, that is, it has the attributes of a template, thereby reducing the number of fusion modes required for the neural network.
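- by contrast, if fusion modes were defined in units of individual operators, four fusion modes and corresponding fusion algorithms would need to be defined, one for each of the four example operator graphs.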
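- The following sketch illustrates why one category-level template can stand in for several operator-level patterns. The four operator chains are hypothetical examples and are not necessarily the ones drawn in FIG4; `None` is used here as an assumed marker for the non-template category.

```python
# Assumed category labels; None marks the non-template category.
CATEGORY = {
    "ForwardConv": "Matrix",
    "MatMul": "Matrix",
    "BatchNorm": "Normalize",
    "LayerNorm": "Normalize",
    "Relu": None,
    "Add": None,
}


def template_signature(chain):
    """Reduce an operator chain to its sequence of template categories."""
    return tuple(CATEGORY[op] for op in chain if CATEGORY[op] is not None)


# Four hypothetical operator chains (not taken from FIG4).
EXAMPLES = [
    ["ForwardConv", "BatchNorm"],
    ["MatMul", "LayerNorm"],
    ["ForwardConv", "Relu", "BatchNorm"],
    ["MatMul", "Add", "LayerNorm"],
]

# All four chains collapse to the single template Matrix-Normalize.
signatures = {template_signature(chain) for chain in EXAMPLES}
```

- Under these assumptions `signatures` contains a single entry, so one template replaces four operator-level fusion modes.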
- the operator fusion method may further include: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between operators.
- a neural network can be viewed as a directed acyclic graph (DAG) consisting of many operators (also called computing nodes), and each node in the DAG corresponds to an operator in the neural network. It is understandable that a neural network can be constructed in any known manner.
- the network structure is abstracted as a directed acyclic graph DAG, which is not described in detail here.
- FIG. 5A shows a schematic diagram of a directed acyclic graph of a neural network. As shown in FIG. 5A , a directed acyclic graph formed by a neural network including 13 operators is shown. In addition, the operators are connected by lines, which can be used to characterize the data dependencies and data flows between the operators.
- the output data of operator 1 flows to operator 2, and the output data of operator 2 flows to operator 3 and operator 6, and so on. It can be determined that operator 1 has a data dependency relationship with operator 2, and operator 2 has a data dependency relationship with operator 6 and operator 3. It can be understood that the network structure shown in FIG. 5A is only schematic, and the operator fusion method according to the embodiment of the present disclosure can be applied to various types of neural network structures.
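- A minimal sketch of such a graph follows. It encodes only the three edges stated in the text (1 to 2, 2 to 3, 2 to 6); the remaining connections of FIG. 5A are not reproduced, and the `build_dag` helper is an illustrative assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Only the edges stated in the text; the rest of FIG. 5A is omitted.
EDGES = [(1, 2), (2, 3), (2, 6)]


def build_dag(edges):
    """Return (successors, predecessors) adjacency dictionaries."""
    successors, predecessors = {}, {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])
        predecessors.setdefault(dst, []).append(src)
        predecessors.setdefault(src, [])
    return successors, predecessors


successors, predecessors = build_dag(EDGES)

# A topological order visits every operator after the operators it depends on;
# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(predecessors).static_order())
```

- Traversing the operators in such an order is one convenient way to make sure the data dependencies are respected during matching.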
- an operator fusion process can be performed based on a directed acyclic graph of a neural network.
- the template fusion mode of the at least one linear structure is composed of one or more template categories that are linearly connected, wherein setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
- performing operator fusion on multiple operators in a neural network according to a template fusion mode of a linear structure in step S102 includes: traversing the operators in a directed acyclic graph of the neural network according to at least one template fusion mode of a linear structure according to respective category labels of the multiple operators in the neural network, matching one or more operators to corresponding template fusion modes of a linear structure, and performing operator fusion on multiple operators in the neural network.
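- The traversal and matching step can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of the disclosure: a chain is treated as linear only when each link has exactly one outgoing and one incoming edge, and non-template operators (label `None`) lying between matched operators are absorbed into the fusion layer without being compared against the template.

```python
def match_linear(template, start, succ, pred, label, used):
    """Try to match `template` starting at `start`; return the node list or None."""
    if start in used or label[start] != template[0]:
        return None
    group, node, idx = [start], start, 1
    while idx < len(template):
        nxt = succ.get(node, [])
        # A linear chain requires exactly one outgoing and one incoming edge.
        if len(nxt) != 1 or len(pred[nxt[0]]) != 1 or nxt[0] in used:
            return None
        node = nxt[0]
        group.append(node)
        if label[node] is None:
            continue  # non-template operator: ignored by the template
        if label[node] != template[idx]:
            return None
        idx += 1
    return group


def fuse_linear(template, order, succ, pred, label):
    """Traverse the operators in `order` and collect non-overlapping matches."""
    used, layers = set(), []
    for node in order:
        group = match_linear(template, node, succ, pred, label, used)
        if group:
            used.update(group)
            layers.append(group)
    return layers


# Hypothetical chain: Conv(1) -> Relu(2) -> BatchNorm(3) -> Softmax(4).
label = {1: "Matrix", 2: None, 3: "Normalize", 4: "Softmax"}
succ = {1: [2], 2: [3], 3: [4], 4: []}
pred = {1: [], 2: [1], 3: [2], 4: [3]}
layers = fuse_linear(("Matrix", "Normalize"), [1, 2, 3, 4], succ, pred, label)
```

- With the Matrix-Normalize template, operators 1 to 3 form one fusion layer and operator 4 is left unfused.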
- FIG. 5B shows a schematic diagram of performing operator fusion on the directed acyclic graph shown in FIG. 5A according to the template fusion mode shown in FIG. 4 .
- because the defined template fusion mode is expressed in units of operator categories, the operator fusion process must also be performed according to the category labels of the respective operators in the neural network.
- the information of operators 1-13 shown in FIG. 5A is as shown in the following Table 1:
- a template fusion mode of a linear structure and a corresponding operator fusion algorithm can be defined based on the category label of the operator, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process of a neural network.
- This operator fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion mode.
- setting the template fusion mode of a linear structure based on the category label of the operator includes: setting a first template fusion mode and a second template fusion mode of a linear structure based on the category label of the operator.
- not only one template fusion mode but also multiple template fusion modes need to be defined for the neural network, wherein, when the first template fusion mode and the second template fusion mode have the same part of the template category and the first template fusion mode includes other template categories in addition to the same part of the template category, the multiple operators in the neural network are first fused according to the first template fusion mode, and then the multiple operators in the neural network are fused according to the second template fusion mode.
- FIG6 shows a schematic diagram of the fusion priority of template fusion modes according to some embodiments of the present disclosure, in which template fusion mode 1, template fusion mode 2 and template fusion mode 3 are schematically shown.
- the condition that the first template fusion mode and the second template fusion mode have the same part of template categories, and that the first template fusion mode also includes other template categories in addition to the same part, can also be expressed as the first template fusion mode containing the second template fusion mode.
- FIG6 shows the inclusion relationship between template fusion modes; specifically, template fusion mode 1 includes template fusion mode 2 and template fusion mode 3.
- template fusion mode 1 and template fusion mode 2 have the same part of template categories (expressed as "Matrix-Normalize"), and template fusion mode 1 also includes another template category (expressed as "Softmax") in addition to the same part. Specifically, the same part corresponds to template fusion mode 2, and template fusion mode 2 does not include other template categories.
- template fusion mode 2 can likewise be represented as including template fusion mode 3. That is, template fusion mode 2 and template fusion mode 3 have the same part of template categories (expressed as "Matrix"), and template fusion mode 2 also includes another template category (expressed as "Normalize"). Specifically, the same part corresponds to template fusion mode 3, and template fusion mode 3 does not include other template categories.
- a fusion priority is set for the template fusion modes with an inclusion relationship. Specifically, template fusion mode 1 is matched and fused before template fusion mode 2 and template fusion mode 3, and template fusion mode 2 is fused before template fusion mode 3. In other words, the more template categories included, the higher the priority.
- Such a priority setting can make as many operators as possible merge into one fusion layer. Otherwise, without setting the above priority, when performing fusion matching based on a directed acyclic graph of a neural network, the operators that meet template fusion mode 3 may be directly fused, and the fusion matching that meets template fusion mode 1 cannot be guaranteed.
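- The effect of the priority can be sketched on a purely linear sequence of category labels. The helper names and the example sequence are assumptions for illustration; the three templates follow the Matrix-Normalize-Softmax, Matrix-Normalize and Matrix modes described for FIG6.

```python
TEMPLATES = {
    "mode 3": ("Matrix",),
    "mode 1": ("Matrix", "Normalize", "Softmax"),
    "mode 2": ("Matrix", "Normalize"),
}


def by_priority(templates):
    """Order templates so that those with more categories come first."""
    return sorted(templates.items(), key=lambda kv: len(kv[1]), reverse=True)


def fuse_sequence(labels, ordered_templates):
    """Apply each template in turn; return (name, start, end) for each fusion."""
    used = [False] * len(labels)
    fused = []
    for name, pattern in ordered_templates:
        n, i = len(pattern), 0
        while i + n <= len(labels):
            window_free = not any(used[i:i + n])
            if window_free and tuple(labels[i:i + n]) == pattern:
                fused.append((name, i, i + n))
                used[i:i + n] = [True] * n
                i += n
            else:
                i += 1
    return fused


# Hypothetical label sequence of six operators.
labels = ["Matrix", "Normalize", "Softmax", "Matrix", "Normalize", "Matrix"]

with_priority = fuse_sequence(labels, by_priority(TEMPLATES))
# Reversed order for comparison: the shortest template is applied first.
without_priority = fuse_sequence(
    labels, sorted(TEMPLATES.items(), key=lambda kv: len(kv[1]))
)
```

- With the priority, the first three operators end up in one fusion layer under mode 1; with the reversed order, every Matrix operator is fused alone under mode 3 and the longer templates never match.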
- setting a template fusion mode of a linear structure based on a category label of an operator may include: setting a template fusion mode of a linear structure using a configurable file.
- the format of the configurable file includes a json format or a yaml format. That is to say, in the operator fusion method according to an embodiment of the present disclosure, the fusion mode can be defined based on a configurable file, for example, a plurality of template fusion modes as shown in FIG6 are defined, and this process can be implemented dynamically and configurably to adapt to different neural network structures and adjustments to the hardware computing platform, thereby increasing the flexibility of the operator fusion process.
- the configurable file format is not limited to the two types listed above.
- the template fusion mode is configurable, for example, configured by means of a file, and a plurality of template fusion modes can form a set, which is input into a computing device by means of a file configuration, for use in the operator fusion process of the neural network.
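- A configuration file of this kind might look like the json text embedded in the sketch below. The schema (the `linear_templates`, `name` and `categories` keys) and the `load_templates` helper are hypothetical; the disclosure states only that a json or yaml file may be used.

```python
import json

# Hypothetical schema; the disclosure does not specify the file layout.
CONFIG_TEXT = """
{
  "linear_templates": [
    {"name": "mode_1", "categories": ["Matrix", "Normalize", "Softmax"]},
    {"name": "mode_2", "categories": ["Matrix", "Normalize"]},
    {"name": "mode_3", "categories": ["Matrix"]}
  ]
}
"""


def load_templates(text):
    """Parse the configuration text into a list of (name, categories) pairs."""
    config = json.loads(text)
    templates = []
    for entry in config["linear_templates"]:
        categories = tuple(entry["categories"])
        if not categories:
            raise ValueError(f"template {entry['name']!r} has no categories")
        templates.append((entry["name"], categories))
    return templates


templates = load_templates(CONFIG_TEXT)
```

- Because the templates are data rather than code, the set of fusion modes can be changed for a different network or hardware platform without modifying the fusion algorithm.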
- the network structure of the neural network may be more complex. Therefore, in addition to the template fusion mode of the linear structure, other, more complex types of fusion mode may also need to be defined, such as the fusion mode of the subgraph structure. The fusion mode of the subgraph structure requires operator fusion across multiple parallel branches to achieve higher computational efficiency.
- the operator fusion method provided according to some embodiments of the present disclosure may also include: setting a fusion mode of a subgraph structure in units of operators.
- the fusion mode of a subgraph structure is defined in units of operators, for example, the mode indicates that several operators with a specific connection relationship need to be fused.
- each fusion mode of the subgraph structure can be regarded as a subgraph and described using the same graph intermediate representation (Graph IR) as the neural network.
- the operator fusion process based on the fusion mode of the subgraph structure is the subgraph matching process, that is, matching the subgraph that conforms to the Graph IR form from the graph representation of the neural network (usually a DAG graph) and fusing the operators that meet the matching conditions.
- the operator fusion method may further include: performing operator fusion on operators in a directed acyclic graph of the neural network according to a fusion mode of a subgraph structure to obtain a graph expression after fusion of operators of the subgraph structure.
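- Subgraph matching in units of operators can be sketched with a brute-force search, which is adequate only for small graphs. The operator types, node identifiers and the diamond-shaped pattern are hypothetical, and the sketch checks only that every pattern edge exists in the graph; a stricter matcher could add further conditions.

```python
from itertools import permutations


def find_subgraph(pattern_ops, pattern_edges, graph_ops, graph_edges):
    """Return a dict pattern_node -> graph_node for the first match, or None."""
    p_nodes = list(pattern_ops)
    edge_set = set(graph_edges)
    # Try every assignment of distinct graph nodes to the pattern nodes.
    for candidate in permutations(graph_ops, len(p_nodes)):
        mapping = dict(zip(p_nodes, candidate))
        if any(pattern_ops[p] != graph_ops[g] for p, g in mapping.items()):
            continue  # operator types must agree
        if all((mapping[a], mapping[b]) in edge_set for a, b in pattern_edges):
            return mapping
    return None


# Hypothetical pattern with two parallel branches.
pattern_ops = {"a": "Conv", "b": "Relu", "c": "Sigmoid", "d": "Add"}
pattern_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

# Hypothetical network graph containing that pattern.
graph_ops = {1: "MatMul", 2: "Conv", 3: "Relu", 4: "Sigmoid", 5: "Add", 6: "Softmax"}
graph_edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6)]

mapping = find_subgraph(pattern_ops, pattern_edges, graph_ops, graph_edges)
```

- Here the pattern is found at graph nodes 2, 3, 4 and 5, which would then be fused into one layer.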
- performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure includes: traversing the operators in the graph expression after fusion of operators of the subgraph structure according to the template fusion mode of at least one linear structure and according to the category labels of each of the multiple operators in the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on multiple operators in the neural network. Here, operators belonging to non-template categories in the neural network are ignored during the operator fusion process.
- Operator fusion based on subgraph matching and operator fusion based on linear template fusion mode can be two independent fusion processes, and at the same time, they complement each other in function.
- the former is responsible for the calculation mode matching of complex structures
- the latter is responsible for the calculation mode matching of linear structures.
- the operator fusion based on subgraph matching can be executed first, and then the operator fusion process based on the linear template fusion mode can be executed.
- the execution of the latter can be based on the fusion result of the former, and further fusion can be performed on this basis.
- FIG. 7 shows a schematic block diagram of an operator fusion scheme according to some embodiments of the present disclosure.
- FIGS. 8A-8B show a schematic flow chart of an operator fusion scheme according to some embodiments of the present disclosure.
- the operator fusion scheme according to some embodiments of the present disclosure will be described as a whole in conjunction with FIG. 7 , FIG. 8A and FIG. 8B .
- the operator fusion scheme can be divided into two parts: defining the fusion mode and executing the fusion algorithm based on the defined fusion mode, so as to obtain the neural network graph expression after operator fusion.
- a configurable file, for example in json format or yaml format, can be used to set the template fusion mode of the linear structure and the fusion mode of the subgraph structure.
- the present disclosure does not limit the number of fusion modes and the specific mode form.
- the subgraph-based operator fusion process is first performed according to the fusion mode of the defined subgraph structure, for example, through strict subgraph matching. After this step, the graph expression after the subgraph structure operator fusion is obtained. Then, according to the category labels of the respective operators, according to one or more template categories linearly connected in the template fusion mode of the linear structure, the operators in the graph expression after the subgraph structure operator fusion are fused, and in the operator fusion process, the operators belonging to the non-template category in the neural network are ignored to obtain the graph expression after the linear operator fusion.
- for the fusion mode of the subgraph structure shown, which includes operator 6, operator 7, operator 8, and operator 9, strict subgraph matching is performed in the DAG to find operators that conform to the mode, and these operators are fused to obtain a graph expression after the subgraph structure operator fusion. The resulting fusion layer is represented by an ellipse P and is shown as fusion layer P in FIG8B.
- for the template category corresponding to this fusion layer P, a direct definition method can be adopted: a unique category label, such as category X, can be defined to distinguish it from operators of categories such as Matrix, Normalize, etc.
- the operator fusion process based on the linear structure template fusion mode will continue.
- the operators that meet the template are fused and represented as fusion layer A, fusion layer B, fusion layer C, fusion layer D and fusion layer E respectively, and finally the graph expression after linear operator fusion is obtained.
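- The hand-over between the two stages can be sketched as a graph contraction: the operators matched by the subgraph mode are replaced by one node carrying a dedicated category label, and the linear-template stage then runs on the contracted graph. Only the membership of operators 6 to 9 in fusion layer P comes from the text; the edges, the neighbouring operators 5 and 10 and their labels are assumptions, since FIG8A is not reproduced here.

```python
def contract(labels, edges, group, fused_id, fused_label="X"):
    """Replace the nodes in `group` by one fused node carrying `fused_label`."""
    group = set(group)
    new_labels = {n: c for n, c in labels.items() if n not in group}
    new_labels[fused_id] = fused_label
    new_edges = set()
    for src, dst in edges:
        src = fused_id if src in group else src
        dst = fused_id if dst in group else dst
        if src != dst:  # edges internal to the fused layer disappear
            new_edges.add((src, dst))
    return new_labels, new_edges


# Assumed labels and connections around operators 6-9.
labels = {5: "Matrix", 6: "Reorder", 7: "Matrix", 8: "Matrix", 9: "Reorder", 10: "Normalize"}
edges = [(5, 6), (6, 7), (6, 8), (7, 9), (8, 9), (9, 10)]

new_labels, new_edges = contract(labels, edges, group=[6, 7, 8, 9], fused_id="P")
```

- After contraction the graph is the chain 5, P, 10, and because P carries category X it cannot be mistaken for a Matrix or Normalize operator by the linear templates.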
- with the operator fusion method of some embodiments of the present disclosure, operator fusion can be performed for various types of neural network structures. The method provides a template fusion mode of a linear structure, set based on the category label of the operator, together with a corresponding operator fusion algorithm, so that the designed template fusion mode is universal and scalable and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for the neural network, which benefits the graph optimization of the neural network and improves the computational efficiency of the neural network on the hardware platform.
- a computing device for performing operator fusion on a neural network is also provided, wherein the neural network includes a plurality of operators.
- the computing device according to the embodiment of the present disclosure can be applied to the field of efficient computing for neural network inference or training. It addresses the operator fusion problem in the graph optimization process of the neural network and, with the universality and scalability of the fusion mode as the goal, provides an operator fusion strategy in which the fusion mode is variable and the fusion algorithm is fixed.
- the computing device can perform operator fusion on various types of neural network structures, and designs a template fusion mode of a linear structure set based on the category label of the operator, as well as a corresponding operator fusion algorithm, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process of the neural network, which is beneficial to the graph optimization of the neural network and improves the computing efficiency of the neural network on the hardware platform.
- Fig. 9 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure.
- a computing device 1000 according to an embodiment of the present disclosure includes: a fusion mode configuration unit 1010 and a fusion unit 1020 .
- the fusion mode configuration unit 1010 can be configured to set at least one linear structure template fusion mode based on the category label of the operator. It can be understood that the term "at least one" in the present disclosure can be understood as one or more than one, that is, one, two or more, and is not limited here. The number of template fusion modes is no longer limited below, and can be similarly interpreted as one or more than one template fusion modes.
- a template-type fusion mode is defined, and is defined for the category label of the operator. Compared with the fusion mode based on operators as shown in FIG1B , it has higher adaptability. As long as the operators conform to the category, they can be fused, which can significantly reduce the number of fusion modes that need to be designed.
- the fusion unit 1020 can be configured to perform operator fusion on multiple operators in the neural network according to the respective category labels of the multiple operators in the neural network and in accordance with the template fusion mode of the linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the linear structure.
- the computing device further includes a classification unit 1030, which is configured to classify operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assign category labels to operators of different categories.
- the term "and/or" in the present disclosure represents three situations, namely, based on the functions of the operators, based on the computing architecture characteristics of the hardware platform, and based on the functions of the operators and the computing architecture characteristics of the hardware platform.
- the category labels include non-template categories and template categories; the template category includes one or more of the following: matrix category, normalization category, pooling category, data rearrangement category, data reduction category, regression function category and loss function category.
- the operators belonging to the non-template category include: activation function, linear rectification function, absolute value function, and addition function.
- the operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
- for the operators belonging to the non-template category, a sample point is still a sample point after calculation, and no additional resources are occupied during the calculation process. Therefore, in some embodiments of the present disclosure, the operators of this category are set apart so that they are not included in the template matching.
- the operators belonging to the non-template category may be, for example, Element Wise operators.
- the Element Wise operator may include activation functions (Sigmoid, Swish), linear rectification functions (Relu), absolute value functions (Abs), addition functions (Add), etc., which are not listed one by one here.
- the above operators belonging to the non-template category are ubiquitous in the structure of the neural network.
- the template fusion mode of the above linear structure is composed of one or more template categories connected in a straight line, that is, the template fusion mode according to the embodiment of the present disclosure only includes operators of the template category but does not include the above operators belonging to the non-template category. Therefore, in the template fusion mode designed according to the embodiment of the present disclosure, the influence of such operators that do not occupy additional computing resources on the fusion mode is eliminated, which is conducive to further reducing the complexity of the template fusion mode and making it more universally applicable.
- operators belonging to the matrix category include: forward convolution operator, reverse data convolution operator, reverse filter convolution operator, matrix multiplication operator; operators belonging to the normalization category include: batch normalization operator, layer normalization operator; operators belonging to the pooling category include: maximum pooling layer operator, average pooling layer operator, global average pooling layer operator; operators belonging to the data rearrangement category include: splicing operator, transformation operator; operators belonging to the data reduction category include: maximum value function, minimum value function, average value function; operators belonging to the regression function category include: regression function for sample points, regression function for the channel; and the operators belonging to the loss function category include: mean square error function, cross entropy function.
- the operators of the neural network are first classified, and then the template fusion mode is defined in units of categories, rather than fusion by the operators themselves, which is conducive to reducing the number of fusion modes required for the neural network (this will be reflected below).
- in the template fusion mode defined above in units of categories, the influence on the fusion mode of non-template-category operators, which do not occupy additional computing resources, is also eliminated. That is, such operators are not included in the defined template fusion mode, which further reduces the complexity of the template fusion mode and makes it more generally applicable.
- the template fusion mode of at least one linear structure is composed of one or more template categories that are linearly connected, and setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
- FIG4 shows a schematic diagram of the template fusion mode according to some embodiments of the present disclosure, and the specific structure can refer to the description made in conjunction with FIG4 above.
- the computing device further includes a generating unit 1040, which is configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and lines between operators, and the lines between operators represent data dependencies and data flows between operators.
- a directed acyclic graph can be referred to FIG5A , which will not be repeated here.
- the fusion unit 1020 performs operator fusion on multiple operators in the neural network according to the template fusion mode of at least one linear structure, including: according to the respective category labels of the multiple operators in the neural network, according to the template fusion mode of at least one linear structure, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, respectively, and performing operator fusion on the multiple operators in the neural network.
- the fusion mode configuration unit 1010 sets at least one template fusion mode of a linear structure based on the category label of the operator, including: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator, wherein, when the first template fusion mode and the second template fusion mode have the same part of template categories and the first template fusion mode also includes other template categories in addition to the same part of template categories, the fusion unit 1020 first performs operator fusion on multiple operators in the neural network according to the first template fusion mode, and then performs operator fusion on multiple operators in the neural network according to the second template fusion mode.
- a template fusion mode of a linear structure and a corresponding operator fusion algorithm are defined based on the category label of the operator, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process of a neural network.
- This operator fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion mode.
- setting the template fusion mode of a linear structure based on the category label of the operator includes: setting a first template fusion mode and a second template fusion mode of a linear structure based on the category label of the operator.
- not only one template fusion mode but also multiple template fusion modes need to be defined for the neural network, wherein, when the first template fusion mode and the second template fusion mode have the same part of the template category and the first template fusion mode includes other template categories in addition to the same part of the template category, the multiple operators in the neural network are first fused according to the first template fusion mode, and then the multiple operators in the neural network are fused according to the second template fusion mode.
- the fusion mode configuration unit 1010 sets the template fusion mode of the linear structure based on the category label of the operator, including: setting the template fusion mode of the linear structure by using a configurable file.
- the fusion mode configuration unit 1010 is further configured to: set the fusion mode of the subgraph structure in units of operators.
- the computing device further includes a generation unit 1040, configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between operators.
- the fusion unit 1020 before performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure, is further configured to: perform operator fusion on operators in a directed acyclic graph of the neural network according to a fusion mode of a subgraph structure to obtain a graph expression after fusion of operators of the subgraph structure, wherein the fusion unit 1020 performs operator fusion on multiple operators in the neural network according to the template fusion mode of the linear structure, including: traversing the operators in the graph expression after fusion of operators of the subgraph structure according to the template fusion mode of at least one linear structure according to the category labels of each of the multiple operators in the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
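- The division of labour among the units can be sketched as three small classes. The class and method names are hypothetical, and for brevity the fusion unit below works on a purely linear operator list rather than on a directed acyclic graph; non-template operators are marked with `None`.

```python
class ClassificationUnit:
    """Assigns category labels to operators (None marks non-template)."""

    def __init__(self, table):
        self.table = table

    def label(self, op_name):
        return self.table[op_name]


class FusionModeConfigUnit:
    """Holds the linear templates, ordered with more categories first."""

    def __init__(self):
        self.templates = []

    def set_template(self, *categories):
        if not categories:
            raise ValueError("a template needs at least one category")
        self.templates.append(tuple(categories))
        self.templates.sort(key=len, reverse=True)


class FusionUnit:
    """Fuses operators of a linear operator list according to the templates."""

    def fuse(self, ops, classifier, config):
        labels = [classifier.label(op) for op in ops]
        used, layers = set(), []
        for template in config.templates:
            for start in range(len(ops)):
                group, idx, pos = [], 0, start
                while pos < len(ops) and idx < len(template) and pos not in used:
                    if labels[pos] is None:
                        if not group:
                            break  # a layer must start at a template operator
                        group.append(pos)  # ignored by the template, absorbed
                    elif labels[pos] == template[idx]:
                        group.append(pos)
                        idx += 1
                    else:
                        break
                    pos += 1
                if idx == len(template):
                    used.update(group)
                    layers.append([ops[i] for i in group])
        return layers


classifier = ClassificationUnit({"MatMul": "Matrix", "Relu": None, "LayerNorm": "Normalize"})
config = FusionModeConfigUnit()
config.set_template("Matrix")
config.set_template("Matrix", "Normalize")
layers = FusionUnit().fuse(["MatMul", "Relu", "LayerNorm", "MatMul"], classifier, config)
```

- In this sketch the first three operators are fused under the Matrix-Normalize template and the trailing MatMul is fused alone under the Matrix template.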
- a computing device is also provided.
- Fig. 10 shows a schematic block diagram of a computing device according to an embodiment of the present disclosure.
- the computing device 2000 may include a processor 2010 and a memory 2020.
- the memory 2020 stores a computer-readable code, and when the computer-readable code is executed by the processor 2010, the operator fusion method described above may be executed.
- the processor 2010 can perform various actions and processes according to the program stored in the memory 2020.
- the processor 2010 can be an integrated circuit with signal processing capabilities.
- a general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc.
- the processor here can refer to a computing device capable of performing neural network calculations.
- the memory 2020 stores computer executable instruction codes, which are used to implement the operator fusion method according to the embodiment of the present disclosure when executed by the processor 2010.
- the memory 2020 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. It should be noted that the memory described in the present disclosure may be any suitable type of memory.
- a processor such as a CPU can implement an operator fusion method for synchronization between neural network layers.
- the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, an input/output component 3060, a hard disk 3070, etc.
- the storage device in the computing device 3000 such as the ROM 3030 or the hard disk 3070, can store various data or files used for processing and/or communication of the operator fusion method provided by the present disclosure and program instructions executed by the CPU.
- the computing device 3000 may also include a user interface 3080.
- the architecture shown in FIG11 is only exemplary. When implementing different devices, one or more components in the computing device shown in FIG11 may be omitted according to actual needs.
- a non-transitory computer-readable storage medium is also provided.
- Fig. 12 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
- a computer storage medium 4020 stores computer readable instructions 4010.
- the computer readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc.
- the computer storage medium 4020 may be connected to a computing device such as a computer, and then, when the computing device runs the computer-readable instructions 4010 stored on the computer storage medium 4020, the operator fusion method provided according to the embodiment of the present disclosure as described above may be performed.
- some embodiments of the present disclosure provide an operator fusion method, a computing apparatus, a computing device and a storage medium that offer operator fusion solutions for various types of neural network structures, especially neural networks with more complex structures. More specifically, to perform operator fusion on various types of neural network structures, a template fusion mode of a linear structure is set based on operator category labels, and a corresponding operator fusion algorithm is designed, so that the template fusion mode is universal and scalable and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for the neural network, which benefits graph optimization and improves the computing efficiency of the neural network on the hardware platform.
Abstract
Embodiments of the disclosure provide an operator fusion method, a computing apparatus, a computing device and a readable storage medium. The operator fusion method is applied to a neural network comprising a plurality of operators, and comprises: setting a template fusion mode of at least one linear structure on the basis of category labels of operators; and, according to the respective category labels of the plurality of operators in the neural network, performing operator fusion on the plurality of operators according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the at least one linear structure.
Description
This application claims priority to Chinese Patent Application No. 202211268055.1, filed on October 17, 2022, the disclosure of which is incorporated herein by reference in its entirety as part of this application.
Embodiments of the present disclosure relate to an operator fusion method, a computing apparatus, a computing device, and a readable storage medium, applied to a neural network including a plurality of operators.
To improve the computational efficiency of a neural network, multiple operators in the network that satisfy certain conditions or rules are usually fused before computation to form a fused operator. A fused operator, or a single operator that cannot be fused, may be referred to as a fusion operator or a fusion layer. The computation of the neural network proceeds layer by layer in units of fusion layers, which makes operator fusion an important step in neural network graph optimization.
Generally, the operator fusion process involves two steps: first, defining a fusion mode, that is, defining the target to be fused; second, determining a fusion algorithm, which matches occurrences of the fusion mode in the neural network and fuses them without mutual interference. Operator fusion schemes in the related art use fixed fusion modes, and fusion modes and fusion algorithms correspond one to one, that is, each fusion mode has its own fusion algorithm. Consequently, if a fusion mode changes, the fusion algorithm must also be changed to support it, which limits the extension and customization of fusion modes.
Summary of the invention
Some embodiments of the present disclosure provide an operator fusion method, a computing apparatus, a computing device, and a readable storage medium for providing an efficient and scalable operator fusion scheme for various types of neural network structures.
According to one aspect of the present disclosure, an operator fusion method is provided, which is applied to a neural network including a plurality of operators. The operator fusion method includes: setting a template fusion mode of at least one linear structure based on category labels of operators; and, according to the respective category labels of the plurality of operators in the neural network, performing operator fusion on the plurality of operators according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the at least one linear structure.
The operator fusion method according to some embodiments of the present disclosure further includes: classifying operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assigning category labels to the different categories of operators.
According to some embodiments of the present disclosure, the category labels include a non-template category and template categories, wherein the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
According to some embodiments of the present disclosure, operators belonging to the non-template category do not consume register resources or synchronous memory resources during the computation of the neural network.
According to some embodiments of the present disclosure, the template fusion mode of the at least one linear structure is composed of one or more linearly connected template categories, wherein setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure based on the category labels of the template categories while ignoring the category label of the non-template category.
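As a minimal illustration of the paragraph above, a template fusion mode can be represented as an ordered tuple of template category labels, with non-template labels filtered out before comparison. The label strings and the helpers `template_labels` and `conforms` are hypothetical names chosen for this sketch; the disclosure does not prescribe a data format.

```python
# Hypothetical label strings; the disclosure does not fix a representation.
NON_TEMPLATE = "non_template"

# A linear template fusion mode: template categories connected in order.
PATTERN = ("matrix", "normalization")


def template_labels(labels):
    """Return the template category labels of a chain, ignoring non-template ones."""
    return tuple(label for label in labels if label != NON_TEMPLATE)


def conforms(labels, pattern):
    """Check whether a linear chain of operator labels conforms to a pattern."""
    return template_labels(labels) == tuple(pattern)


# A chain such as convolution -> activation -> batch normalization -> activation.
chain = ["matrix", NON_TEMPLATE, "normalization", NON_TEMPLATE]
print(conforms(chain, PATTERN))  # True
```

Because only template categories appear in the pattern, the same pattern covers chains with any number of interleaved non-template operators.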
The operator fusion method according to some embodiments of the present disclosure further includes: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections represent the data dependencies and data flow directions between operators. Performing operator fusion on the plurality of operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the plurality of operators in the neural network, and according to the one or more linearly connected template categories in the template fusion mode of the at least one linear structure, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the plurality of operators in the neural network.
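The traversal-and-match step can be sketched as follows. This is only one possible reading, under stated assumptions: the graph is given as a dictionary from each operator to the operators it depends on, a chain counts as linear only where every step has exactly one successor and one predecessor, and non-template operators on the chain are absorbed into the fused group. The function names `match_chain` and `fuse` and the example operator names are hypothetical.

```python
from graphlib import TopologicalSorter

NON_TEMPLATE = "non_template"  # hypothetical label string


def match_chain(start, pattern, preds, succs, labels, fused):
    """Match `pattern` along a linear chain beginning at `start`.

    Non-template operators on the chain are absorbed into the group.
    Returns the list of matched operators, or None if there is no match.
    """
    if start in fused or labels[start] != pattern[0]:
        return None
    group, node, idx = [start], start, 1
    while idx < len(pattern):
        if len(succs[node]) != 1:
            return None  # the chain branches, so it is not linear
        node = succs[node][0]
        if node in fused or len(preds[node]) != 1:
            return None  # already fused, or several inputs merge here
        group.append(node)
        if labels[node] == NON_TEMPLATE:
            continue  # ignored by the template, but kept in the group
        if labels[node] != pattern[idx]:
            return None
        idx += 1
    return group


def fuse(preds, labels, pattern):
    """Traverse the DAG in topological order and collect fused groups."""
    succs = {node: [] for node in preds}
    for node, inputs in preds.items():
        for src in inputs:
            succs[src].append(node)
    fused, groups = set(), []
    for start in TopologicalSorter(preds).static_order():
        group = match_chain(start, pattern, preds, succs, labels, fused)
        if group is not None:
            fused.update(group)
            groups.append(group)
    return groups


# Each key is an operator; its value lists the operators it depends on.
preds = {"conv": [], "relu": ["conv"], "bn": ["relu"], "pool": ["bn"]}
labels = {"conv": "matrix", "relu": NON_TEMPLATE,
          "bn": "normalization", "pool": "pooling"}
print(fuse(preds, labels, ("matrix", "normalization")))
# [['conv', 'relu', 'bn']]
```

The matcher takes the pattern as data, so the same routine serves any linear template fusion mode without modification.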
According to some embodiments of the present disclosure, setting the template fusion mode of at least one linear structure based on the category labels of operators includes: setting a first template fusion mode and a second template fusion mode of linear structures based on the category labels of operators, wherein, in the case where the first template fusion mode and the second template fusion mode share some of the same template categories and the first template fusion mode includes further template categories in addition to the shared ones, operator fusion is first performed on the plurality of operators in the neural network according to the first template fusion mode, and then according to the second template fusion mode.
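The effect of this ordering can be seen on a small example. The sketch below is a simplification that assumes the network has already been reduced to a single linear chain of template labels; `fuse_chain`, `FIRST`, and `SECOND` are hypothetical names. It compares applying the longer (first) mode before the shorter (second) mode with the reverse order.

```python
def fuse_chain(labels, patterns):
    """Greedily fuse a linear chain of template labels.

    Patterns are tried in the given order; a position claimed by an earlier
    pattern is not available to later ones. Returns (start_index, pattern) pairs.
    """
    taken = [False] * len(labels)
    matches = []
    for pattern in patterns:
        size = len(pattern)
        i = 0
        while i + size <= len(labels):
            free = not any(taken[i:i + size])
            if free and tuple(labels[i:i + size]) == pattern:
                taken[i:i + size] = [True] * size
                matches.append((i, pattern))
                i += size
            else:
                i += 1
    return sorted(matches)


FIRST = ("matrix", "normalization", "pooling")  # contains SECOND plus "pooling"
SECOND = ("matrix", "normalization")
chain = ["matrix", "normalization", "pooling", "matrix", "normalization"]

print(fuse_chain(chain, [FIRST, SECOND]))
# [(0, ('matrix', 'normalization', 'pooling')), (3, ('matrix', 'normalization'))]
print(fuse_chain(chain, [SECOND, FIRST]))
# [(0, ('matrix', 'normalization')), (3, ('matrix', 'normalization'))]
```

When the shorter mode runs first, it claims the operators that the longer mode needs, so the longer mode never matches and the pooling operator is left unfused.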
According to some embodiments of the present disclosure, setting the template fusion mode of at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure using a configurable file.
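A configurable file of this kind might look like the following. The disclosure does not specify a file format, so the JSON layout, the key `template_fusion_modes`, and the label strings are all assumptions made for this sketch; the modes are listed in the order in which they would be applied.

```python
import json

# Hypothetical label strings for the template categories named in the text.
TEMPLATE_CATEGORIES = {
    "matrix", "normalization", "pooling", "data_rearrangement",
    "data_reduction", "regression", "loss",
}

# Stand-in for the contents of a configuration file (assumed JSON layout).
CONFIG_TEXT = """
{
  "template_fusion_modes": [
    ["matrix", "normalization", "pooling"],
    ["matrix", "normalization"],
    ["matrix"]
  ]
}
"""


def load_fusion_modes(text):
    """Parse template fusion modes and reject unknown or empty entries."""
    modes = []
    for entry in json.loads(text)["template_fusion_modes"]:
        unknown = [label for label in entry if label not in TEMPLATE_CATEGORIES]
        if not entry or unknown:
            raise ValueError(f"invalid template fusion mode: {entry!r}")
        modes.append(tuple(entry))
    return modes


modes = load_fusion_modes(CONFIG_TEXT)
print(modes[0])  # ('matrix', 'normalization', 'pooling')
```

Keeping the modes in a file means a new fusion mode can be added by editing data only, leaving the matching code unchanged.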
The operator fusion method according to some embodiments of the present disclosure further includes: setting a fusion mode of a subgraph structure in units of operators; and generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections represent the data dependencies and data flow directions between operators.
According to some embodiments of the present disclosure, before performing operator fusion on the plurality of operators in the neural network according to the template fusion mode of the at least one linear structure, the method further includes: performing operator fusion on the operators in the directed acyclic graph of the neural network according to the fusion mode of the subgraph structure, to obtain a graph representation after subgraph-structure operator fusion.
According to some embodiments of the present disclosure, performing operator fusion on the plurality of operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the plurality of operators in the neural network and according to the template fusion mode of the at least one linear structure, traversing the operators in the graph representation obtained after subgraph-structure operator fusion, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph representation.
According to some embodiments of the present disclosure, operators belonging to the non-template category include: an activation function, a rectified linear unit (ReLU) function, an absolute value function, and an addition function; operators belonging to the matrix category include: a forward convolution operator, a backward data convolution operator, a backward filter convolution operator, and a matrix multiplication operator; operators belonging to the normalization category include: a batch normalization operator and a layer normalization operator; operators belonging to the pooling category include: a max pooling layer operator, an average pooling layer operator, and a global average pooling layer operator; operators belonging to the data rearrangement category include: a concatenation operator and a transformation operator; operators belonging to the data reduction category include: a maximum function, a minimum function, and an average function; operators belonging to the regression function category include: a regression function for sample points and a regression function for channels; and operators belonging to the loss function category include: a mean square error function and a cross entropy function.
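The classification listed above can be written down as a lookup table. In the sketch below the grouping follows the text, while the `Category` enum, the operator name strings, and the helper functions are hypothetical and do not come from any particular framework.

```python
from enum import Enum


class Category(Enum):
    NON_TEMPLATE = "non_template"
    MATRIX = "matrix"
    NORMALIZATION = "normalization"
    POOLING = "pooling"
    DATA_REARRANGEMENT = "data_rearrangement"
    DATA_REDUCTION = "data_reduction"
    REGRESSION = "regression"
    LOSS = "loss"


# Hypothetical operator names, grouped as in the text.
_GROUPS = {
    Category.NON_TEMPLATE: ["activation", "relu", "abs", "add"],
    Category.MATRIX: ["conv_forward", "conv_backward_data",
                      "conv_backward_filter", "matmul"],
    Category.NORMALIZATION: ["batch_norm", "layer_norm"],
    Category.POOLING: ["max_pool", "avg_pool", "global_avg_pool"],
    Category.DATA_REARRANGEMENT: ["concat", "transform"],
    Category.DATA_REDUCTION: ["reduce_max", "reduce_min", "reduce_mean"],
    Category.REGRESSION: ["regression_per_sample", "regression_per_channel"],
    Category.LOSS: ["mse_loss", "cross_entropy_loss"],
}

# Invert the grouping so that each operator maps to its category label.
CATEGORY_OF = {op: cat for cat, ops in _GROUPS.items() for op in ops}


def category_label(op_name):
    """Return the category label assigned to an operator."""
    return CATEGORY_OF[op_name]


def is_template(op_name):
    """True if the operator belongs to one of the template categories."""
    return category_label(op_name) is not Category.NON_TEMPLATE


print(category_label("matmul").value, is_template("relu"))  # matrix False
```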
According to another aspect of the present disclosure, a computing apparatus is provided for performing operator fusion on a neural network, wherein the neural network includes a plurality of operators. The computing apparatus includes: a fusion mode configuration unit configured to set a template fusion mode of at least one linear structure based on category labels of operators; and a fusion unit configured to perform, according to the respective category labels of the plurality of operators in the neural network, operator fusion on the plurality of operators according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the at least one linear structure.
The computing apparatus according to some embodiments of the present disclosure further includes a classification unit configured to: classify operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assign category labels to the different categories of operators, wherein the category labels include a non-template category and template categories, and the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
According to some embodiments of the present disclosure, operators belonging to the non-template category do not consume register resources or synchronous memory resources during the computation of the neural network.
According to some embodiments of the present disclosure, the template fusion mode of the at least one linear structure is composed of one or more linearly connected template categories, and setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure based on the category labels of the template categories while ignoring the category label of the non-template category.
The computing apparatus according to some embodiments of the present disclosure further includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections represent the data dependencies and data flow directions between operators.
According to some embodiments of the present disclosure, the fusion unit performing operator fusion on the plurality of operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the plurality of operators in the neural network and according to the template fusion mode of the at least one linear structure, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the plurality of operators in the neural network.
According to some embodiments of the present disclosure, the fusion mode configuration unit setting the template fusion mode of at least one linear structure based on the category labels of operators includes: setting a first template fusion mode and a second template fusion mode of linear structures based on the category labels of operators, wherein, in the case where the first template fusion mode and the second template fusion mode share some of the same template categories and the first template fusion mode includes further template categories in addition to the shared ones, the fusion unit first performs operator fusion on the plurality of operators in the neural network according to the first template fusion mode, and then according to the second template fusion mode.
According to some embodiments of the present disclosure, the fusion mode configuration unit setting the template fusion mode of at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure using a configurable file.
According to some embodiments of the present disclosure, the fusion mode configuration unit is further configured to: set a fusion mode of a subgraph structure in units of operators. The computing apparatus according to some embodiments of the present disclosure further includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections represent the data dependencies and data flow directions between operators.
According to some embodiments of the present disclosure, before performing operator fusion on the plurality of operators in the neural network according to the template fusion mode of the at least one linear structure, the fusion unit is further configured to: perform operator fusion on the operators in the directed acyclic graph of the neural network according to the fusion mode of the subgraph structure, to obtain a graph representation after subgraph-structure operator fusion, wherein the fusion unit performing operator fusion on the plurality of operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the plurality of operators in the neural network and according to the template fusion mode of the at least one linear structure, traversing the operators in the graph representation obtained after subgraph-structure operator fusion, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph representation.
In the computing apparatus according to some embodiments of the present disclosure, operators belonging to the non-template category include: an activation function, a rectified linear unit (ReLU) function, an absolute value function, and an addition function; operators belonging to the matrix category include: a forward convolution operator, a backward data convolution operator, a backward filter convolution operator, and a matrix multiplication operator; operators belonging to the normalization category include: a batch normalization operator and a layer normalization operator; operators belonging to the pooling category include: a max pooling layer operator, an average pooling layer operator, and a global average pooling layer operator; operators belonging to the data rearrangement category include: a concatenation operator and a transformation operator; operators belonging to the data reduction category include: a maximum function, a minimum function, and an average function; operators belonging to the regression function category include: a regression function for sample points and a regression function for channels; and operators belonging to the loss function category include: a mean square error function and a cross entropy function.
According to yet another aspect of the present disclosure, a computing device is provided, including: a processor; and a memory, wherein the memory stores computer-readable code which, when run by the processor, performs the operator fusion method described above.
According to yet another aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, on which instructions are stored. When executed by a processor, the instructions implement the operator fusion method described above.
With the operator fusion method, computing apparatus, computing device, and storage medium provided by some embodiments of the present disclosure, a template fusion mode of a linear structure can be set based on the category labels of operators, and an operator fusion algorithm that applies generally to template fusion modes of linear structures can be provided. The template fusion modes so designed are universal and scalable and allow fusion in units of operator categories, which provides an efficient operator fusion process for the neural network, benefits graph optimization of the neural network, and improves the computing efficiency of the neural network on the hardware platform.
To describe the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1A shows a schematic diagram of a neural network computation process;
FIG. 1B shows a schematic diagram of neural network operator fusion in the related art;
FIG. 2 shows a schematic flowchart of an operator fusion method according to some embodiments of the present disclosure;
FIG. 3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure;
FIG. 4 shows a schematic diagram of template fusion modes according to some embodiments of the present disclosure;
FIG. 5A shows a schematic diagram of a directed acyclic graph of a neural network;
FIG. 5B shows a schematic diagram of performing operator fusion on the directed acyclic graph shown in FIG. 5A;
FIG. 5C shows the directed acyclic graph after fusion;
FIG. 6 shows a schematic diagram of the fusion priority of template fusion modes according to some embodiments of the present disclosure;
FIG. 7 shows a schematic block diagram of an operator fusion scheme according to some embodiments of the present disclosure;
FIGS. 8A-8B show schematic flowcharts of an operator fusion scheme according to some embodiments of the present disclosure;
FIG. 9 shows a schematic block diagram of a computing apparatus according to some embodiments of the present disclosure;
FIG. 10 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure;
FIG. 11 shows a schematic diagram of the architecture of an exemplary computing device according to some embodiments of the present disclosure;
FIG. 12 shows a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure.
The embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present disclosure without creative effort fall within the scope of protection of the present disclosure.
In addition, as used in the present disclosure and the claims, unless the context clearly indicates otherwise, words such as "a", "an", and/or "the" do not specifically denote the singular and may also include the plural. The words "first", "second", and similar terms used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Likewise, "include", "comprise", and similar words mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. "Connect", "connected", and similar words are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Flowcharts are used in the present disclosure to illustrate the steps of methods according to embodiments of the present disclosure. It should be understood that the preceding or following steps are not necessarily performed in the exact order shown. Instead, the steps may be processed in reverse order or simultaneously, and other operations may be added to these processes.
It is to be understood that the technical terms and nouns used in the present disclosure have the meanings known to those skilled in the art.
An artificial neural network (ANN), or neural network for short, is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network relies on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes. Whatever their type, neural networks share characteristics such as large-scale parallel processing, distributed storage, flexible topology, high redundancy, and nonlinear operation, and they offer capabilities in computing speed, association, adaptability, fault tolerance, and self-organization. These characteristics and capabilities form the technical basis on which neural networks simulate intelligent activity, and they have found important applications in many technical fields. For example, neural networks can be used in data compression, image processing, video coding, signal processing, and other application areas.
To improve the computational efficiency of a neural network, multiple adjacent operators in the network that satisfy certain conditions or rules are usually fused to form a fused operator. A fused operator, or a single operator that cannot be fused, may be represented as a fusion layer, or simply a layer. The computation of a neural network usually proceeds layer by layer in units of fusion layers. In general, the output of the previous layer (or previous several layers) serves as the input of the next layer (or next several layers), which creates data dependencies between fusion layers. FIG. 1A shows a schematic diagram of the neural network computation process. As shown in FIG. 1A, the computing units used for a neural network are generally subdivided into a matrix multiplication unit, a vector operation unit, and a scalar operation unit, each handling different computing tasks. During computation, data synchronization through shared memory is needed to guarantee the execution order between fusion layers that have data dependencies. The shared memory may be based on internal memory or external memory, which is not limited here.
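The ordering constraint imposed by data dependencies can be illustrated with a topological sort. The sketch below only derives one valid execution order for a set of hypothetical fusion layers; it does not model the shared-memory synchronization mentioned above, and the layer names are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical fusion layers; each value lists the layers whose output it consumes.
depends_on = {
    "layer1": [],
    "layer2": ["layer1"],
    "layer3": ["layer1"],
    "layer4": ["layer2", "layer3"],
}


def execution_order(depends_on):
    """Return one order in which every layer runs after the layers it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(depends_on)
print(order[0], order[-1])  # layer1 layer4
```

Here `layer2` and `layer3` have no dependency on each other, so either may run first; only their position relative to `layer1` and `layer4` is fixed.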
Therefore, an efficient and flexible implementation of the operator fusion process, which is performed before the computation process shown in FIG. 1A, has an important impact on the computational efficiency of the neural network. In addition, for different and complex internal structures of neural networks, fusion modes need to be adjusted and defined quickly, and the corresponding fusion algorithms adapted, in order to implement operator fusion in the neural network.
In the related art, most operator fusion schemes use fusion modes with fixed patterns or fixed rules (that is, modes that define the target operators to be fused). FIG. 1B shows a schematic diagram of neural network operator fusion in the related art, in which four fixed types of fusion modes, fusion mode 1 to fusion mode 4, are schematically shown. For the four fusion modes defined in FIG. 1B, four corresponding fusion algorithms, fusion algorithm 1 to fusion algorithm 4, need to be adapted in order to find operators in the network structure that match fusion mode 1 to fusion mode 4 and fuse them. This yields a fused network structure graph, on which the subsequent computation process is based. As an example, fusion mode 1 shown in FIG. 1B defines a linear fusion target (comprising operator 1, operator 2 and operator 3). To perform operator fusion based on fusion mode 1, fusion algorithm 1 must be designed to find, among the many operators in the neural network, a structure matching the operator 1-operator 2-operator 3 structure of fusion mode 1, and to fuse it. In the related art, fusion algorithms are usually implemented by subgraph matching: a subgraph corresponding to the fusion mode is designed, and the operator structure graph of the neural network is traversed to find the corresponding subgraph structure. It can be understood that the fusion modes shown in FIG. 1B are all defined in units of operators.
Typically, for a neural network with a relatively complex functional structure, a dozen or even dozens of fusion modes may need to be defined, so that the operator fusion step consumes considerable computing resources and incurs considerable debugging cost. As can be seen from FIG. 1B, the defined fusion modes are static with respect to the fusion algorithms, and fusion algorithms and fusion modes are directly coupled; for example, one fusion mode corresponds to one fusion algorithm. If a fusion mode changes, the fusion algorithm must also be changed to support it. The extension and customization of fusion modes are therefore limited, the modes are not generally applicable, and the fusion algorithms must be adjusted continually. Moreover, efficient fusion modes must be determined with regard to the architectural characteristics of the hardware platform that runs the neural network. Switching the hardware platform means that the fusion modes change as well, and a change in fusion modes in turn requires the corresponding fusion algorithms to be redesigned and recompiled.
The present disclosure applies to the field of efficient computation for neural network inference or training. It addresses the operator fusion problem in the graph optimization process of a neural network and, with the generality and extensibility of fusion modes as its goal, provides an operator fusion strategy in which the fusion modes are variable and the fusion algorithm is fixed.
Specifically, some embodiments of the present disclosure provide an operator fusion method for performing operator fusion on various types of neural network structures. The method designs linear-structure template fusion modes that are set based on the category labels of operators, together with a corresponding operator fusion algorithm, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for the neural network, benefits graph optimization of the neural network, and improves its computational efficiency on the hardware platform. The implementation of the operator fusion method according to some embodiments of the present disclosure is described in detail below with reference to the accompanying drawings.
FIG. 2 shows a schematic flowchart of an operator fusion method according to some embodiments of the present disclosure. As shown in FIG. 2, the operator fusion method includes steps S101 and S102. Specifically, in step S101, at least one linear-structure template fusion mode is set based on the category labels of operators. Embodiments of the present disclosure provide a template-type fusion mode that is defined in terms of operator category labels. Compared with the operator-level fusion modes shown in FIG. 1B, it is more adaptable: any operators that belong to the specified categories can be fused, which can significantly reduce the number of fusion modes required. Next, in step S102, according to the respective category labels of the multiple operators in the neural network, operator fusion is performed on those operators according to the at least one linear-structure template fusion mode, so as to fuse one or more operators in the neural network that conform to the at least one linear-structure template fusion mode. In what follows, the generation of the template fusion mode in step S101 is described first, followed by the corresponding fusion algorithm, that is, the implementation of step S102.
The operator fusion method according to some embodiments of the present disclosure further includes: classifying operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assigning category labels to operators of different categories.
To define template fusion modes, classification rules for operators need to be established in advance, so that the various operators a neural network may contain are divided into several categories. In general, the classification of operators takes into account their functions and/or the architectural characteristics of the hardware platform. For example, computing functions of different categories are usually executed by different computing units, such as matrix operation units, vector operation units, scalar operation units and special function units (SFUs); when several such operators are fused together, they can be executed in parallel as a pipeline. Multiple operators with the same kind of computing function can also be fused together, provided that they do not consume additional register resources or synchronization resources; they are then executed serially in the computing unit, which avoids the time cost of data transfers and memory operations.
As an example, FIG. 3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure. It can be understood that FIG. 3 gives only some examples of operator classification; the operators of a neural network may also include operators and categories not shown in FIG. 3, which is not limited here. In addition, in other embodiments according to the present disclosure, other classification schemes may be defined for the operators in a neural network.
As shown in FIG. 3, according to some embodiments of the present disclosure, the category labels may include a non-template category and template categories, where operators belonging to the non-template category do not consume register resources or synchronous memory resources during the computation of the neural network. For an operator of the non-template category, one sample point typically remains one sample point after computation, and no additional computing resources are occupied. Therefore, in some embodiments of the present disclosure, the non-template category is set up separately so that operators of this category are not counted during template matching. As shown in FIG. 3, operators of the non-template category may be, for example, element-wise operators, such as activation functions (Sigmoid, Swish), the rectified linear unit (Relu), the absolute value function (Abs) and the addition function (Add), which are not listed exhaustively here. Such operators are ubiquitous in neural network structures. It can also be understood that operators of the non-template category may include operators other than element-wise operators, for which no further examples are given here.
According to some embodiments of the present disclosure, the linear-structure template fusion mode is composed of one or more linearly connected template categories. That is, a template fusion mode according to embodiments of the present disclosure includes only operators of template categories and does not include operators of the non-template category. The template fusion mode thus designed eliminates the influence on the fusion mode of operators that occupy no additional computing resources, which further reduces the complexity of the template fusion mode and makes it more generally applicable.
Next, as shown in FIG. 3, the template categories according to embodiments of the present disclosure may include one or more of the following: a matrix category (Matrix), a normalization category (Normalize), a pooling category (Pooling), a data rearrangement category (Reorder), a data reduction category (Reduce), a regression function category (Softmax) and a loss function category (Loss), among others not listed here.
As an example, operators belonging to the matrix category (Matrix) may include the forward convolution operator (Forward conv), the backward data convolution operator (Backward data conv), the backward filter convolution operator (Backward filter conv), the matrix multiplication operator (Matrix Multiplication, MatMul), and so on. Operators belonging to the normalization category (Normalize) may include the batch normalization operator (Batch norm), the layer normalization operator (Layer norm), and so on. Operators belonging to the pooling category (Pooling) may include the max pooling operator (Max pooling), the average pooling operator (Average pooling), the global average pooling operator (Global average pooling), and so on. Operators belonging to the data rearrangement category (Reorder) may include the concatenation operator (Concat), the permutation operator (Permute), the reshape operator (Reshape), the slice operator (Slice), and so on. Operators belonging to the data reduction category (Reduce) may include the maximum function (Max), the minimum function (Min), the average function (Average), the sum function (Sum), and so on. Operators belonging to the regression function category (Softmax) may include the regression function over sample points (Softmax on sample), the regression function over channels (Softmax on channel), and so on. Operators belonging to the loss function category (Loss) may include the mean square error function (Mean Square Error, MSE), the cross-entropy function (Cross-entropy), and so on.
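The classification above can be pictured as a lookup table from operator names to category labels. The following is a minimal sketch under stated assumptions: the `Category` enum, the `OP_CATEGORY` table, the operator name strings and the helper functions are hypothetical names chosen for illustration, and only a few of the operators listed above are included.

```python
from enum import Enum


class Category(Enum):
    """Category labels; NON_TEMPLATE operators are ignored by templates."""
    NON_TEMPLATE = "NonTemplate"
    MATRIX = "Matrix"
    NORMALIZE = "Normalize"
    POOLING = "Pooling"
    REORDER = "Reorder"
    REDUCE = "Reduce"
    SOFTMAX = "Softmax"
    LOSS = "Loss"


# Hypothetical operator-name -> category table, following the examples in the text.
OP_CATEGORY = {
    "ForwardConv": Category.MATRIX,
    "MatMul": Category.MATRIX,
    "BatchNorm": Category.NORMALIZE,
    "LayerNorm": Category.NORMALIZE,
    "MaxPooling": Category.POOLING,
    "Reshape": Category.REORDER,
    "Sum": Category.REDUCE,
    "Softmax": Category.SOFTMAX,
    "MSE": Category.LOSS,
    "Relu": Category.NON_TEMPLATE,
    "Sigmoid": Category.NON_TEMPLATE,
    "Add": Category.NON_TEMPLATE,
}


def label_of(op_name: str) -> Category:
    """Return the category label of an operator (KeyError if it is unknown)."""
    return OP_CATEGORY[op_name]


def is_template(op_name: str) -> bool:
    """True if the operator belongs to one of the template categories."""
    return label_of(op_name) is not Category.NON_TEMPLATE
```

Keeping the table as plain data means a different hardware platform could supply a different classification without touching the fusion logic, which is in the spirit of the classification step described above.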
It can be understood that, in embodiments according to the present disclosure, the operators of the neural network are first classified, and template fusion modes are then defined in units of categories rather than in units of individual operators. This helps reduce the number of fusion modes the neural network requires, as the description below will show. Furthermore, the template fusion modes defined in units of categories exclude operators of the non-template category, which occupy no additional computing resources, so that such operators have no influence on the fusion modes. This further reduces the complexity of the template fusion modes and makes them more generally applicable.
As an example, FIG. 4 shows a schematic diagram of a template fusion mode according to some embodiments of the present disclosure. The left side of FIG. 4 schematically shows four example operator graphs. In the operator fusion method provided by embodiments of the present disclosure, all four example operator graphs can be characterized by the template fusion mode shown on the right side of FIG. 4, namely one matrix category (Matrix) operator connected to one normalization category (Normalize) operator. In other words, once the template fusion mode on the right is defined, each of the four example operator graphs on the left, when it appears in the neural network, conforms to that template fusion mode and is fused accordingly. It can also be understood that the template fusion mode covers operator connection patterns other than those shown on the left side of FIG. 4: during operator fusion, any operators that satisfy the template fusion mode are fused to form a fused operator, that is, a fusion layer. This demonstrates the advantage of the template fusion mode in terms of general applicability: it has the properties of a template and thereby reduces the number of fusion modes the neural network requires. By comparison, in the related art, fusing the four operator connection patterns shown on the left side of FIG. 4 would require four separately defined fusion modes and four corresponding fusion algorithms.
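To illustrate how one category-level template can cover several operator-level chains, the sketch below checks a linear chain of category labels against a template while skipping non-template operators. The function name `matches_template`, the label strings and the four example chains are assumptions made for illustration; they are not the four graphs actually drawn in FIG. 4.

```python
NON_TEMPLATE = "NonTemplate"


def matches_template(chain_labels, template):
    """True if the chain's template-category labels equal the template.

    Operators of the non-template category (for example element-wise
    operators) are skipped and therefore never affect the match.
    """
    kept = [label for label in chain_labels if label != NON_TEMPLATE]
    return kept == list(template)


template = ("Matrix", "Normalize")

# Hypothetical chains, given as the category labels of consecutive operators.
examples = [
    ["Matrix", "Normalize"],
    ["Matrix", "NonTemplate", "Normalize"],
    ["Matrix", "Normalize", "NonTemplate"],
    ["NonTemplate", "Matrix", "NonTemplate", "Normalize"],
]

results = [matches_template(chain, template) for chain in examples]
```

All four chains match the single `("Matrix", "Normalize")` template, whereas an operator-level scheme would need a separate pattern for each.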
The operator fusion method according to some embodiments of the present disclosure may further include: generating a directed acyclic graph based on the network structure of the neural network, where the directed acyclic graph includes operators and connections between operators, and the connections between operators represent the data dependencies and data flow directions between the operators.
A neural network can be viewed as a directed acyclic graph (DAG) composed of many operators (also called computing nodes), where each node in the DAG corresponds to one operator in the neural network. It can be understood that the neural network structure may be abstracted into a DAG in any known manner, which is not described in detail here. Specifically, FIG. 5A shows a schematic diagram of the directed acyclic graph of a neural network. FIG. 5A shows the directed acyclic graph formed by a neural network that includes 13 operators. The operators are connected by edges, which represent the data dependencies and data flow directions between operators. For example, the output data of operator 1 flows to operator 2, the output data of operator 2 flows to operator 3 and operator 6, and so on. It follows that operator 1 and operator 2 have a data dependency, and that operator 2 has a data dependency with each of operator 6 and operator 3. It can be understood that the network structure shown in FIG. 5A is only schematic, and the operator fusion method according to embodiments of the present disclosure can be applied to various types of neural network structures.
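One simple way to hold such a graph in memory is a pair of adjacency maps plus a topological order. The sketch below is an illustration only: it uses just the three edges spelled out in the text (1 to 2, 2 to 3, 2 to 6), since the remaining edges of FIG. 5A are not described, and the variable names are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Edges taken from the example in the text: 1 -> 2, 2 -> 3, 2 -> 6.
edges = [(1, 2), (2, 3), (2, 6)]

successors = {}    # operator -> operators that consume its output
predecessors = {}  # operator -> operators it depends on
for src, dst in edges:
    successors.setdefault(src, []).append(dst)
    successors.setdefault(dst, [])
    predecessors.setdefault(dst, []).append(src)
    predecessors.setdefault(src, [])

# TopologicalSorter expects a mapping from each node to its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
```

Any order produced this way places an operator after everything it depends on, which is the property a fusion pass or a layer-by-layer executor relies on.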
According to embodiments of the present disclosure, after a linear-structure template fusion mode has been defined as described above (for example, the Matrix-Normalize mode shown on the right side of FIG. 4), the operator fusion process can be performed on the directed acyclic graph of the neural network.
According to some embodiments of the present disclosure, the at least one linear-structure template fusion mode is composed of one or more linearly connected template categories, and setting the at least one linear-structure template fusion mode based on the category labels of operators includes: setting the at least one linear-structure template fusion mode based on the category labels of the template categories while ignoring the category label of the non-template category.
According to embodiments of the present disclosure, performing operator fusion on the multiple operators in the neural network according to the linear-structure template fusion mode in step S102 includes: according to the respective category labels of the multiple operators in the neural network and the at least one linear-structure template fusion mode, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding linear-structure template fusion mode, and thereby performing operator fusion on the multiple operators in the neural network.
Specifically, FIG. 5B shows a schematic diagram of performing operator fusion on the directed acyclic graph shown in FIG. 5A according to the template fusion mode shown in FIG. 4. Because the template fusion mode is defined in units of operator categories, the fusion process must likewise rely on the respective category labels of the operators in the neural network. As an example, and for ease of description, assume that the information of operators 1 to 13 shown in FIG. 5A is as given in Table 1 below:
Table 1
Referring to the information shown in Table 1 above, three fused operators can be matched in the directed acyclic graph shown in FIG. 5A; that is, three operator fusions can be performed according to the template fusion mode Matrix-Normalize. In other words, in FIG. 5B, fused operators A, B and C each conform to the template fusion mode Matrix-Normalize and are therefore fused. FIG. 5C shows the graph representation after fusion. It can be understood that the operators listed in Table 1 are only schematic and serve to describe the process of performing operator fusion on a directed acyclic graph according to a defined template fusion mode.
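The traversal-and-match step can be sketched as a greedy walk over the graph. This is a minimal illustration under stated assumptions, not the algorithm claimed in the disclosure: the function `fuse_linear`, its arguments, and the five-operator graph with its labels are hypothetical, and the labels are not the contents of Table 1. The sketch stays within a single branch by only following an operator that has exactly one successor, where that successor has exactly one predecessor.

```python
NON_TEMPLATE = "NonTemplate"


def fuse_linear(nodes, succ, labels, template):
    """Greedy sketch: return lists of operator ids, one list per fused layer.

    nodes    -- operator ids in topological order
    succ     -- operator id -> list of successor ids
    labels   -- operator id -> category label
    template -- tuple of template categories, e.g. ("Matrix", "Normalize")
    """
    pred_count = {n: 0 for n in nodes}
    for n in nodes:
        for s in succ.get(n, []):
            pred_count[s] += 1

    fused, used = [], set()
    for start in nodes:
        if start in used or labels[start] != template[0]:
            continue
        chain, cur, want = [start], start, 1
        while want < len(template):
            nxt = succ.get(cur, [])
            # Stay inside one branch: single successor with a single predecessor.
            if len(nxt) != 1 or pred_count[nxt[0]] != 1 or nxt[0] in used:
                break
            cur = nxt[0]
            if labels[cur] == NON_TEMPLATE:
                chain.append(cur)   # absorbed, but not counted by the template
            elif labels[cur] == template[want]:
                chain.append(cur)
                want += 1
            else:
                break
        if want == len(template):
            fused.append(chain)
            used.update(chain)
    return fused


# Hypothetical five-operator chain (not the graph of FIG. 5A).
nodes = [1, 2, 3, 4, 5]
succ = {1: [2], 2: [3], 3: [4], 4: [5]}
labels = {1: "Matrix", 2: NON_TEMPLATE, 3: "Normalize", 4: "Matrix", 5: "Normalize"}

layers = fuse_linear(nodes, succ, labels, ("Matrix", "Normalize"))
```

Here `layers` groups operators 1, 2, 3 into one fused layer and operators 4, 5 into another. The same function works unchanged for any other template tuple, which mirrors the idea of a fixed algorithm with variable fusion modes.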
In the operator fusion method provided by embodiments of the present disclosure, linear-structure template fusion modes and a corresponding operator fusion algorithm can be defined based on the category labels of operators, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories, thereby providing an efficient operator fusion process for the neural network. This operator fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion modes.
According to some embodiments of the present disclosure, setting the linear-structure template fusion mode based on the category labels of operators includes: setting a first linear-structure template fusion mode and a second linear-structure template fusion mode based on the category labels of operators. In some implementations, more than one template fusion mode may be defined for a neural network. Where the first template fusion mode and the second template fusion mode share a common part of template categories, and the first template fusion mode additionally includes other template categories beyond the common part, operator fusion is first performed on the operators in the neural network according to the first template fusion mode, and then according to the second template fusion mode.
FIG. 6 shows a schematic diagram of the fusion priority of template fusion modes according to some embodiments of the present disclosure, in which template fusion mode 1, template fusion mode 2 and template fusion mode 3 are schematically shown. For ease of description, the case in which the first and second template fusion modes share a common part of template categories and the first template fusion mode additionally includes other template categories is also referred to as the first template fusion mode containing the second template fusion mode. FIG. 6 shows the containment relationships between template fusion modes: template fusion mode 1 contains template fusion mode 2 and template fusion mode 3. For example, template fusion mode 1 and template fusion mode 2 share a common part of template categories ("Matrix-Normalize"), and template fusion mode 1 also includes a further template category beyond that common part ("Softmax"). The common part corresponds to template fusion mode 2, and template fusion mode 2 includes no other template categories.
Continuing with FIG. 6, template fusion mode 2 likewise contains template fusion mode 3. That is, template fusion mode 2 and template fusion mode 3 share a common part of template categories ("Matrix"), and template fusion mode 2 also includes a further template category ("Normalize"). The common part corresponds to template fusion mode 3, and template fusion mode 3 includes no other template categories. For the situation shown in FIG. 6, fusion priorities are set for template fusion modes that have a containment relationship: template fusion mode 1 is matched and fused before template fusion mode 2 and template fusion mode 3, and template fusion mode 2 is fused before template fusion mode 3. In other words, the more template categories a mode includes, the higher its priority, so that as many operators as possible are fused into a single fusion layer. Without these priorities, fusion matching on the directed acyclic graph of the neural network might directly fuse the operators that conform to template fusion mode 3, and a match for template fusion mode 1 could no longer be guaranteed.
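The priority rule can be sketched by sorting templates by the number of categories they contain. In the sketch below, the tuples stand for the three modes of FIG. 6 as described in the text; the function names `contains` and `by_priority` are hypothetical, and treating the "common part" as a contiguous run of categories is an assumption made for illustration.

```python
def contains(first, second):
    """True if `second` appears as a contiguous run inside the longer `first`."""
    n, m = len(first), len(second)
    return m < n and any(first[i:i + m] == second for i in range(n - m + 1))


def by_priority(templates):
    """Order templates so that those with more categories are tried first."""
    return sorted(templates, key=len, reverse=True)


mode1 = ("Matrix", "Normalize", "Softmax")
mode2 = ("Matrix", "Normalize")
mode3 = ("Matrix",)

ordered = by_priority([mode3, mode1, mode2])
```

Because a containing template is always strictly longer than the template it contains, sorting by length is enough to guarantee that it is matched first.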
According to some embodiments of the present disclosure, setting the linear-structure template fusion mode based on the category labels of operators may include: setting the linear-structure template fusion mode using a configurable file. As an example, the format of the configurable file includes the json format or the yaml format. That is, in the operator fusion method according to embodiments of the present disclosure, fusion modes can be defined through a configurable file, for example to define the multiple template fusion modes shown in FIG. 6. This process can be carried out dynamically and configurably, so as to adapt to different neural network structures and to changes of the hardware computing platform, which increases the flexibility of the operator fusion process. It can be understood that the configurable file format is not limited to the two formats listed above. In embodiments according to the present disclosure, template fusion modes are configurable, for example by means of a file: multiple template fusion modes can form a set that is supplied to the computing device through file configuration for use in the operator fusion process of the neural network.
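A json configuration of this kind might look like the sketch below. The schema is entirely hypothetical (the keys `templates`, `name` and `pattern` are assumptions, as the disclosure does not specify a file layout); the example only shows that template fusion modes can be loaded as data with the standard `json` module.

```python
import json

# Hypothetical configuration text; in practice this would be read from a file.
CONFIG_TEXT = """
{
  "templates": [
    {"name": "mode1", "pattern": ["Matrix", "Normalize", "Softmax"]},
    {"name": "mode2", "pattern": ["Matrix", "Normalize"]},
    {"name": "mode3", "pattern": ["Matrix"]}
  ]
}
"""


def load_templates(text):
    """Parse the configuration and return {name: tuple_of_categories}."""
    config = json.loads(text)
    return {entry["name"]: tuple(entry["pattern"]) for entry in config["templates"]}


templates = load_templates(CONFIG_TEXT)
```

Adding or changing a fusion mode then means editing the file, with no change to the fusion code itself.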
In the linear-structure template fusion modes described above, operators are connected in series, which means that operator fusion takes place within a single branch. In some application scenarios, the network structure of the neural network may be more complex, so that, besides linear-structure template fusion modes, other and more complex types of fusion modes may need to be defined, for example fusion modes with a subgraph structure. Such subgraph-structured fusion modes require operator fusion across multiple parallel branches in order to achieve higher computational efficiency.
The operator fusion method provided by some embodiments of the present disclosure may further include: setting a subgraph-structured fusion mode in units of operators. In contrast to the linear-structure template fusion modes above, which are set in units of operator categories, a subgraph-structured fusion mode is defined in units of operators; for example, the mode indicates that several specific operators with a particular connection relationship are to be fused.
Similarly, operator fusion based on subgraph matching is also performed on the directed acyclic graph of the neural network. For example, each subgraph-structured fusion mode can be treated as a subgraph and described using the same graph intermediate representation (Graph IR) as the neural network. Operator fusion based on a subgraph-structured fusion mode is then a subgraph matching process: subgraphs that conform to the Graph IR form are matched in the graph representation of the neural network (usually a DAG), and the operators that satisfy the matching conditions are fused.
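Subgraph matching at the operator level can be illustrated with a brute-force search. The sketch below is an assumption-laden illustration rather than the matching procedure of the disclosure: it tries every assignment of pattern nodes to graph nodes, which is only practical for very small patterns, and it checks that every pattern edge exists in the graph without requiring the matched region to have no extra edges. The function name, the pattern and the example graph are hypothetical.

```python
from itertools import permutations


def find_subgraph(graph_ops, graph_edges, pat_ops, pat_edges):
    """Return the first mapping pattern-node -> graph-node, or None.

    graph_ops / pat_ops     -- node id -> operator name
    graph_edges / pat_edges -- set of (src, dst) pairs
    """
    pat_nodes = list(pat_ops)
    for candidate in permutations(graph_ops, len(pat_nodes)):
        mapping = dict(zip(pat_nodes, candidate))
        same_ops = all(graph_ops[mapping[p]] == pat_ops[p] for p in pat_nodes)
        same_edges = all((mapping[a], mapping[b]) in graph_edges for a, b in pat_edges)
        if same_ops and same_edges:
            return mapping
    return None


# Hypothetical pattern: two parallel Conv branches joined by an Add.
pat_ops = {"a": "Conv", "b": "Conv", "c": "Add"}
pat_edges = {("a", "c"), ("b", "c")}

# Hypothetical network graph containing that pattern at nodes 2, 3 and 4.
graph_ops = {1: "Relu", 2: "Conv", 3: "Conv", 4: "Add", 5: "Relu"}
graph_edges = {(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)}

mapping = find_subgraph(graph_ops, graph_edges, pat_ops, pat_edges)
```

Unlike the category-level templates, this pattern names concrete operators and spans two parallel branches, which is the kind of structure a linear template cannot express.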
According to some embodiments of the present disclosure, before operator fusion is performed on the multiple operators in the neural network according to the linear-structure template fusion mode, the operator fusion method may further include: performing operator fusion on the operators in the directed acyclic graph of the neural network according to the subgraph-structure fusion mode, to obtain a graph representation after subgraph-structure operator fusion. In these embodiments, performing operator fusion on the multiple operators in the neural network according to the at least one linear-structure template fusion mode includes: according to the respective category labels of the multiple operators in the neural network and the at least one linear-structure template fusion mode, traversing the operators in the graph representation after subgraph-structure operator fusion, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the multiple operators in the neural network. Operators in the neural network that belong to the non-template category are ignored during the operator fusion process.
Operator fusion based on subgraph matching and operator fusion based on the linear template fusion mode can be two mutually independent fusion processes that complement each other functionally: the former handles computation-pattern matching for complex structures, and the latter handles computation-pattern matching for linear structures. Within the overall operator fusion process of the neural network, operator fusion based on subgraph matching can be performed first, followed by operator fusion based on the linear template fusion mode; the latter can operate on the fusion result of the former and fuse further on that basis.
FIG. 7 shows a schematic block diagram of an operator fusion scheme according to some embodiments of the present disclosure, and FIGS. 8A-8B show schematic flowcharts of an operator fusion scheme according to some embodiments of the present disclosure. The scheme is described as a whole below with reference to FIG. 7, FIG. 8A and FIG. 8B.
As shown in FIG. 7, the operator fusion scheme can be divided into two parts: defining the fusion modes, and executing the fusion algorithm based on the defined fusion modes to obtain the operator-fused graph representation of the neural network. Specifically, in the stage of defining the fusion modes, a configurable file (for example in JSON or YAML format) can be used to set the linear-structure template fusion modes and the subgraph-structure fusion modes. The present disclosure does not limit the number of fusion modes or their specific form.
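As an illustration of such a configurable file, the sketch below parses a JSON document into linear templates and subgraph patterns. The schema (the keys `linear_templates`, `subgraph_patterns`, `ops`, `edges`), the label `Pooling`, the operator type names and the function `load_fusion_modes` are all assumptions made for this example; the disclosure does not specify a file layout.

```python
import json

# Hypothetical configuration: linear templates are lists of category labels,
# subgraph patterns are operator-level graphs.
CONFIG_TEXT = """
{
  "linear_templates": [
    ["Matrix", "Normalize", "Pooling"],
    ["Matrix", "Normalize"]
  ],
  "subgraph_patterns": [
    {
      "ops": {"a": "Conv", "b": "Relu", "c": "Sigmoid", "d": "Mul"},
      "edges": [["a", "b"], ["a", "c"], ["b", "d"], ["c", "d"]]
    }
  ]
}
"""

def load_fusion_modes(text):
    """Parse the configuration into (templates, patterns)."""
    cfg = json.loads(text)
    templates = [tuple(t) for t in cfg.get("linear_templates", [])]
    patterns = [
        (p["ops"], {tuple(e) for e in p["edges"]})
        for p in cfg.get("subgraph_patterns", [])
    ]
    return templates, patterns

templates, patterns = load_fusion_modes(CONFIG_TEXT)
```

Keeping the modes in data rather than in code is what lets the fusion modes vary while the fusion algorithm stays fixed.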
In the stage of executing the fusion algorithm, for the directed acyclic graph (DAG) of the neural network, a subgraph-based operator fusion process is first performed according to the defined subgraph-structure fusion modes, for example by strict subgraph matching. This step yields the graph representation after subgraph-structure operator fusion. Next, according to the respective category labels of the operators and the one or more linearly connected template categories in the linear-structure template fusion mode, operator fusion is performed on the operators in that graph representation, with operators belonging to the non-template category ignored during the process, to obtain the graph representation after linear operator fusion.
As an example, as shown in FIG. 8A, for the illustrated subgraph-structure fusion mode (comprising operator 6, operator 7, operator 8 and operator 9), strict subgraph matching is first performed in the DAG to find the operators that conform to the mode, and these operators are fused to obtain the graph representation after subgraph-structure operator fusion. The resulting fusion layer is drawn as an ellipse P and is shown as fusion layer P in FIG. 8B. The template category of fusion layer P can be defined directly: for example, a unique category label such as category X can be defined for operator fusion based on subgraph matching, to distinguish the fusion layer from operators of categories such as Matrix and Normalize shown in FIG. 3, for use in the subsequent linear template matching process. Next, as shown in FIG. 8B, operator fusion based on the linear-structure template fusion mode continues on the graph representation after subgraph-structure operator fusion: the operators that conform to a template are fused, shown as fusion layer A, fusion layer B, fusion layer C, fusion layer D and fusion layer E, and the graph representation after linear operator fusion is finally obtained.
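The replacement of matched operators by a single fusion layer with its own category label can be sketched as below. The graph used here is hypothetical (the actual connections of FIG. 8A are not reproduced), and the function name `fuse_nodes` and the labels other than `X`, `Matrix` and `Normalize` are assumptions of this example.

```python
def fuse_nodes(labels, edges, group, fused_id, fused_label="X"):
    """Collapse the nodes in `group` into one node `fused_id`.

    labels: dict node -> category label; edges: set of (src, dst) pairs.
    Returns new (labels, edges); edges internal to the group are dropped.
    """
    group = set(group)
    new_labels = {n: lab for n, lab in labels.items() if n not in group}
    new_labels[fused_id] = fused_label
    new_edges = set()
    for s, d in edges:
        s2 = fused_id if s in group else s
        d2 = fused_id if d in group else d
        if s2 != d2:  # skip edges that became internal to the fusion layer
            new_edges.add((s2, d2))
    return new_labels, new_edges

# Hypothetical graph: operators 6-9 form a two-branch subgraph between 5 and 10.
labels = {5: "Matrix", 6: "Op6", 7: "Op7", 8: "Op8", 9: "Op9", 10: "Pooling"}
edges = {(5, 6), (6, 7), (6, 8), (7, 9), (8, 9), (9, 10)}

new_labels, new_edges = fuse_nodes(labels, edges, [6, 7, 8, 9], "P")
```

After this step the fusion layer `P` is an ordinary node carrying label `X`, so the subsequent linear template pass can treat it like any other categorized operator.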
The operator fusion method of some embodiments of the present disclosure can perform operator fusion for various types of neural network structures. It provides a linear-structure template fusion mode set on the basis of operator category labels, together with a corresponding operator fusion algorithm, so that the designed template fusion mode is general and extensible and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for neural networks, facilitates graph optimization of the neural network, and improves the computational efficiency of the neural network on the hardware platform.
According to another aspect of the present disclosure, a computing apparatus is also provided for performing operator fusion on a neural network, wherein the neural network includes multiple operators. The computing apparatus according to embodiments of the present disclosure can be applied to efficient computation for neural network inference or training. It addresses the operator fusion problem in the graph optimization of a neural network and, taking the generality and extensibility of the fusion modes as its goal, provides an operator fusion strategy in which the fusion modes are variable and the fusion algorithm is fixed. Specifically, the computing apparatus according to some embodiments of the present disclosure can perform operator fusion for various types of neural network structures. It uses a linear-structure template fusion mode set on the basis of operator category labels, together with a corresponding operator fusion algorithm, so that the designed template fusion mode is general and extensible and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for neural networks, facilitates graph optimization of the neural network, and improves the computational efficiency of the neural network on the hardware platform.
FIG. 9 shows a schematic block diagram of a computing apparatus according to some embodiments of the present disclosure. As shown in FIG. 9, the computing apparatus 1000 according to embodiments of the present disclosure includes a fusion mode configuration unit 1010 and a fusion unit 1020.
Specifically, the fusion mode configuration unit 1010 can be configured to set at least one linear-structure template fusion mode based on the category labels of operators. The term "at least one" in the present disclosure means one or more, that is, one, two or more, without limitation. The number of template fusion modes is not restated below and is to be read in the same way as one or more template fusion modes.
In embodiments according to the present disclosure, a template-type fusion mode is defined, and it is defined in terms of the category labels of operators. Compared with the operator-based fusion mode shown in FIG. 1B, it is more adaptable: any operator that belongs to the category can be fused, which can significantly reduce the number of fusion modes that need to be designed.
The fusion unit 1020 can be configured to perform operator fusion on the multiple operators in the neural network according to their respective category labels and the linear-structure template fusion mode, so as to fuse the one or more operators in the neural network that conform to the linear-structure template fusion mode.
The computing apparatus according to some embodiments of the present disclosure further includes a classification unit 1030 configured to classify operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and to assign category labels to operators of different categories. The term "and/or" in the present disclosure covers three cases: based on the functions of the operators; based on the computing architecture characteristics of the hardware platform; and based on both. According to some embodiments of the present disclosure, the category labels include a non-template category and template categories, and the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category and a loss function category.
To define template fusion modes, the various operators required by the neural network must first be classified, dividing the many operators into a number of categories. In general, the classification of operators needs to take into account both their functions and the architectural characteristics of the hardware platform. For example, different categories of computing function are usually executed by different computing units, such as matrix operation units, vector operation units, scalar operation units and special function units. Operators of such different categories, when fused together, can be executed in pipeline parallelism, whereas multiple operators of the same category of computing function, if they do not consume additional register resources or synchronization resources, can be fused together and executed serially in the computing unit, avoiding the time cost of data transfer and memory operations.
In the computing apparatus according to some embodiments of the present disclosure, operators belonging to the non-template category include: activation functions, the rectified linear function, the absolute value function and the addition function. Specifically, operators belonging to the non-template category do not consume register resources or synchronous memory resources during the computation of the neural network. For an operator of the non-template category, one sample point in yields one sample point out, and no additional resources are occupied. In some embodiments of the present disclosure, operators of this category are therefore set apart so that they are not counted during template matching. As shown in FIG. 3, operators belonging to the non-template category may be, for example, Element Wise operators. As an example, Element Wise operators may include activation functions (Sigmoid, Swish), the rectified linear function (Relu), the absolute value function (Abs), the addition function (Add) and so on, which are not listed exhaustively here. Operators of the non-template category are ubiquitous in neural network structures. The linear-structure template fusion mode described above is composed of one or more template categories connected in a straight line; that is, a template fusion mode according to embodiments of the present disclosure includes only operators of template categories and does not include operators of the non-template category. The template fusion mode designed according to embodiments of the present disclosure thus eliminates the influence on the fusion mode of operators that occupy no additional computing resources, which further reduces the complexity of the template fusion mode and makes it more generally applicable.
According to some embodiments of the present disclosure, operators belonging to the matrix category include: the forward convolution operator, the backward data convolution operator, the backward filter convolution operator and the matrix multiplication operator; operators belonging to the normalization category include: the batch normalization operator and the layer normalization operator; operators belonging to the pooling category include: the max pooling layer operator, the average pooling layer operator and the global average pooling layer operator; operators belonging to the data rearrangement category include: the concatenation operator and the transformation operator; operators belonging to the data reduction category include: the maximum function, the minimum function and the mean function; operators belonging to the regression function category include: a regression function over sample points and a regression function over channels; and operators belonging to the loss function category include: the mean squared error function and the cross-entropy function.
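The classification above amounts to a lookup from operator to category label. The sketch below is one way to write it down; the operator identifiers (`ConvForward`, `MatMul`, `RegressionPerSample` and so on), the label strings other than `Matrix` and `Normalize`, and the helper names are assumptions chosen for this example and are not names used in the disclosure.

```python
# Template categories and their member operators (identifiers are illustrative).
CATEGORY_OPS = {
    "Matrix": ["ConvForward", "ConvBackwardData", "ConvBackwardFilter", "MatMul"],
    "Normalize": ["BatchNorm", "LayerNorm"],
    "Pooling": ["MaxPool", "AvgPool", "GlobalAvgPool"],
    "DataRearrange": ["Concat", "Transform"],
    "Reduce": ["Max", "Min", "Mean"],
    "Regression": ["RegressionPerSample", "RegressionPerChannel"],
    "Loss": ["MSELoss", "CrossEntropyLoss"],
}

# Non-template (element-wise) operators are ignored during template matching.
NON_TEMPLATE = "ElementWise"
NON_TEMPLATE_OPS = ["Sigmoid", "Swish", "Relu", "Abs", "Add"]

OP_TO_CATEGORY = {op: cat for cat, ops in CATEGORY_OPS.items() for op in ops}
OP_TO_CATEGORY.update({op: NON_TEMPLATE for op in NON_TEMPLATE_OPS})

def category_of(op_name):
    """Return the category label of an operator (KeyError if unclassified)."""
    return OP_TO_CATEGORY[op_name]

def is_template(op_name):
    """True if the operator belongs to a template category."""
    return category_of(op_name) != NON_TEMPLATE
```

Adding support for a new operator then means adding one entry to this table rather than defining a new fusion mode.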
In embodiments according to the present disclosure, the operators of the neural network are first classified, and template fusion modes are then defined in units of categories rather than in units of individual operators, which reduces the number of fusion modes the neural network requires (as will become apparent below). In addition, the template fusion modes defined in units of categories exclude the influence of non-template operators, which occupy no additional computing resources: such operators are not included in the defined template fusion modes. This further reduces the complexity of the template fusion modes and makes them more generally applicable.
According to some embodiments of the present disclosure, the at least one linear-structure template fusion mode is composed of one or more linearly connected template categories, and setting the at least one linear-structure template fusion mode based on the category labels of operators includes: setting the at least one linear-structure template fusion mode based on the category labels of the template categories while ignoring the category label of the non-template category. As an example, FIG. 4 shows a schematic diagram of template fusion modes according to some embodiments of the present disclosure; for the specific structure, see the description given above with reference to FIG. 4.
According to some embodiments of the present disclosure, as shown in FIG. 9, the computing apparatus further includes a generation unit 1040 configured to generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent the data dependencies and data flow directions between operators. For an example of a directed acyclic graph, see FIG. 5A; it is not repeated here.
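A minimal sketch of such a graph, assuming the network is given as a list of (producer, consumer) pairs: the standard-library `graphlib.TopologicalSorter` (Python 3.9+) orders the operators so that every operator appears after the operators it depends on, and raises `graphlib.CycleError` if the input is not acyclic. The operator names and the helper `build_dag` are hypothetical.

```python
from graphlib import TopologicalSorter

def build_dag(edges):
    """Return dict node -> set of predecessors from (producer, consumer) pairs."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return preds

# Each edge records a data dependency: the consumer reads the producer's output.
edges = [
    ("conv", "bn"), ("bn", "relu"), ("relu", "pool"),
    ("conv", "add"), ("pool", "add"),
]
dag = build_dag(edges)

# A traversal order that respects every data dependency.
order = list(TopologicalSorter(dag).static_order())
```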
According to some embodiments of the present disclosure, the fusion unit 1020 performing operator fusion on the multiple operators in the neural network according to the at least one linear-structure template fusion mode includes: according to the respective category labels of the multiple operators in the neural network and the at least one linear-structure template fusion mode, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the multiple operators in the neural network.
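The traversal and matching can be illustrated on a single branch. The sketch below is a simplification under stated assumptions: the branch is a plain list of (operator name, category label) pairs rather than a full DAG, the template is non-empty, non-template operators carry the hypothetical label `ElementWise`, and trailing element-wise operators are absorbed into the fused span (a choice made for this example, not something the text above prescribes).

```python
NON_TEMPLATE = "ElementWise"

def match_template(chain, start, template):
    """Try to match `template` starting at chain[start].

    Non-template operators are skipped while matching but stay inside the
    fused span. Returns the exclusive end index of the span, or None.
    """
    i, k = start, 0
    while i < len(chain) and k < len(template):
        cat = chain[i][1]
        if cat == NON_TEMPLATE:
            i += 1              # ignored for matching purposes
        elif cat == template[k]:
            i += 1
            k += 1
        else:
            return None         # a different template category breaks the match
    if k < len(template):
        return None             # chain ended before the template was complete
    # Absorb trailing element-wise operators (an assumption of this sketch).
    while i < len(chain) and chain[i][1] == NON_TEMPLATE:
        i += 1
    return i

def fuse_chain(chain, template):
    """Group operator names into fusion layers according to one template."""
    fused, i = [], 0
    while i < len(chain):
        end = None
        if chain[i][1] == template[0]:
            end = match_template(chain, i, template)
        if end is None:
            fused.append([chain[i][0]])
            i += 1
        else:
            fused.append([name for name, _ in chain[i:end]])
            i = end
    return fused

chain = [
    ("conv1", "Matrix"), ("relu1", "ElementWise"), ("bn1", "Normalize"),
    ("pool1", "Pooling"), ("conv2", "Matrix"), ("add1", "ElementWise"),
]
layers = fuse_chain(chain, ("Matrix", "Normalize"))
```

Here `conv1`, `relu1` and `bn1` form one fusion layer even though `relu1` sits between the two template operators, while `conv2` stays unfused because no normalization operator follows it.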
According to some embodiments of the present disclosure, the fusion mode configuration unit 1010 setting at least one linear-structure template fusion mode based on the category labels of operators includes: setting a first linear-structure template fusion mode and a second linear-structure template fusion mode based on the category labels of operators. Where the first template fusion mode and the second template fusion mode share a common portion of template categories and the first template fusion mode includes further template categories in addition to the common portion, the fusion unit 1020 first performs operator fusion on the multiple operators in the neural network according to the first template fusion mode, and then performs operator fusion on the multiple operators in the neural network according to the second template fusion mode.
In the operator fusion method provided according to embodiments of the present disclosure, the linear-structure template fusion mode and the corresponding operator fusion algorithm are defined on the basis of operator category labels, so that the designed template fusion mode is general and extensible and fusion can be performed in units of operator categories, providing an efficient operator fusion process for neural networks. This operator fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion mode.
According to some embodiments of the present disclosure, setting the linear-structure template fusion mode based on the category labels of operators includes: setting a first linear-structure template fusion mode and a second linear-structure template fusion mode based on the category labels of operators. In some implementations, not just one template fusion mode but multiple template fusion modes may be defined for the neural network. Where the first template fusion mode and the second template fusion mode share a common portion of template categories and the first template fusion mode includes further template categories in addition to the common portion, operator fusion is first performed on the multiple operators in the neural network according to the first template fusion mode, and then according to the second template fusion mode.
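One simple way to obtain this ordering, offered here as an assumption rather than as the rule used in the disclosure, is to apply longer templates first: a template that contains another template plus further categories is necessarily longer, so it is tried before the shorter one can consume part of its match. The helper names `extends` and `order_templates` and the label `Pooling` are hypothetical.

```python
def extends(first, second):
    """True if `second` occurs as a contiguous run inside the longer `first`."""
    n, m = len(first), len(second)
    return n > m and any(first[i:i + m] == second for i in range(n - m + 1))

def order_templates(templates):
    """Longer templates first, so a template that extends another is tried earlier."""
    return sorted(templates, key=len, reverse=True)

templates = [("Matrix", "Normalize"), ("Matrix", "Normalize", "Pooling")]
ordered = order_templates(templates)
```

With the order reversed, the shorter template would fuse the matrix and normalization operators first and the longer template could no longer match the same operators.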
According to some embodiments of the present disclosure, the fusion mode configuration unit 1010 setting the linear-structure template fusion mode based on the category labels of operators includes: setting the linear-structure template fusion mode by means of a configurable file.
According to some embodiments of the present disclosure, the fusion mode configuration unit 1010 is further configured to set a subgraph-structure fusion mode in units of operators. The computing apparatus according to some embodiments of the present disclosure further includes a generation unit 1040 configured to generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent the data dependencies and data flow directions between operators.
According to some embodiments of the present disclosure, before operator fusion is performed on the multiple operators in the neural network according to the at least one linear-structure template fusion mode, the fusion unit 1020 is further configured to perform operator fusion on the operators in the directed acyclic graph of the neural network according to the subgraph-structure fusion mode, to obtain a graph representation after subgraph-structure operator fusion. The fusion unit 1020 performing operator fusion on the multiple operators in the neural network according to the linear-structure template fusion mode then includes: according to the respective category labels of the multiple operators in the neural network and the at least one linear-structure template fusion mode, traversing the operators in the graph representation after subgraph-structure operator fusion, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the operators in the graph representation.
For the specific implementation of operator fusion by the computing apparatus according to embodiments of the present disclosure, reference may be made to the operator fusion method according to some embodiments of the present disclosure described above with reference to FIGS. 2-8B, which is not repeated here. The computing apparatus of embodiments of the present disclosure can carry out a similar operator fusion process and achieve similar technical effects.
According to yet another aspect of the present disclosure, a computing device is also provided. FIG. 10 shows a schematic block diagram of a computing device according to embodiments of the present disclosure.
As shown in FIG. 10, the computing device 2000 may include a processor 2010 and a memory 2020. According to embodiments of the present disclosure, the memory 2020 stores computer-readable code which, when run by the processor 2010, can execute the operator fusion method described above.
The processor 2010 can perform various actions and processing according to the program stored in the memory 2020. Specifically, the processor 2010 may be an integrated circuit with signal processing capability. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. For example, the processor here may refer to a computing device capable of performing neural network computation.
The memory 2020 stores computer-executable instruction code which, when executed by the processor 2010, implements the operator fusion method according to embodiments of the present disclosure. The memory 2020 may be volatile memory or non-volatile memory, or may include both. It should be noted that the memory described in the present disclosure may be any suitable type of memory. As an example, by executing the computer-executable instruction code in the memory 2020, a processor such as a CPU can implement the operator fusion method for synchronization between neural network layers.
The operator fusion method or computing device according to embodiments of the present disclosure can also be implemented by means of the architecture of the computing device 3000 shown in FIG. 11. As shown in FIG. 11, the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, an input/output component 3060, a hard disk 3070 and so on. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, can store various data or files used in the processing and/or communication of the operator fusion method provided by the present disclosure, as well as the program instructions executed by the CPU. The computing device 3000 may also include a user interface 3080. The architecture shown in FIG. 11 is only exemplary; when implementing different devices, one or more components of the computing device shown in FIG. 11 may be omitted according to actual needs.
According to yet another aspect of the present disclosure, a non-transitory computer-readable storage medium is also provided. FIG. 12 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
As shown in FIG. 12, computer-readable instructions 4010 are stored on a computer storage medium 4020. When the computer-readable instructions 4010 are run by a processor, the operator fusion method described with reference to the above figures can be executed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk or flash memory. For example, the computer storage medium 4020 may be connected to a computing device such as a computer; when the computing device runs the computer-readable instructions 4010 stored on the computer storage medium 4020, the operator fusion method provided according to embodiments of the present disclosure as described above can be carried out.
In summary, some embodiments of the present disclosure provide an operator fusion method, a computing apparatus, a computing device and a storage medium that offer an operator fusion solution for various types of neural network structures, in particular neural networks with relatively complex structures. More specifically, to perform operator fusion for various types of neural network structures, a linear-structure template fusion mode set on the basis of operator category labels and a corresponding operator fusion algorithm are designed, so that the template fusion mode is general and extensible and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for neural networks, facilitates graph optimization of the neural network, and improves the computational efficiency of the neural network on the hardware platform.
Those skilled in the art will appreciate that the content disclosed herein may be varied and improved in many ways. For example, the various devices or components described above may be implemented in hardware, or in software, firmware, or a combination of some or all of the three.
In addition, although the present disclosure makes various references to certain units in the system according to embodiments of the present disclosure, any number of different units may be used and run on a client and/or a server. The units are illustrative only, and different aspects of the systems and methods may use different units.
Flowcharts are used in this disclosure to illustrate the steps of the methods according to embodiments of the present disclosure. It should be understood that the steps need not be performed exactly in the order shown. Instead, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above methods may be carried out by a computer program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present disclosure is not limited to any particular combination of hardware and software.
Unless otherwise defined, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure belongs. It should also be understood that terms such as those defined in common dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The above is an explanation of the present disclosure and should not be considered a limitation thereof. Although several exemplary embodiments of the present disclosure have been described, those skilled in the art will readily understand that many modifications may be made to the exemplary embodiments without departing from the novel teachings and advantages of the present disclosure. Therefore, all such modifications are intended to be included within the scope of the present disclosure as defined by the claims. It should be understood that the above is an explanation of the present disclosure and should not be considered limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments and other embodiments are intended to be included within the scope of the appended claims. The present disclosure is defined by the claims and their equivalents.
Claims (20)
- An operator fusion method, applied to a neural network including a plurality of operators, the method comprising: setting at least one linear-structure template fusion mode based on category labels of operators; and performing, according to the respective category labels of the plurality of operators in the neural network, operator fusion on the plurality of operators in accordance with the at least one linear-structure template fusion mode, so as to fuse one or more operators in the neural network that conform to the at least one linear-structure template fusion mode.
- The operator fusion method according to claim 1, further comprising: classifying operators based on the functions of the operators and/or the computing architecture characteristics of a hardware platform, and assigning category labels to operators of different categories.
- The operator fusion method according to claim 1 or 2, wherein the category labels include a non-template category and template categories, and the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
- The operator fusion method according to claim 3, wherein operators belonging to the non-template category do not consume register resources or synchronous memory resources during the computation of the neural network.
- The operator fusion method according to claim 3 or 4, wherein the at least one linear-structure template fusion mode consists of one or more linearly connected template categories, and setting the at least one linear-structure template fusion mode based on the category labels of operators comprises: setting the at least one linear-structure template fusion mode based on the category labels of the template categories while ignoring the category label of the non-template category.
- The operator fusion method according to claim 5, further comprising: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flow directions between operators, wherein performing operator fusion on the plurality of operators in the neural network in accordance with the at least one linear-structure template fusion mode comprises: traversing, according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one linear-structure template fusion mode, the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the plurality of operators in the neural network.
- The operator fusion method according to claim 5 or 6, wherein setting the at least one linear-structure template fusion mode based on the category labels of operators further comprises: setting a first linear-structure template fusion mode and a second linear-structure template fusion mode based on the category labels of operators, wherein, in a case where the first template fusion mode and the second template fusion mode share some of the same template categories and the first template fusion mode includes other template categories in addition to the shared template categories, operator fusion is first performed on the plurality of operators in the neural network in accordance with the first template fusion mode, and then performed on the plurality of operators in the neural network in accordance with the second template fusion mode.
- The operator fusion method according to claim 1, wherein setting the at least one linear-structure template fusion mode based on the category labels of operators comprises: setting the at least one linear-structure template fusion mode using a configurable file.
- The operator fusion method according to claim 1, further comprising: setting a fusion mode of a subgraph structure in units of operators; and generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flow directions between operators.
- The operator fusion method according to claim 9, wherein, before performing operator fusion on the plurality of operators in the neural network in accordance with the at least one linear-structure template fusion mode, the method further comprises: performing operator fusion on the operators in the directed acyclic graph of the neural network in accordance with the fusion mode of the subgraph structure, to obtain a graph representation after subgraph-structure operator fusion.
- The operator fusion method according to claim 10, wherein performing operator fusion on the plurality of operators in the neural network in accordance with the at least one linear-structure template fusion mode comprises: traversing, according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one linear-structure template fusion mode, the operators in the graph representation after subgraph-structure operator fusion, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the operators in the graph representation.
- The operator fusion method according to any one of claims 3 to 7, wherein: operators belonging to the non-template category include an activation function, a linear rectification function, an absolute value function, and an addition function; operators belonging to the matrix category include a forward convolution operator, a backward data convolution operator, a backward filter convolution operator, and a matrix multiplication operator; operators belonging to the normalization category include a batch normalization operator and a layer normalization operator; operators belonging to the pooling category include a max pooling layer operator, an average pooling layer operator, and a global average pooling layer operator; operators belonging to the data rearrangement category include a concatenation operator and a transformation operator; operators belonging to the data reduction category include a maximum function, a minimum function, and an average function; operators belonging to the regression function category include a regression function for sample points and a regression function for channels; and operators belonging to the loss function category include a mean square error function and a cross-entropy function.
- A computing apparatus for performing operator fusion on a neural network, wherein the neural network includes a plurality of operators, the computing apparatus comprising: a fusion mode configuration unit, configured to set at least one linear-structure template fusion mode based on category labels of operators; and a fusion unit, configured to perform, according to the respective category labels of the plurality of operators in the neural network, operator fusion on the plurality of operators in accordance with the at least one linear-structure template fusion mode, so as to fuse one or more operators in the neural network that conform to the at least one linear-structure template fusion mode.
- The computing apparatus according to claim 13, further comprising a classification unit, configured to: classify operators based on the functions of the operators and/or the computing architecture characteristics of a hardware platform, and assign category labels to operators of different categories, wherein the category labels include a non-template category and template categories, and the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
- The computing apparatus according to claim 14, wherein the at least one linear-structure template fusion mode consists of one or more linearly connected template categories, and setting the at least one linear-structure template fusion mode based on the category labels of operators comprises: setting the at least one linear-structure template fusion mode based on the category labels of the template categories while ignoring the category label of the non-template category, wherein the computing apparatus further comprises a generation unit, configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flow directions between operators, wherein the fusion unit performing operator fusion on the plurality of operators in the neural network in accordance with the at least one linear-structure template fusion mode comprises: traversing, according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one linear-structure template fusion mode, the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the plurality of operators in the neural network.
- The computing apparatus according to claim 15, wherein the fusion mode configuration unit setting the at least one linear-structure template fusion mode based on the category labels of operators further comprises: setting a first linear-structure template fusion mode and a second linear-structure template fusion mode based on the category labels of operators, wherein, in a case where the first template fusion mode and the second template fusion mode share some of the same template categories and the first template fusion mode includes other template categories in addition to the shared template categories, the fusion unit first performs operator fusion on the plurality of operators in the neural network in accordance with the first template fusion mode, and then performs operator fusion on the plurality of operators in the neural network in accordance with the second template fusion mode.
- The computing apparatus according to claim 13, wherein the fusion mode configuration unit is further configured to: set a fusion mode of a subgraph structure in units of operators, wherein the computing apparatus further comprises a generation unit, configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flow directions between operators.
- The computing apparatus according to claim 17, wherein, before performing operator fusion on the plurality of operators in the neural network in accordance with the at least one linear-structure template fusion mode, the fusion unit is further configured to: perform operator fusion on the operators in the directed acyclic graph of the neural network in accordance with the fusion mode of the subgraph structure, to obtain a graph representation after subgraph-structure operator fusion, wherein the fusion unit performing operator fusion on the plurality of operators in the neural network in accordance with the at least one linear-structure template fusion mode comprises: traversing, according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one linear-structure template fusion mode, the operators in the graph representation after subgraph-structure operator fusion, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the operators in the graph representation.
- A computing device, comprising: a processor; and a memory, wherein the memory stores computer-readable code that, when run by the processor, performs the operator fusion method according to any one of claims 1 to 12.
- A non-transitory computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the operator fusion method according to any one of claims 1 to 12.
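The ordering rule for overlapping template fusion modes and the use of a configurable file, both recited in the claims above, can be sketched in Python as follows. This is an illustrative sketch under assumptions: the JSON schema, the key `template_fusion_modes`, and the helper names `load_modes`, `contains`, and `application_order` are hypothetical, since the claims only state that a configurable file is used. Sorting longest-first is a simplification that satisfies the rule, because a mode that includes another mode's categories plus additional ones is necessarily longer.

```python
import json
from typing import List, Tuple

Mode = Tuple[str, ...]

# Hypothetical configuration content; the source does not specify a schema.
CONFIG_TEXT = """
{
  "template_fusion_modes": [
    ["matrix"],
    ["matrix", "normalization"],
    ["matrix", "normalization", "pooling"],
    ["data_reduction"]
  ]
}
"""


def load_modes(text: str) -> List[Mode]:
    """Parse template fusion modes from the (assumed) JSON configuration."""
    raw = json.loads(text)["template_fusion_modes"]
    return [tuple(mode) for mode in raw]


def contains(longer: Mode, shorter: Mode) -> bool:
    """True if `shorter` appears as a contiguous run inside a longer mode.

    Treating "shared template categories" as a contiguous run is an
    assumption of this sketch.
    """
    n, m = len(longer), len(shorter)
    return n > m and any(longer[s:s + m] == shorter for s in range(n - m + 1))


def application_order(modes: List[Mode]) -> List[Mode]:
    """Order modes so that a mode containing another is applied first.

    A stable longest-first sort is sufficient: any mode that strictly
    contains another has more template categories.
    """
    return sorted(modes, key=len, reverse=True)


modes = load_modes(CONFIG_TEXT)
order = application_order(modes)
```

Applying the longer mode first prevents a shorter mode such as `("matrix",)` from consuming a matrix operator that could have been fused together with a following normalization operator.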
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211268055.1A CN115563581A (en) | 2022-10-17 | 2022-10-17 | Operator fusion method, computing device, computing equipment and readable storage medium |
CN202211268055.1 | 2022-10-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024082551A1 (en) | 2024-04-25 |
Family
ID=84747342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/083784 WO2024082551A1 (en) | 2022-10-17 | 2023-03-24 | Operator fusion method, computing apparatus, computing device and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115563581A (en) |
WO (1) | WO2024082551A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115563581A (en) * | 2022-10-17 | 2023-01-03 | 上海壁仞智能科技有限公司 | Operator fusion method, computing device, computing equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112579063A (en) * | 2021-03-01 | 2021-03-30 | 之江实验室 | Acceleration method for exploring optimization space in deep learning compiler |
CN113342345A (en) * | 2021-05-17 | 2021-09-03 | 北京百度网讯科技有限公司 | Operator fusion method and device of deep learning framework |
US20210350234A1 (en) * | 2019-01-28 | 2021-11-11 | Intel Corporation | Techniques to detect fusible operators with machine learning |
CN115563581A (en) * | 2022-10-17 | 2023-01-03 | 上海壁仞智能科技有限公司 | Operator fusion method, computing device, computing equipment and readable storage medium |
- 2022-10-17: CN application CN202211268055.1A, published as CN115563581A (en), status: active, pending
- 2023-03-24: WO application PCT/CN2023/083784, published as WO2024082551A1 (en), status: unknown
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210350234A1 (en) * | 2019-01-28 | 2021-11-11 | Intel Corporation | Techniques to detect fusible operators with machine learning |
CN112579063A (en) * | 2021-03-01 | 2021-03-30 | 之江实验室 | Acceleration method for exploring optimization space in deep learning compiler |
CN113342345A (en) * | 2021-05-17 | 2021-09-03 | 北京百度网讯科技有限公司 | Operator fusion method and device of deep learning framework |
CN115563581A (en) * | 2022-10-17 | 2023-01-03 | 上海壁仞智能科技有限公司 | Operator fusion method, computing device, computing equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115563581A (en) | 2023-01-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10699186B2 (en) | Determining orders of execution of a neural network | |
JP6549332B2 (en) | Network model construction method and apparatus based on machine learning | |
CN110659728B (en) | Neural network optimization method, device, computer equipment and storage medium | |
WO2018058426A1 (en) | Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system | |
WO2021190597A1 (en) | Processing method for neural network model, and related device | |
CN111338695B (en) | Data processing method based on pipeline technology and related product | |
WO2023093623A1 (en) | Computation graph optimization method, data processing method and related product | |
US11818231B2 (en) | Logical node distributed signature decision system and a method thereof | |
US20220129408A1 (en) | Data actor and data processing method thereof | |
WO2022062529A1 (en) | Parallel decision-making system and method for distributed data processing | |
US20210318878A1 (en) | Method and system for accelerating ai training with advanced interconnect technologies | |
JP2017097863A (en) | Technologies for automatic reordering of sparse matrices | |
WO2021227418A1 (en) | Task deployment method and device based on multi-board fpga heterogeneous system | |
WO2024082551A1 (en) | Operator fusion method, computing apparatus, computing device and readable storage medium | |
Pujol-Perich et al. | Ignnition: Bridging the gap between graph neural networks and networking systems | |
US20170351633A1 (en) | Modifications to a stream processing topology during processing of a data stream | |
WO2022228224A1 (en) | Quantum computing task execution method and apparatus, and quantum computer operating system | |
US20240303470A1 (en) | Construction method and apparatus for bipartite graph, and display method and apparatus for bipartite graph | |
Barrachina et al. | Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs | |
WO2024193207A1 (en) | Data augmentation method and related apparatus | |
CN113807455A (en) | Method, apparatus, medium, and program product for constructing clustering model | |
US20230259486A1 (en) | Neural processing unit synchronization systems and methods | |
CN115729648A (en) | Operator scheduling method, device and system based on directed acyclic graph | |
Huang et al. | Adaptive partitioning and efficient scheduling for distributed DNN training in heterogeneous IoT environment | |
CN117391206B (en) | Quantum circuit processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23878553; Country of ref document: EP; Kind code of ref document: A1 |