WO2024082551A1 - Operator fusion method, computing apparatus, computing device and readable storage medium - Google Patents


Info

Publication number
WO2024082551A1
Authority
WO
WIPO (PCT)
Prior art keywords
operators
template
fusion
operator
neural network
Prior art date
Application number
PCT/CN2023/083784
Other languages
French (fr)
Chinese (zh)
Inventor
杨文�
段柳成
李晓阳
洪洲
Original Assignee
上海壁仞科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海壁仞科技股份有限公司 filed Critical 上海壁仞科技股份有限公司
Publication of WO2024082551A1 publication Critical patent/WO2024082551A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Definitions

  • Embodiments of the present disclosure relate to an operator fusion method, a computing device, a computing apparatus, and a readable storage medium applied to a neural network including multiple operators.
  • In order to improve the computational efficiency of a neural network, multiple operators in the neural network that meet certain conditions or rules are usually fused before the computation process to form a fused operator.
  • The fused operator, or a single operator that cannot be fused, may be referred to as a fused operator or a fusion layer.
  • the computation process of the neural network is performed layer-by-layer in units of fusion layers, so operator fusion becomes an important process in neural network graph optimization.
  • The operator fusion process involves two steps: first, defining the fusion mode, that is, the target to be fused; second, determining the fusion algorithm, which matches occurrences of the fusion mode in the neural network and fuses them without mutual interference.
  • The operator fusion scheme in the related art adopts fixed fusion modes, with a one-to-one correspondence between fusion mode and fusion algorithm: one fusion mode corresponds to one fusion algorithm. If the fusion mode changes, the fusion algorithm must also change to support it, so the expansion and customization of fusion modes is limited.
  • Some embodiments of the present disclosure provide an operator fusion method, a computing apparatus, a computing device, and a readable storage medium, offering an efficient and scalable operator fusion solution for various types of neural network structures.
  • an operator fusion method which is applied to a neural network including multiple operators.
  • The operator fusion method comprises: setting a template fusion mode of at least one linear structure based on the category labels of the operators; and, according to the respective category labels of the multiple operators in the neural network, performing operator fusion on the multiple operators according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the at least one linear structure.
  • the operator fusion method also includes: classifying operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assigning category labels to operators of different categories.
  • category labels include non-template categories and template categories, wherein the template categories include one or more of the following: matrix categories, normalization categories, pooling categories, data rearrangement categories, data reduction categories, regression function categories, and loss function categories.
  • operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
  • the template fusion mode of at least one linear structure is composed of one or more template categories that are linearly connected, wherein setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
  • The operator fusion method also includes: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections represent data dependencies and data flows between operators. Performing operator fusion on the multiple operators according to the template fusion mode of at least one linear structure then includes: traversing the operators in the directed acyclic graph according to the respective category labels of the multiple operators and the one or more linearly connected template categories of the template fusion mode, matching one or more operators to the corresponding template fusion mode of the linear structure, and fusing them.
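The traversal-and-match step just described can be sketched in Python, reduced to a single linear chain of operators for brevity (the disclosure traverses a full DAG). Non-template operators encountered inside a match are absorbed without consuming a template category; all function and variable names here are illustrative, not from the patent.

```python
# Hedged sketch of matching a linear-structure template fusion mode along a
# chain of operators. `labels` maps operator name -> category label.
NON_TEMPLATE = "non_template"

def fuse_chain(chain, template, labels):
    """Greedily group operators along `chain` (in data-flow order) into fusion
    layers matching `template` (a sequence of template categories); operators
    that do not complete a match remain as singleton layers."""
    layers, i = [], 0
    while i < len(chain):
        j, t = i, 0
        while j < len(chain) and t < len(template):
            lab = labels[chain[j]]
            if lab == template[t]:
                t += 1                 # consumed one template category
            elif lab != NON_TEMPLATE:
                break                  # category mismatch: abandon this window
            j += 1                     # non-template ops are absorbed
        if t == len(template):         # full template matched -> fuse [i, j)
            layers.append(chain[i:j])
            i = j
        else:                          # no match starting at i -> singleton
            layers.append([chain[i]])
            i += 1
    return layers
```

For a chain Conv→Relu→BatchNorm→MaxPool and the template (Matrix, Normalize), the first three operators fuse into one layer and MaxPool remains a singleton layer.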
  • setting a template fusion mode of at least one linear structure based on the category label of an operator includes: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator, wherein, when the first template fusion mode and the second template fusion mode have the same part of template categories and the first template fusion mode also includes other template categories in addition to the same part of template categories, firstly perform operator fusion on multiple operators in the neural network according to the first template fusion mode, and then perform operator fusion on multiple operators in the neural network according to the second template fusion mode.
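The priority rule above (when a first template fusion mode contains all the template categories of a second plus extras, apply the first before the second) can be realized by simply trying longer templates first. The rule is from the disclosure; the sort-based ordering below is our assumption about one way to implement it.

```python
# Order candidate template fusion modes so that longer (more specific)
# templates are matched before shorter ones that they subsume.
def order_templates(templates):
    """Return templates ordered longest-first, so a more specific mode is
    tried before a shorter mode sharing its leading categories."""
    return sorted(templates, key=len, reverse=True)
```

With this ordering, (Matrix, Normalize, Pooling) is tried before (Matrix, Normalize), so a Conv+BatchNorm+Pooling group is not prematurely split by the shorter template.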
  • Setting the template fusion mode of at least one linear structure based on the category label of the operator includes: using a configurable file to set the template fusion mode of the at least one linear structure.
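The disclosure only states that a configurable file sets the template fusion modes; the concrete file format is not specified, so the JSON layout and the key name below are assumptions made for illustration.

```python
import json

# Hypothetical configuration file content: each entry is a linear sequence
# of template categories forming one template fusion mode.
EXAMPLE_CONFIG = """
{
  "template_fusion_modes": [
    ["Matrix", "Normalize"],
    ["Matrix", "Normalize", "Pooling"]
  ]
}
"""

def load_template_modes(text):
    """Parse template fusion modes (sequences of template categories)
    from a JSON configuration string."""
    return [tuple(mode) for mode in json.loads(text)["template_fusion_modes"]]
```

Keeping the modes in a configuration file means new fusion modes can be added or customized without recompiling the (fixed) fusion algorithm, which is the scalability benefit the disclosure aims at.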
  • the operator fusion method also includes: setting a fusion mode of a subgraph structure based on operators; and generating a directed acyclic graph based on the network structure of a neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between operators.
  • the method before performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure, the method also includes: performing operator fusion on operators in a directed acyclic graph of the neural network according to a fusion mode of a subgraph structure to obtain a graph expression after operator fusion of the subgraph structure.
  • performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure includes: traversing the operators in a graph expression after fusion of sub-graph structure operators according to the respective category labels of the multiple operators in the neural network and according to the template fusion mode of at least one linear structure, respectively matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
  • operators belonging to the non-template category include: activation function, linear rectification function, absolute value function, addition function; operators belonging to the matrix category include: forward convolution operator, reverse data convolution operator, reverse filter convolution operator, matrix multiplication operator; operators belonging to the normalization category include: batch normalization operator, layer normalization operator; operators belonging to the pooling category include: maximum pooling layer operator, average pooling layer operator, global average pooling layer operator; operators belonging to the data rearrangement category include: splicing operator, transformation operator; operators belonging to the data reduction category include: maximum function, minimum function, average function; operators belonging to the regression function category include: regression function for sample points, regression function for channels; and operators belonging to the loss function category include: mean square error function, cross entropy function.
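The category taxonomy listed above can be collected into a lookup table. The categories and member operators follow the disclosure's lists; the Python identifiers and the flat-dictionary layout are ours.

```python
# Category-label table assembled from the operator lists in the disclosure.
NON_TEMPLATE = "non_template"

OPERATOR_CATEGORIES = {
    # non-template (element-wise) operators
    "Sigmoid": NON_TEMPLATE, "Swish": NON_TEMPLATE,
    "Relu": NON_TEMPLATE, "Abs": NON_TEMPLATE, "Add": NON_TEMPLATE,
    # template categories
    "ForwardConv": "Matrix", "BackwardDataConv": "Matrix",
    "BackwardFilterConv": "Matrix", "MatMul": "Matrix",
    "BatchNorm": "Normalize", "LayerNorm": "Normalize",
    "MaxPooling": "Pooling", "AveragePooling": "Pooling",
    "GlobalAveragePooling": "Pooling",
    "Concat": "Reorder", "Permute": "Reorder",
    "Max": "Reduce", "Min": "Reduce", "Average": "Reduce",
    "SoftmaxOnSample": "Softmax", "SoftmaxOnChannel": "Softmax",
    "MSE": "Loss", "CrossEntropy": "Loss",
}

def category_label(op):
    """Return the category label assigned to an operator name."""
    return OPERATOR_CATEGORIES[op]
```

Template fusion modes are then stated over the category values ("Matrix", "Normalize", ...) rather than over the individual operator keys.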
  • a computing device for performing operator fusion on a neural network, wherein the neural network includes multiple operators.
  • the computing device includes: a fusion mode configuration unit configured to: set at least one linear structure template fusion mode based on the category label of the operator; and a fusion unit configured to perform operator fusion on the multiple operators in the neural network according to the template fusion mode of at least one linear structure according to the category labels of the multiple operators in the neural network, so as to fuse one or more operators in the neural network that conform to the template fusion mode of at least one linear structure.
  • The computing device further includes a classification unit configured to: classify the operators according to the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assign category labels to operators of different categories, wherein the category labels include non-template categories and template categories, and the template categories include one or more of the following: matrix categories, normalization categories, pooling categories, data rearrangement categories, data reduction categories, regression function categories, and loss function categories.
  • operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
  • the template fusion mode of at least one linear structure is composed of one or more template categories that are linearly connected, and setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
  • the computing device also includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between the operators.
  • a fusion unit performs operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure, including: traversing the operators in a directed acyclic graph of the neural network according to the template fusion mode of at least one linear structure according to the respective category labels of the multiple operators in the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the multiple operators in the neural network.
  • a fusion mode configuration unit sets a template fusion mode of at least one linear structure based on the category label of the operator, including: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator, wherein, when the first template fusion mode and the second template fusion mode have the same part of template categories and the first template fusion mode also includes other template categories in addition to the same part of template categories, the fusion unit first performs operator fusion on multiple operators in the neural network according to the first template fusion mode, and then performs operator fusion on multiple operators in the neural network according to the second template fusion mode.
  • the fusion mode configuration unit sets the template fusion mode of at least one linear structure based on the category label of the operator, including: setting the template fusion mode of at least one linear structure using a configurable file.
  • the fusion mode configuration unit is further configured to: set the fusion mode of the subgraph structure in units of operators.
  • The computing device further includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and the connections between operators, and the connections between operators represent the data dependencies and data flows between operators.
  • Before performing operator fusion on the multiple operators in the neural network according to the template fusion mode of at least one linear structure, the fusion unit is further configured to: perform operator fusion on the operators in the directed acyclic graph of the neural network according to the fusion mode of the subgraph structure, to obtain a graph expression after subgraph-structure operator fusion. Performing operator fusion according to the template fusion mode of at least one linear structure then includes: traversing the operators in that graph expression according to the respective category labels of the multiple operators and the template fusion mode of at least one linear structure, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
  • the operators belonging to the non-template category include: activation function, linear rectification function, absolute value function, addition function;
  • the operators belonging to the matrix category include: forward convolution operator, reverse data convolution operator, reverse filter convolution operator, matrix multiplication operator;
  • the operators belonging to the normalization category include: batch normalization operator, layer normalization operator;
  • the operators belonging to the pooling category include: maximum pooling layer operator, average pooling layer operator, global average pooling layer operator;
  • the operators belonging to the data rearrangement category include: splicing operator, transformation operator;
  • the operators belonging to the data reduction category include: maximum function, minimum function, average function;
  • the operators belonging to the regression function category include: regression function for sample points, regression function for channels; and the operators belonging to the loss function category include: mean square error function, cross entropy function.
  • a computing device including: a processor; and a memory, wherein the memory stores a computer-readable code, and when the computer-readable code is executed by the processor, the operator fusion method as described above is executed.
  • a non-transitory computer-readable storage medium on which instructions are stored.
  • the instructions are executed by a processor, the operator fusion method described above is implemented.
  • With the operator fusion method, computing apparatus, computing device, and storage medium provided by some embodiments of the present disclosure, a template fusion mode of a linear structure can be set based on the category labels of the operators, together with an operator fusion algorithm that is generally applicable to such template fusion modes. The designed template fusion mode thus has universality and scalability, and fusion can be performed in units of operator categories, providing an efficient operator fusion process for the neural network, benefiting graph optimization, and improving the computing efficiency of the neural network on the hardware platform.
  • FIG. 1A shows a schematic diagram of a neural network calculation process;
  • FIG. 1B shows a schematic diagram of neural network operator fusion in the related art;
  • FIG. 2 shows a schematic flow chart of an operator fusion method according to some embodiments of the present disclosure;
  • FIG. 3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure;
  • FIG. 4 shows a schematic diagram of a template fusion mode according to some embodiments of the present disclosure;
  • FIG. 5A shows a schematic diagram of a directed acyclic graph of a neural network;
  • FIG. 5B is a schematic diagram showing operator fusion of the directed acyclic graph shown in FIG. 5A;
  • FIG. 5C shows the directed acyclic graph after fusion;
  • FIG. 6 shows a schematic diagram of fusion priority of a template fusion mode according to some embodiments of the present disclosure;
  • FIG. 7 shows a schematic block diagram of an operator fusion solution according to some embodiments of the present disclosure;
  • FIGS. 8A-8B show schematic diagrams of an operator fusion solution process according to some embodiments of the present disclosure;
  • FIG. 9 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure;
  • FIG. 10 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure;
  • FIG. 11 is a schematic diagram showing the architecture of an exemplary computing device according to some embodiments of the present disclosure; and
  • FIG. 12 shows a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure.
  • An Artificial Neural Network (ANN) is an algorithmic mathematical model that imitates the behavioral characteristics of biological neural networks and performs distributed parallel information processing. Such a network relies on the complexity of the system, adjusting the interconnections among a large number of internal nodes, to process information.
  • Regardless of type, neural networks share the characteristics of large-scale parallel processing, distributed storage, elastic topology, high redundancy, and nonlinear operation, and offer strengths in computing speed, associative ability, adaptability, fault tolerance, and self-organization. These characteristics and capabilities form the technical basis for neural networks to simulate intelligent activity, and neural networks have been applied in many technical fields, for example data compression, image processing, video coding, and signal processing.
  • FIG1A shows a schematic diagram of the neural network computational process.
  • A computational unit used for a neural network is generally refined into a matrix multiplication unit, a vector operation unit, and a scalar operation unit, which perform different computational tasks respectively.
  • Data synchronization through a shared memory is used to ensure the execution order between fusion layers that have data dependencies.
  • the shared memory can be based on an internal memory or an external memory, which is not limited here.
  • Therefore, an efficient and flexible implementation of the operator fusion process, which is performed before the calculation process shown in FIG. 1A, has an important impact on the computational efficiency of the neural network.
  • it is necessary to quickly adjust and define the fusion mode and adapt the corresponding fusion algorithm to realize the operator fusion in the neural network.
  • FIG. 1B shows a schematic diagram of neural network operator fusion in the related art, in which four fixed fusion modes (fusion mode 1 through fusion mode 4) are schematically shown. Based on the four fusion modes defined in FIG. 1B, four corresponding fusion algorithms (fusion algorithm 1 through fusion algorithm 4) must be adapted to find and fuse operators in the network structure of the neural network that match these fusion modes, so as to obtain a fused network structure diagram on which the subsequent calculation process is performed.
  • For example, in fusion mode 1, a linear fusion target (comprising operator 1, operator 2, and operator 3) is defined.
  • To perform operator fusion based on fusion mode 1, a fusion algorithm 1 must be designed to match the operator 1-operator 2-operator 3 structure of fusion mode 1 among the many operators included in the neural network, and to fuse the matched operators.
  • a subgraph matching method is usually used to implement the fusion algorithm, that is, a subgraph corresponding to the fusion mode is designed, and the operator structure diagram of the neural network is traversed to find the corresponding subgraph structure. It can be understood that the fusion modes shown in FIG1B are all performed in units of operators.
  • In the related art, the defined fusion mode is static relative to the fusion algorithm, and the two are directly coupled.
  • One fusion mode corresponds to one fusion algorithm: if the fusion mode changes, the fusion algorithm must also change to support it. The expansion and customization of fusion modes is therefore limited and not universally applicable, and every change forces a corresponding adjustment of the fusion algorithm.
  • Moreover, an efficient fusion mode must be determined with the architectural characteristics of the hardware platform running the neural network in mind. Switching the hardware platform therefore changes the fusion mode accordingly, and a change in the fusion mode in turn requires re-formulating and compiling the corresponding fusion algorithm.
  • the present invention is applied to the field of efficient computing of neural network reasoning or training, solves the operator fusion problem in the graph optimization process of neural network, takes the universality and scalability of fusion mode as the goal, and provides an operator fusion strategy with variable fusion mode and fixed fusion algorithm.
  • To this end, some embodiments of the present disclosure provide an operator fusion method for performing operator fusion on various types of neural network structures: a template fusion mode of a linear structure is designed based on the category labels of the operators, together with a corresponding operator fusion algorithm, so that the designed template fusion mode has universality and scalability and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for the neural network, benefits the graph optimization of the neural network, and improves the computing efficiency of the neural network on the hardware platform.
  • FIG2 shows a flow chart of an operator fusion method according to some embodiments of the present disclosure.
  • the operator fusion method includes steps S101 and S102.
  • In step S101, a template fusion mode of at least one linear structure is set based on the category labels of the operators.
  • That is, a template-type fusion mode is provided, defined in terms of the operator's category label rather than a specific operator.
  • Compared with a fusion mode defined for specific operators, it has higher adaptability: any operator that conforms to the category can be fused, which significantly reduces the number of fusion modes that need to be defined.
  • step S102 according to the category labels of the multiple operators in the neural network, the multiple operators in the neural network are fused according to the template fusion mode of at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the above-mentioned at least one linear structure.
  • the template fusion mode in step S101 will be described first, and then the corresponding fusion algorithm, that is, the implementation process of step S102 will be described.
  • the operator fusion method also includes: classifying operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assigning category labels to operators of different categories.
  • FIG3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure. It can be understood that FIG3 only gives some examples of operator classification, and the operators of the neural network may also include operators not shown in FIG3 and other categories, which are not limited here. In other disclosed embodiments, other classification methods may be defined for operators in a neural network.
  • the above category labels may include non-template categories and template categories, wherein the operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
  • The operators of the non-template category are labeled separately so that operators of this category are not included in template matching.
  • the operators belonging to the non-template category may be, for example, Element Wise operators, which may include: activation functions (Sigmoid, Swish), linear rectification functions (Relu), absolute value functions (Abs), addition functions (Add), etc., which are not listed one by one here.
  • the above operators belonging to the non-template category are ubiquitous in the structure of the neural network.
  • the operators belonging to the non-template category may also be other operators besides Element Wise operators, which are not given as examples here.
  • the template fusion mode of the above linear structure is composed of one or more template categories connected linearly, that is, the template fusion mode according to the embodiment of the present disclosure only includes operators of the template category and does not include the above operators belonging to the non-template category. Therefore, in the template fusion mode designed according to the embodiment of the present disclosure, the influence of such operators that do not occupy additional computing resources on the fusion mode is eliminated, which is conducive to further reducing the complexity of the template fusion mode and has more universal applicability.
  • the template category may include one or more of the following: matrix category (Matrix), normalization category (Normalize), pooling category (Pooling), data rearrangement category (Reorder), data reduction category (Reduce), regression function category (Softmax) and loss function category (Loss), etc., which are not listed one by one here.
  • operators belonging to the matrix category may include: forward convolution operator (Forward conv), backward data convolution operator (Backward data conv), backward filter convolution operator (Backward filter conv), matrix multiplication operator (Matrix Multiplication, MatMul), etc.
  • Operators belonging to the normalization category may include: batch normalization operator (Batch norm), layer normalization operator (Layer norm), etc.
  • Operators belonging to the pooling category may include: maximum pooling layer operator (Max pooling), average pooling layer operator (Average pooling), global average pooling layer operator (Global average pooling), etc.
  • Operators belonging to the data rearrangement category may include: concatenation operator (Concat), transformation operator (Permute), deformation operator (Reshape), cutting operator (Slice), etc.
  • Operators belonging to the data reduction category (Reduce) may include: maximum function (Max), minimum function (Min), average function (Average), and sum function (Sum), etc.
  • Operators belonging to the regression function category (Softmax) may include: regression function for sample points (Softmax on sample), regression function for channel (Softmax on channel), etc.
  • Operators belonging to the loss function category (Loss) may include: mean square error function (MSE), cross entropy function (Cross-entropy), etc.
  • the operators of the neural network are first classified, and then the template fusion mode is defined in units of categories, rather than fusing the operators themselves, which is conducive to reducing the number of fusion modes required for the neural network (this will be reflected in the description below).
  • the template fusion mode defined above in units of categories the influence of operators of non-template categories that do not occupy additional computing resources on the fusion mode is also eliminated, that is, such operators are not included in the defined template fusion mode, which is conducive to further reducing the complexity of the template fusion mode and is more generally applicable.
  • FIG4 shows a schematic diagram of a template fusion mode according to some embodiments of the present disclosure.
  • four groups of example operator graphs are schematically shown on the left.
  • These four groups of example operator graphs can all be characterized as the template fusion mode shown on the right side of FIG. 4, namely a matrix category (Matrix) operator connected to a normalization category (Normalize) operator. That is, once the template fusion mode shown on the right side is defined, all four example operator graphs shown on the left side, wherever they appear in the neural network, conform to the template fusion mode and are fused.
  • The template fusion mode can also cover operator connection patterns other than those shown on the left side of FIG. 4: in the operator fusion process, any operators that satisfy the template fusion mode are fused to form a fused operator, that is, a fusion layer.
  • This fully reflects the advantages of the template fusion mode provided by the embodiment of the present disclosure in terms of universal applicability, that is, it has the attributes of a template, thereby reducing the number of fusion modes required for the neural network.
  • otherwise, if fusion modes were defined in units of the operators themselves, four fusion modes and corresponding fusion algorithms would need to be defined respectively for the four example operator graphs.
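As an illustrative sketch of the category-level matching described above (the operator names, the category map and the helper function are assumptions for illustration, not part of the disclosed implementation), a single "Matrix-Normalize" template fusion mode covers several different operator pairs:

```python
# Category labels assigned to individual operators (template categories only).
CATEGORY = {
    "Conv2d": "Matrix",
    "MatMul": "Matrix",
    "ConvBackwardData": "Matrix",
    "BatchNorm": "Normalize",
    "LayerNorm": "Normalize",
}

# One template fusion mode, defined in units of categories rather than operators.
TEMPLATE_MODE = ["Matrix", "Normalize"]

def matches_template(op_chain, mode=TEMPLATE_MODE):
    """Return True if a linear chain of operators conforms to the mode."""
    cats = [CATEGORY.get(op) for op in op_chain]
    return cats == mode

# Several distinct operator pairs are all covered by the one category-level mode.
assert matches_template(["Conv2d", "BatchNorm"])
assert matches_template(["MatMul", "LayerNorm"])
assert matches_template(["ConvBackwardData", "BatchNorm"])
assert not matches_template(["BatchNorm", "Conv2d"])  # wrong order, no match
```

Without the category abstraction, each of the operator pairs above would need its own operator-level fusion mode.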
  • the operator fusion method may further include: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between operators.
  • a neural network can be viewed as a directed acyclic graph (DAG) consisting of many operators (also called computing nodes), and each node in the DAG corresponds to an operator in the neural network. It is understandable that a neural network can be constructed in any known manner.
  • the network structure is abstracted as a directed acyclic graph DAG, which is not described in detail here.
  • FIG. 5A shows a schematic diagram of a directed acyclic graph of a neural network. As shown in FIG. 5A , a directed acyclic graph formed by a neural network including 13 operators is shown. In addition, the operators are connected by lines, which can be used to characterize the data dependencies and data flows between the operators.
  • the output data of operator 1 flows to operator 2, and the output data of operator 2 flows to operator 3 and operator 6, and so on. It can be determined that operator 1 has a data dependency relationship with operator 2, and operator 2 has a data dependency relationship with operator 6 and operator 3. It can be understood that the network structure shown in FIG. 5A is only schematic, and the operator fusion method according to the embodiment of the present disclosure can be applied to various types of neural network structures.
  • an operator fusion process can be performed based on a directed acyclic graph of a neural network.
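The directed acyclic graph described above can be sketched minimally as follows; only the edges explicitly mentioned in the text (operator 1 to operator 2, operator 2 to operators 3 and 6) are encoded, and the `DAG` class itself is a hypothetical helper:

```python
from collections import defaultdict

class DAG:
    """Minimal directed acyclic graph of operators (computing nodes)."""
    def __init__(self):
        self.edges = defaultdict(list)   # operator -> downstream operators

    def connect(self, src, dst):
        # A connection represents both data flow and a data dependency.
        self.edges[src].append(dst)

    def successors(self, op):
        return self.edges[op]

g = DAG()
g.connect(1, 2)   # output data of operator 1 flows to operator 2
g.connect(2, 3)   # output data of operator 2 flows to operator 3 ...
g.connect(2, 6)   # ... and to operator 6

assert g.successors(2) == [3, 6]
```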
  • the template fusion mode of the at least one linear structure is composed of one or more template categories that are linearly connected, wherein setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
  • performing operator fusion on multiple operators in a neural network according to a template fusion mode of a linear structure in step S102 includes: traversing the operators in a directed acyclic graph of the neural network according to at least one template fusion mode of a linear structure according to respective category labels of the multiple operators in the neural network, matching one or more operators to corresponding template fusion modes of a linear structure, and performing operator fusion on multiple operators in the neural network.
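A hedged sketch of this traversal step over a single linear chain follows (the function, the operator names and the "NonTemplate" label are illustrative assumptions; the actual algorithm operates on the full directed acyclic graph):

```python
def fuse_linear_chain(chain, mode):
    """chain: list of (op_name, category); mode: list of template categories.
    Returns a list of fusion groups (each a list of op names), matching the
    mode greedily and ignoring non-template operators during matching."""
    groups, current, need = [], [], 0
    for name, cat in chain:
        if cat == "NonTemplate":
            if current:            # non-template ops ride along with a match
                current.append(name)
            continue
        if need < len(mode) and cat == mode[need]:
            current.append(name)
            need += 1
            if need == len(mode):  # mode fully matched: emit a fusion group
                groups.append(current)
                current, need = [], 0
        else:
            current, need = [], 0  # restart matching at this operator
            if cat == mode[0]:
                current, need = [name], 1
    return groups

chain = [("conv1", "Matrix"), ("relu1", "NonTemplate"), ("bn1", "Normalize"),
         ("add1", "NonTemplate"), ("mm1", "Matrix"), ("ln1", "Normalize")]
groups = fuse_linear_chain(chain, ["Matrix", "Normalize"])
assert groups == [["conv1", "relu1", "bn1"], ["mm1", "ln1"]]
```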
  • FIG. 5B shows a schematic diagram of performing operator fusion on the directed acyclic graph shown in FIG. 5A according to the template fusion mode shown in FIG. 4 .
  • since the template fusion mode is defined in units of operator categories, the operator fusion process is also performed according to the category labels of the respective operators in the neural network.
  • the information of operators 1-13 shown in FIG. 5A is as shown in the following Table 1:
  • a template fusion mode of a linear structure and a corresponding operator fusion algorithm can be defined based on the category label of the operator, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process of a neural network.
  • This operator fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion mode.
  • setting the template fusion mode of a linear structure based on the category label of the operator includes: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator.
  • not only one template fusion mode but also multiple template fusion modes need to be defined for the neural network, wherein, when the first template fusion mode and the second template fusion mode have the same part of the template category and the first template fusion mode includes other template categories in addition to the same part of the template category, the multiple operators in the neural network are first fused according to the first template fusion mode, and then the multiple operators in the neural network are fused according to the second template fusion mode.
  • FIG. 6 shows a schematic diagram of fusion priority of template fusion modes according to some embodiments of the present disclosure, in which template fusion mode 1, template fusion mode 2 and template fusion mode 3 are schematically shown.
  • first template fusion mode and the second template fusion mode have the same part of template categories and the first template fusion mode also includes other template categories in addition to the same part of template categories can also be expressed as the first template fusion mode containing the second template fusion mode.
  • referring to FIG. 6, which shows the inclusion relationship between template fusion modes: specifically, template fusion mode 1 includes template fusion mode 2 and template fusion mode 3.
  • template fusion mode 1 and template fusion mode 2 have the same part of template categories (expressed as "Matrix-Normalize"), and template fusion mode 1 also includes another template category (expressed as "Softmax") in addition to the same part. Specifically, the same part of template categories corresponds to template fusion mode 2, and template fusion mode 2 does not include other template categories.
  • template fusion mode 2 can also be represented as including template fusion mode 3. That is, template fusion mode 2 and template fusion mode 3 have the same part of template categories (expressed as "Matrix"), and template fusion mode 2 also includes another template category (expressed as "Normalize") in addition to the same part. Specifically, the same part of template categories corresponds to template fusion mode 3, and template fusion mode 3 does not include other template categories.
  • a fusion priority is set for the template fusion modes with an inclusion relationship. Specifically, template fusion mode 1 is matched and fused before template fusion mode 2 and template fusion mode 3, and template fusion mode 2 is fused before template fusion mode 3. In other words, the more template categories included, the higher the priority.
  • Such a priority setting can make as many operators as possible merge into one fusion layer. Otherwise, without setting the above priority, when performing fusion matching based on a directed acyclic graph of a neural network, the operators that meet template fusion mode 3 may be directly fused, and the fusion matching that meets template fusion mode 1 cannot be guaranteed.
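This priority rule amounts to ordering the template fusion modes by the number of template categories they contain, longest first; a minimal sketch using the three modes of FIG. 6:

```python
# The three template fusion modes of FIG. 6, expressed as category sequences.
mode1 = ["Matrix", "Normalize", "Softmax"]   # contains mode2 and mode3
mode2 = ["Matrix", "Normalize"]              # contains mode3
mode3 = ["Matrix"]

# More template categories means higher priority: match the longest mode first,
# so that as many operators as possible merge into one fusion layer.
by_priority = sorted([mode3, mode1, mode2], key=len, reverse=True)
assert by_priority == [mode1, mode2, mode3]
```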
  • setting a template fusion mode of a linear structure based on a category label of an operator may include: setting a template fusion mode of a linear structure using a configurable file.
  • the format of the configurable file includes a json format or a yaml format. That is to say, in the operator fusion method according to an embodiment of the present disclosure, the fusion mode can be defined based on a configurable file, for example, a plurality of template fusion modes as shown in FIG6 are defined, and this process can be implemented dynamically and configurably to adapt to different neural network structures and adjustments to the hardware computing platform, thereby increasing the flexibility of the operator fusion process.
  • the configurable file format is not limited to the two types listed above.
  • the template fusion mode is configurable, for example, configured by means of a file, and a plurality of template fusion modes can form a set, which is input into a computing device by means of a file configuration, for use in the operator fusion process of the neural network.
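A hypothetical json configuration for the template fusion modes of FIG. 6 might look as follows; the schema (the keys "name" and "categories") is an assumption for illustration only:

```python
import json

config_text = """
{
  "template_fusion_modes": [
    {"name": "mode1", "categories": ["Matrix", "Normalize", "Softmax"]},
    {"name": "mode2", "categories": ["Matrix", "Normalize"]},
    {"name": "mode3", "categories": ["Matrix"]}
  ]
}
"""

# The set of template fusion modes is loaded from the configurable file and
# handed to the fusion process, so modes can be changed without code changes.
modes = json.loads(config_text)["template_fusion_modes"]
assert [m["name"] for m in modes] == ["mode1", "mode2", "mode3"]
```

An equivalent yaml file could express the same set of modes; the point is that the mode definitions live in data, not code.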
  • the network structure of the neural network may be more complex. Therefore, in addition to the template fusion mode of the linear structure, other types of more complex fusion modes may also need to be defined, such as the fusion mode of the subgraph structure. This fusion mode of the subgraph structure requires operator fusion across multiple parallel branches to achieve higher computational efficiency.
  • the operator fusion method provided according to some embodiments of the present disclosure may also include: setting a fusion mode of a subgraph structure in units of operators.
  • the fusion mode of a subgraph structure is defined in units of operators, for example, the mode indicates that several operators with a specific connection relationship need to be fused.
  • each fusion mode of the subgraph structure can be regarded as a subgraph and described using the same graph intermediate representation (Graph IR) as the neural network.
  • the operator fusion process based on the fusion mode of the subgraph structure is the subgraph matching process, that is, matching the subgraph that conforms to the Graph IR form from the graph representation of the neural network (usually a DAG graph) and fusing the operators that meet the matching conditions.
  • the operator fusion method may further include: performing operator fusion on operators in a directed acyclic graph of the neural network according to a fusion mode of a subgraph structure to obtain a graph expression after fusion of operators of the subgraph structure.
  • performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure includes: according to the category labels of each of the multiple operators in the neural network, traversing the operators in the graph expression after fusion of operators of the subgraph structure according to the template fusion mode of at least one linear structure, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on multiple operators in the neural network, wherein operators belonging to non-template categories in the neural network are ignored during the operator fusion process.
  • Operator fusion based on subgraph matching and operator fusion based on linear template fusion mode can be two independent fusion processes, and at the same time, they complement each other in function.
  • the former is responsible for the calculation mode matching of complex structures
  • the latter is responsible for the calculation mode matching of linear structures.
  • the operator fusion based on subgraph matching can be executed first, and then the operator fusion process based on the linear template fusion mode can be executed.
  • the execution of the latter can be based on the fusion result of the former, and further fusion can be performed on this basis.
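The two-phase order can be sketched as follows, with the subgraph phase simplified to a match over a linear run of operators (operator names, categories and the helper are assumptions; real subgraph matching operates on branching graph structures):

```python
def subgraph_fuse(ops, subgraph, fused_name, fused_category):
    """Phase 1 (simplified): replace a matched run of operators with a single
    fused layer carrying a unique category label."""
    out, i = [], 0
    while i < len(ops):
        names = [n for n, _ in ops[i:i + len(subgraph)]]
        if names == subgraph:
            out.append((fused_name, fused_category))
            i += len(subgraph)
        else:
            out.append(ops[i])
            i += 1
    return out

ops = [("op5", "Matrix"), ("op6", "A"), ("op7", "B"), ("op8", "B"),
       ("op9", "C"), ("op10", "Normalize")]

# Operators 6-9 become fusion layer "P" with a unique category label "X".
fused = subgraph_fuse(ops, ["op6", "op7", "op8", "op9"], "P", "X")
assert fused == [("op5", "Matrix"), ("P", "X"), ("op10", "Normalize")]
# Phase 2, the linear template fusion pass, then runs over `fused`,
# treating the fusion layer "P" as an operator of category "X".
```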
  • FIG. 7 shows a schematic block diagram of an operator fusion solution according to some embodiments of the present disclosure
  • FIGS. 8A-8B show a schematic flow chart of an operator fusion scheme according to some embodiments of the present disclosure.
  • the operator fusion scheme according to some embodiments of the present disclosure will be described as a whole in conjunction with FIG. 7 , FIG. 8A and FIG. 8B .
  • the operator fusion scheme can be divided into two parts: defining the fusion mode and executing the fusion algorithm based on the defined fusion mode, so as to obtain the neural network graph expression after operator fusion.
  • a configurable file such as json format or yaml format
  • json format or yaml format can be used to set the template fusion mode of the linear structure and the fusion mode of the subgraph structure.
  • the present disclosure does not limit the number of fusion modes and the specific mode form.
  • the subgraph-based operator fusion process is first performed according to the fusion mode of the defined subgraph structure, for example, through strict subgraph matching. After this step, the graph expression after the subgraph structure operator fusion is obtained. Then, according to the category labels of the respective operators, according to one or more template categories linearly connected in the template fusion mode of the linear structure, the operators in the graph expression after the subgraph structure operator fusion are fused, and in the operator fusion process, the operators belonging to the non-template category in the neural network are ignored to obtain the graph expression after the linear operator fusion.
  • for the fusion mode of the subgraph structure shown, which includes operator 6, operator 7, operator 8 and operator 9, strict subgraph matching is performed in the DAG to find operators that meet the mode, and these operators are fused to obtain a graph expression after the subgraph structure operator fusion, wherein the fused layer is represented by an ellipse P, and is shown as fusion layer P in FIG. 8B.
  • for the template category corresponding to this fusion layer P, a direct definition method can be adopted.
  • a unique category label such as category X, can be defined to distinguish it from operators of categories such as Matrix, Normalize, etc.
  • the operator fusion process based on the linear structure template fusion mode will continue.
  • the operators that meet the template are fused and represented as fusion layer A, fusion layer B, fusion layer C, fusion layer D and fusion layer E respectively, and finally the graph expression after linear operator fusion is obtained.
  • according to the operator fusion method of some embodiments of the present disclosure, it is possible to perform operator fusion for various types of neural network structures, and a template fusion mode of a linear structure set based on the category label of the operator, together with a corresponding operator fusion algorithm, is provided, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process for the neural network, which is beneficial to the graph optimization of the neural network and improves the computational efficiency of the neural network on the hardware platform.
  • a computing device for performing operator fusion on a neural network, wherein the neural network includes a plurality of operators.
  • the computing device according to the embodiment of the present disclosure can be applied to the field of efficient computing for neural network inference or training, solves the operator fusion problem in the graph optimization process of the neural network, and provides an operator fusion strategy with variable fusion modes and a fixed fusion algorithm, taking the universality and scalability of the fusion mode as the goal.
  • the computing device can perform operator fusion on various types of neural network structures, and designs a template fusion mode of a linear structure set based on the category label of the operator, as well as a corresponding operator fusion algorithm, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process of the neural network, which is beneficial to the graph optimization of the neural network and improves the computing efficiency of the neural network on the hardware platform.
  • Fig. 9 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure.
  • a computing device 1000 according to an embodiment of the present disclosure includes: a fusion mode configuration unit 1010 and a fusion unit 1020 .
  • the fusion mode configuration unit 1010 can be configured to set at least one template fusion mode of a linear structure based on the category label of the operator. It can be understood that the term "at least one" in the present disclosure can be understood as one or more than one, that is, one, two or more, and is not limited here. The number of template fusion modes is likewise not limited below, and "at least one template fusion mode" can similarly be interpreted as one or more template fusion modes.
  • a template-type fusion mode is defined for the category label of the operator. Compared with the operator-based fusion mode shown in FIG. 1B, it has higher adaptability: as long as the operators conform to the categories, they can be fused, which can significantly reduce the number of fusion modes that need to be designed.
  • the fusion unit 1020 can be configured to perform operator fusion on multiple operators in the neural network according to the respective category labels of the multiple operators in the neural network and in accordance with the template fusion mode of the linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the linear structure.
  • the computing device further includes a classification unit 1030, which is configured to classify operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assign category labels to operators of different categories.
  • the term "and/or" in the present disclosure represents three situations, namely, based on the functions of the operators, based on the computing architecture characteristics of the hardware platform, and based on the functions of the operators and the computing architecture characteristics of the hardware platform.
  • the category labels include non-template categories and template categories; the template categories include one or more of the following: matrix category, normalization category, pooling category, data rearrangement category, data reduction category, regression function category and loss function category.
  • the operators belonging to the non-template category include: activation function, linear rectification function, absolute value function, and addition function.
  • the operators belonging to the non-template category do not consume register resources or synchronous memory resources during the calculation process of the neural network.
  • for the operators belonging to the non-template category, during the calculation process a sample point is still a sample point after calculation, and no additional resources are occupied. Therefore, in some embodiments of the present disclosure, the operators of this category are treated separately so as not to include them in the template matching.
  • the operators belonging to the non-template category may be, for example, Element Wise operators.
  • the Element Wise operator may include activation functions (Sigmoid, Swish), linear rectification functions (Relu), absolute value functions (Abs), addition functions (Add), etc., which are not listed one by one here.
  • the above operators belonging to the non-template category are ubiquitous in the structure of the neural network.
  • the template fusion mode of the above linear structure is composed of one or more template categories connected in a straight line, that is, the template fusion mode according to the embodiment of the present disclosure only includes operators of the template category but does not include the above operators belonging to the non-template category. Therefore, in the template fusion mode designed according to the embodiment of the present disclosure, the influence of such operators that do not occupy additional computing resources on the fusion mode is eliminated, which is conducive to further reducing the complexity of the template fusion mode and making it more universally applicable.
  • operators belonging to the matrix category include: forward convolution operator, reverse data convolution operator, reverse filter convolution operator, matrix multiplication operator; operators belonging to the normalization category include: batch normalization operator, layer normalization operator; operators belonging to the pooling category include: maximum pooling layer operator, average pooling layer operator, global average pooling layer operator; operators belonging to the data rearrangement category include: splicing operator, transformation operator; operators belonging to the data reduction category include: maximum value function, minimum value function, average value function; operators belonging to the regression function category include: the regression function performed for the channels of sample points; and operators belonging to the loss function category include: mean square error function, cross entropy function.
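The classification step can be sketched as a lookup table that follows the category lists above; the concrete operator names used as keys are representative assumptions:

```python
TEMPLATE_CATEGORY = {
    # matrix category
    "ConvForward": "Matrix", "ConvBackwardData": "Matrix",
    "ConvBackwardFilter": "Matrix", "MatMul": "Matrix",
    # normalization category
    "BatchNorm": "Normalize", "LayerNorm": "Normalize",
    # pooling category
    "MaxPool": "Pool", "AvgPool": "Pool", "GlobalAvgPool": "Pool",
    # data rearrangement and data reduction categories
    "Concat": "Rearrange", "Transpose": "Rearrange",
    "ReduceMax": "Reduce", "ReduceMin": "Reduce", "ReduceMean": "Reduce",
    # regression function and loss function categories
    "Softmax": "Regression",
    "MSELoss": "Loss", "CrossEntropy": "Loss",
}
# Element-wise operators receive the non-template label.
NON_TEMPLATE = {"Sigmoid", "Swish", "Relu", "Abs", "Add"}

def label(op_name):
    """Assign a category label to an operator by name."""
    if op_name in TEMPLATE_CATEGORY:
        return TEMPLATE_CATEGORY[op_name]
    if op_name in NON_TEMPLATE:
        return "NonTemplate"
    return None  # unclassified

assert label("MatMul") == "Matrix"
assert label("Relu") == "NonTemplate"
```

In practice such a table could also be derived from the computing architecture characteristics of the hardware platform rather than from operator names alone.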
  • the operators of the neural network are first classified, and then the template fusion mode is defined in units of categories, rather than fusion by the operators themselves, which is conducive to reducing the number of fusion modes required for the neural network (this will be reflected below).
  • in the template fusion mode defined above in units of categories, the influence of operators of non-template categories that do not occupy additional computing resources on the fusion mode is also eliminated; that is, such operators are not included in the defined template fusion mode, which is conducive to further reducing the complexity of the template fusion mode and makes it more generally applicable.
  • the template fusion mode of at least one linear structure is composed of one or more template categories that are linearly connected, and setting the template fusion mode of at least one linear structure based on the category label of the operator includes: setting the template fusion mode of at least one linear structure based on the category label of the template category and ignoring the category label of the non-template category.
  • FIG4 shows a schematic diagram of the template fusion mode according to some embodiments of the present disclosure, and the specific structure can refer to the description made in conjunction with FIG4 above.
  • the computing device further includes a generating unit 1040, which is configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and lines between operators, and the lines between operators represent data dependencies and data flows between operators.
  • a directed acyclic graph can be referred to FIG5A , which will not be repeated here.
  • the fusion unit 1020 performs operator fusion on multiple operators in the neural network according to the template fusion mode of at least one linear structure, including: according to the respective category labels of the multiple operators in the neural network, according to the template fusion mode of at least one linear structure, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, respectively, and performing operator fusion on the multiple operators in the neural network.
  • the fusion mode configuration unit 1010 sets at least one template fusion mode of a linear structure based on the category label of the operator, including: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator, wherein, when the first template fusion mode and the second template fusion mode have the same part of template categories and the first template fusion mode also includes other template categories in addition to the same part of template categories, the fusion unit 1020 first performs operator fusion on multiple operators in the neural network according to the first template fusion mode, and then performs operator fusion on multiple operators in the neural network according to the second template fusion mode.
  • a template fusion mode of a linear structure and a corresponding operator fusion algorithm are defined based on the category label of the operator, so that the designed template fusion mode has universality and scalability and can be fused in units of operator categories, thereby providing an efficient operator fusion process of a neural network.
  • This operator fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion mode.
  • setting the template fusion mode of a linear structure based on the category label of the operator includes: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category label of the operator.
  • not only one template fusion mode but also multiple template fusion modes need to be defined for the neural network, wherein, when the first template fusion mode and the second template fusion mode have the same part of the template category and the first template fusion mode includes other template categories in addition to the same part of the template category, the multiple operators in the neural network are first fused according to the first template fusion mode, and then the multiple operators in the neural network are fused according to the second template fusion mode.
  • the fusion mode configuration unit 1010 sets the template fusion mode of the linear structure based on the category label of the operator, including: setting the template fusion mode of the linear structure by using a configurable file.
  • the fusion mode configuration unit 1010 is further configured to: set the fusion mode of the subgraph structure in units of operators.
  • the computing device further includes a generation unit 1040, configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between operators, and the connections between operators represent data dependencies and data flows between operators.
  • the fusion unit 1020 before performing operator fusion on multiple operators in a neural network according to a template fusion mode of at least one linear structure, is further configured to: perform operator fusion on operators in a directed acyclic graph of the neural network according to a fusion mode of a subgraph structure to obtain a graph expression after fusion of operators of the subgraph structure, wherein the fusion unit 1020 performs operator fusion on multiple operators in the neural network according to the template fusion mode of the linear structure, including: traversing the operators in the graph expression after fusion of operators of the subgraph structure according to the template fusion mode of at least one linear structure according to the category labels of each of the multiple operators in the neural network, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
  • a computing device is also provided.
  • Fig. 10 shows a schematic block diagram of a computing device according to an embodiment of the present disclosure.
  • the computing device 2000 may include a processor 2010 and a memory 2020.
  • the memory 2020 stores a computer-readable code, and when the computer-readable code is executed by the processor 2010, the operator fusion method described above may be executed.
  • the processor 2010 can perform various actions and processes according to the program stored in the memory 2020.
  • the processor 2010 can be an integrated circuit with signal processing capabilities.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
  • the processor here can refer to a computing device capable of performing neural network calculations.
  • the memory 2020 stores computer executable instruction codes, which are used to implement the operator fusion method according to the embodiment of the present disclosure when executed by the processor 2010.
  • the memory 2020 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. It should be noted that the memory described in the present disclosure may be any suitable type of memory.
  • a processor such as a CPU can implement an operator fusion method for synchronization between neural network layers.
  • the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, an input/output component 3060, a hard disk 3070, etc.
  • the storage device in the computing device 3000 such as the ROM 3030 or the hard disk 3070, can store various data or files used for processing and/or communication of the operator fusion method provided by the present disclosure and program instructions executed by the CPU.
  • the computing device 3000 may also include a user interface 3080.
  • the architecture shown in FIG11 is only exemplary. When implementing different devices, one or more components in the computing device shown in FIG11 may be omitted according to actual needs.
  • a non-transitory computer-readable storage medium is also provided.
  • Fig. 12 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
  • a computer storage medium 4020 stores computer readable instructions 4010.
  • the computer readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc.
  • the computer storage medium 4020 may be connected to a computing device such as a computer, and then, when the computing device runs the computer-readable instructions 4010 stored on the computer storage medium 4020, the operator fusion method provided according to the embodiment of the present disclosure as described above may be performed.
  • some embodiments of the present disclosure provide an operator fusion method, a computing device, a computing equipment and a storage medium, which are used to provide operator fusion solutions for various types of neural network structures, especially neural networks with more complex network structures. More specifically, for performing operator fusion on various types of neural network structures, a template fusion mode of a linear structure set based on operator category labels and a corresponding operator fusion algorithm are designed to make the designed template fusion mode universal and scalable and able to be fused in units of operator categories, thereby providing an efficient operator fusion process for the neural network, which is beneficial to the graph optimization of the neural network and improving the computing efficiency of the neural network on the hardware platform.


Abstract

Embodiments of the disclosure provide an operator fusion method, a computing apparatus, a computing device and a readable storage medium. The operator fusion method is applied to a neural network comprising a plurality of operators, and comprises: on the basis of a category label of an operator, setting a template fusion mode of at least one linear structure; according to respective category labels of a plurality of operators in a neural network, performing operator fusion on the plurality of operators in the neural network according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network conforming to the template fusion mode of the at least one linear structure.

Description

Operator fusion method, computing apparatus, computing device and readable storage medium
This application claims priority to Chinese Patent Application No. 202211268055.1, filed on October 17, 2022, the entire disclosure of which is incorporated herein by reference as a part of this application.
Technical Field
Embodiments of the present disclosure relate to an operator fusion method, a computing apparatus, a computing device, and a readable storage medium applied to a neural network including multiple operators.
Background Art
In order to improve the computational efficiency of a neural network, multiple operators in the neural network that meet certain conditions or rules are usually fused before the computation process, forming a fused operator. A fused operator, or a single operator that cannot be fused, may be referred to as a fusion operator, also called a fusion layer. The computation process of the neural network proceeds layer-by-layer in units of fusion layers; operator fusion is therefore an important step in neural network graph optimization.
Generally, the operator fusion process involves two steps: first, defining the fusion mode, that is, defining the target to be fused; second, determining the fusion algorithm, which matches the fusion modes that appear in the neural network and performs the fusions without mutual interference. Operator fusion schemes in the related art adopt fixed fusion modes, and the fusion modes correspond one-to-one to fusion algorithms, that is, one fusion mode corresponds to one fusion algorithm. As a result, if a fusion mode changes, the fusion algorithm must also be changed to support it, so the extension and customization of fusion modes are limited.
Summary of the Invention
Some embodiments of the present disclosure provide an operator fusion method, a computing apparatus, a computing device, and a readable storage medium for providing an efficient and extensible operator fusion solution for various types of neural network structures.
According to one aspect of the present disclosure, an operator fusion method is provided, applied to a neural network including multiple operators. The operator fusion method includes: setting a template fusion mode of at least one linear structure based on category labels of operators; and, according to the respective category labels of the multiple operators in the neural network, performing operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the at least one linear structure.
The operator fusion method according to some embodiments of the present disclosure further includes: classifying operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assigning category labels to operators of different categories.
According to some embodiments of the present disclosure, the category labels include a non-template category and template categories, wherein the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
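As an illustrative, non-authoritative sketch of such a labeling scheme (all identifiers here are hypothetical, not taken from the disclosure), the non-template category and the seven template categories could be expressed as a small enumeration, with every operator carrying exactly one label:

```python
from enum import Enum


class OpCategory(Enum):
    # Non-template category: lightweight operators that do not anchor a
    # template and are instead absorbed into a neighboring fusion layer.
    NON_TEMPLATE = "non_template"
    # Template categories: each may appear as one node of a linear
    # template fusion mode.
    MATRIX = "matrix"
    NORMALIZATION = "normalization"
    POOLING = "pooling"
    DATA_REARRANGE = "data_rearrange"
    DATA_REDUCTION = "data_reduction"
    REGRESSION = "regression"
    LOSS = "loss"


def is_template(category: OpCategory) -> bool:
    """Only template categories participate in template-mode matching."""
    return category is not OpCategory.NON_TEMPLATE
```

The single boolean split mirrors the disclosure's key distinction: template categories define fusion patterns, while non-template operators are simply carried along.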
According to some embodiments of the present disclosure, operators belonging to the non-template category consume no register resources or synchronization memory resources during the computation process of the neural network.
According to some embodiments of the present disclosure, the template fusion mode of the at least one linear structure consists of one or more linearly connected template categories, wherein setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure based on the category labels of the template categories while ignoring the category label of the non-template category.
The operator fusion method according to some embodiments of the present disclosure further includes: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between the operators, and the connections between the operators represent the data dependencies and data flows between the operators. Performing operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the multiple operators in the neural network, traversing the operators in the directed acyclic graph of the neural network according to the one or more linearly connected template categories in the template fusion mode of the at least one linear structure, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the multiple operators in the neural network.
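The traversal-and-match step described above can be sketched as follows. This is a simplified, hypothetical illustration, not the disclosed algorithm: the directed acyclic graph is reduced to a topologically ordered chain of `(name, category)` pairs, and non-template operators encountered along the way are absorbed into the fusion layer being built.

```python
def match_linear_template(ops, pattern):
    """Greedily match a linear template (an ordered list of template-category
    names) against a topologically ordered operator chain.  Non-template
    operators are skipped over (absorbed into the current group), and each
    operator is fused at most once.  Returns the list of fused groups."""
    fused_groups = []
    i = 0
    while i < len(ops):
        group, j, k = [], i, 0
        while j < len(ops) and k < len(pattern):
            name, cat = ops[j]
            if cat == "non_template":
                group.append(name)      # absorbed into the current fusion layer
            elif cat == pattern[k]:
                group.append(name)      # matched the next template category
                k += 1
            else:
                break                   # chain diverges from the template
            j += 1
        if k == len(pattern):           # full template matched
            fused_groups.append(group)
            i = j                       # operators cannot be fused twice
        else:
            i += 1                      # retry from the next operator
    return fused_groups


# Usage sketch on an invented chain: conv -> relu -> batch-norm -> pooling.
chain = [("conv0", "matrix"), ("relu0", "non_template"),
         ("bn0", "normalization"), ("pool0", "pooling")]
print(match_linear_template(chain, ["matrix", "normalization"]))
# -> [['conv0', 'relu0', 'bn0']]
```

Note how `relu0`, a non-template operator, rides along inside the fused group without being part of the template itself — this is what makes the template reusable across networks that interleave different element-wise operators.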
According to some embodiments of the present disclosure, setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category labels of operators, wherein, in a case where the first template fusion mode and the second template fusion mode share a common part of template categories and the first template fusion mode includes other template categories in addition to the common part, operator fusion is first performed on the multiple operators in the neural network according to the first template fusion mode, and then performed according to the second template fusion mode.
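Under the assumption that linear templates are represented as ordered lists of category names, this priority rule — the more specific template (the one containing extra categories beyond the shared part) is applied first — can be approximated by a simple length-descending sort. This is a hypothetical sketch, not the disclosed ordering procedure:

```python
def order_templates(templates):
    """Order linear template fusion modes so that, of two templates sharing
    a common part, the longer (more specific) one is tried first.  For
    purely linear templates a stable length-descending sort suffices."""
    return sorted(templates, key=len, reverse=True)
```

For example, `["matrix", "normalization"]` would be matched before `["matrix"]`, so a conv+batch-norm pair is fused as a whole rather than the convolution being consumed alone by the shorter template.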
According to some embodiments of the present disclosure, setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure using a configurable file.
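For illustration only, such a configurable file might look like the following JSON fragment, loaded by a small helper. The file format and field names are assumptions, not specified by the disclosure; the point is that adding or reordering template fusion modes requires editing the file only, not the fusion algorithm:

```python
import json

# Hypothetical configuration content: each entry is one linear template
# fusion mode, written as an ordered list of template-category names.
CONFIG_TEXT = """
{
  "template_fusion_modes": [
    ["matrix", "normalization", "pooling"],
    ["matrix", "normalization"],
    ["data_reduction"]
  ]
}
"""


def load_template_modes(text: str):
    """Parse linear template fusion modes from a configuration string."""
    return [tuple(mode) for mode in json.loads(text)["template_fusion_modes"]]
```

Because the fusion algorithm consumes whatever category sequences the file provides, the file is the single point of customization when porting to a new network or hardware platform.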
The operator fusion method according to some embodiments of the present disclosure further includes: setting a fusion mode of a subgraph structure in units of operators; and generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between the operators, and the connections between the operators represent the data dependencies and data flows between the operators.
According to some embodiments of the present disclosure, before performing operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure, the method further includes: performing operator fusion on the operators in the directed acyclic graph of the neural network according to the fusion mode of the subgraph structure, to obtain a graph expression after subgraph-structure operator fusion.
According to some embodiments of the present disclosure, performing operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the multiple operators in the neural network and the template fusion mode of the at least one linear structure, traversing the operators in the graph expression after subgraph-structure operator fusion, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the operators in the graph expression.
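The two-stage flow — operator-level subgraph fusion followed by template fusion on the resulting graph expression — can be sketched as a pass pipeline. This is a hypothetical illustration in which each pass is simply a callable transforming an operator list; the template passes vary with configuration while the pipeline itself stays fixed:

```python
def fuse_graph(ops, subgraph_pass, template_passes):
    """Two-stage fusion sketch: apply the operator-level subgraph fusion
    pass first, then apply each linear template fusion pass in turn to the
    resulting graph expression.  Because both stages share one calling
    convention, new template modes plug in without changing this driver."""
    graph = subgraph_pass(ops)
    for template_pass in template_passes:
        graph = template_pass(graph)
    return graph


# Usage sketch with toy passes: the subgraph pass merges the first two
# operators; the template passes here are identity placeholders.
merged = fuse_graph(["a", "b", "c"],
                    lambda g: [g[0] + g[1]] + g[2:],
                    [lambda g: g])
print(merged)  # -> ['ab', 'c']
```

The driver function never inspects what a pass does, which is precisely the "fusion mode variable, fusion algorithm fixed" property the disclosure aims for.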
According to some embodiments of the present disclosure, operators belonging to the non-template category include: an activation function, a linear rectification function, an absolute value function, and an addition function; operators belonging to the matrix category include: a forward convolution operator, a backward data convolution operator, a backward filter convolution operator, and a matrix multiplication operator; operators belonging to the normalization category include: a batch normalization operator and a layer normalization operator; operators belonging to the pooling category include: a max pooling layer operator, an average pooling layer operator, and a global average pooling layer operator; operators belonging to the data rearrangement category include: a concatenation operator and a transformation operator; operators belonging to the data reduction category include: a maximum function, a minimum function, and an average function; operators belonging to the regression function category include: a regression function for sample points and a regression function for channels; and operators belonging to the loss function category include: a mean square error function and a cross entropy function.
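As a hypothetical illustration of this grouping (the operator names are invented stand-ins, not identifiers from the disclosure; a real backend would register its own names), the mapping could be written as a lookup table:

```python
# Hypothetical operator-name -> category table following the grouping above.
OP_CATEGORY = {
    "relu": "non_template", "abs": "non_template", "add": "non_template",
    "conv_fwd": "matrix", "conv_bwd_data": "matrix",
    "conv_bwd_filter": "matrix", "matmul": "matrix",
    "batch_norm": "normalization", "layer_norm": "normalization",
    "max_pool": "pooling", "avg_pool": "pooling",
    "global_avg_pool": "pooling",
    "concat": "data_rearrange", "transpose": "data_rearrange",
    "reduce_max": "data_reduction", "reduce_min": "data_reduction",
    "reduce_mean": "data_reduction",
    "softmax_sample": "regression", "softmax_channel": "regression",
    "mse_loss": "loss", "cross_entropy": "loss",
}


def category_of(op_name: str) -> str:
    """Look up an operator's category label.  Unknown operators default to
    the non-template category so they are absorbed rather than rejected."""
    return OP_CATEGORY.get(op_name, "non_template")
```

Defaulting unknown operators to the non-template category is one possible design choice for extensibility: a newly added element-wise operator fuses into its neighbors without any table update.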
According to another aspect of the present disclosure, a computing apparatus is provided for performing operator fusion on a neural network, wherein the neural network includes multiple operators. The computing apparatus includes: a fusion mode configuration unit configured to set a template fusion mode of at least one linear structure based on category labels of operators; and a fusion unit configured to, according to the respective category labels of the multiple operators in the neural network, perform operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure, so as to fuse one or more operators in the neural network that conform to the template fusion mode of the at least one linear structure.
The computing apparatus according to some embodiments of the present disclosure further includes a classification unit configured to: classify operators based on the functions of the operators and/or the computing architecture characteristics of the hardware platform, and assign category labels to operators of different categories, wherein the category labels include a non-template category and template categories, and the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
According to some embodiments of the present disclosure, operators belonging to the non-template category consume no register resources or synchronization memory resources during the computation process of the neural network.
According to some embodiments of the present disclosure, the template fusion mode of the at least one linear structure consists of one or more linearly connected template categories, and setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure based on the category labels of the template categories while ignoring the category label of the non-template category.
The computing apparatus according to some embodiments of the present disclosure further includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between the operators, and the connections between the operators represent the data dependencies and data flows between the operators.
According to some embodiments of the present disclosure, the fusion unit performing operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the multiple operators in the neural network and the template fusion mode of the at least one linear structure, traversing the operators in the directed acyclic graph of the neural network, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the multiple operators in the neural network.
According to some embodiments of the present disclosure, the fusion mode configuration unit setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting a first template fusion mode and a second template fusion mode of the linear structure based on the category labels of operators, wherein, in a case where the first template fusion mode and the second template fusion mode share a common part of template categories and the first template fusion mode includes other template categories in addition to the common part, the fusion unit first performs operator fusion on the multiple operators in the neural network according to the first template fusion mode, and then performs operator fusion on the multiple operators in the neural network according to the second template fusion mode.
According to some embodiments of the present disclosure, the fusion mode configuration unit setting the template fusion mode of the at least one linear structure based on the category labels of operators includes: setting the template fusion mode of the at least one linear structure using a configurable file.
According to some embodiments of the present disclosure, the fusion mode configuration unit is further configured to: set a fusion mode of a subgraph structure in units of operators. The computing apparatus according to some embodiments of the present disclosure further includes a generating unit configured to: generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph includes operators and connections between the operators, and the connections between the operators represent the data dependencies and data flows between the operators.
According to some embodiments of the present disclosure, before performing operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure, the fusion unit is further configured to: perform operator fusion on the operators in the directed acyclic graph of the neural network according to the fusion mode of the subgraph structure, to obtain a graph expression after subgraph-structure operator fusion, wherein the fusion unit performing operator fusion on the multiple operators in the neural network according to the template fusion mode of the at least one linear structure includes: according to the respective category labels of the multiple operators in the neural network and the template fusion mode of the at least one linear structure, traversing the operators in the graph expression after subgraph-structure operator fusion, matching one or more operators to the corresponding linear-structure template fusion mode, and performing operator fusion on the operators in the graph expression.
According to the computing apparatus of some embodiments of the present disclosure, operators belonging to the non-template category include: an activation function, a linear rectification function, an absolute value function, and an addition function; operators belonging to the matrix category include: a forward convolution operator, a backward data convolution operator, a backward filter convolution operator, and a matrix multiplication operator; operators belonging to the normalization category include: a batch normalization operator and a layer normalization operator; operators belonging to the pooling category include: a max pooling layer operator, an average pooling layer operator, and a global average pooling layer operator; operators belonging to the data rearrangement category include: a concatenation operator and a transformation operator; operators belonging to the data reduction category include: a maximum function, a minimum function, and an average function; operators belonging to the regression function category include: a regression function for sample points and a regression function for channels; and operators belonging to the loss function category include: a mean square error function and a cross entropy function.
According to yet another aspect of the present disclosure, a computing device is provided, including: a processor; and a memory, wherein the memory stores computer-readable code which, when executed by the processor, performs the operator fusion method as described above.
According to yet another aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, on which instructions are stored; when executed by a processor, the instructions implement the operator fusion method as described above.
By utilizing the operator fusion method, computing apparatus, computing device, and storage medium provided by some embodiments of the present disclosure, a template fusion mode of a linear structure can be set based on the category labels of operators, and an operator fusion algorithm generally applicable to template fusion modes of linear structures can be provided, so that the designed template fusion modes are universal and extensible and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for the neural network, benefits graph optimization of the neural network, and improves the computational efficiency of the neural network on the hardware platform.
Brief Description of the Drawings
In order to more clearly illustrate the embodiments of the present disclosure, the drawings required for the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1A shows a schematic diagram of a neural network computation process;
FIG. 1B shows a schematic diagram of neural network operator fusion in the related art;
FIG. 2 shows a schematic flowchart of an operator fusion method according to some embodiments of the present disclosure;
FIG. 3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure;
FIG. 4 shows a schematic diagram of a template fusion mode according to some embodiments of the present disclosure;
FIG. 5A shows a schematic diagram of a directed acyclic graph of a neural network;
FIG. 5B shows a schematic diagram of performing operator fusion on the directed acyclic graph shown in FIG. 5A;
FIG. 5C shows the directed acyclic graph after fusion;
FIG. 6 shows a schematic diagram of fusion priorities of template fusion modes according to some embodiments of the present disclosure;
FIG. 7 shows a schematic block diagram of an operator fusion solution according to some embodiments of the present disclosure;
FIGS. 8A-8B show schematic flowcharts of an operator fusion solution according to some embodiments of the present disclosure;
FIG. 9 shows a schematic block diagram of a computing apparatus according to some embodiments of the present disclosure;
FIG. 10 shows a schematic block diagram of a computing device according to some embodiments of the present disclosure;
FIG. 11 shows a schematic diagram of the architecture of an exemplary computing device according to some embodiments of the present disclosure;
FIG. 12 shows a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure.
Detailed Description
The embodiments of the present disclosure will be described clearly and completely below with reference to the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
In addition, as shown in the present disclosure and the claims, unless the context clearly indicates otherwise, the words "a", "an", and/or "the" do not specifically refer to the singular and may also include the plural. The words "first", "second", and similar terms used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Similarly, words such as "include" or "comprise" mean that the elements or objects appearing before the word encompass the elements or objects listed after the word and their equivalents, without excluding other elements or objects. "Connect" or "connected" and similar terms are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect.
Flowcharts are used in the present disclosure to illustrate the steps of methods according to the embodiments of the present disclosure. It should be understood that the preceding or following steps are not necessarily performed in the exact order given. On the contrary, the steps may be processed in reverse order or simultaneously, and other operations may also be added to these processes.
It should be understood that the technical terms used in the present disclosure have the meanings commonly understood by those skilled in the art.
Artificial neural networks, or neural networks for short, are algorithmic mathematical models that imitate the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Such a network relies on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes. Regardless of type, neural networks share characteristics such as large-scale parallel processing, distributed storage, elastic topology, high redundancy, and nonlinear operations, and offer capabilities in computing speed, association, adaptability, fault tolerance, and self-organization. These characteristics and capabilities constitute the technical basis for neural networks to simulate intelligent activities and have found important applications in various technical fields. For example, neural networks can be applied in data compression, image processing, video coding, signal processing, and other fields.
In order to improve the computational efficiency of a neural network, multiple adjacent operators in the neural network that meet certain conditions or rules are usually fused to form a fused operator. A fused operator, or a single operator that cannot be fused, may be represented as a fusion layer, or simply a layer. The computation process of a neural network is usually performed layer-by-layer in units of fusion layers. In general, the output of a preceding layer (or layers) serves as the input of a following layer (or layers), forming data dependencies between fusion layers. Schematically, FIG. 1A shows a schematic diagram of a neural network computation process. As shown in FIG. 1A, the computing units used for a neural network are generally refined into a matrix multiplication unit, a vector operation unit, and a scalar operation unit, each performing different computing tasks. While the computing units perform computation, data synchronization through shared memory is needed to guarantee the execution order between fusion layers that have data dependencies. The shared memory may be based on internal memory or external memory, which is not limited here.
Therefore, an efficient and flexible implementation of the operator fusion process performed before the computation process shown in FIG. 1A has an important impact on the computational efficiency of the neural network. In addition, for the different and complex internal structures of neural networks, fusion modes need to be quickly adjusted and defined, and corresponding fusion algorithms need to be adapted, in order to implement operator fusion in the neural network.
In the related art, most operator fusion schemes adopt fusion modes with fixed patterns or fixed rules (that is, fixed definitions of the target operators to be fused). FIG. 1B shows a schematic diagram of neural network operator fusion in the related art, schematically showing four fixed fusion modes, fusion mode 1 to fusion mode 4. Based on the four fusion modes defined in FIG. 1B, four corresponding fusion algorithms, fusion algorithm 1 to fusion algorithm 4, need to be adapted in order to find operators in the network structure of the neural network that match fusion modes 1 to 4 and fuse them, thereby obtaining a fused network structure graph on which the subsequent computation process is based. As an example, fusion mode 1 shown in FIG. 1B defines a linear fusion target (including operator 1, operator 2, and operator 3). When performing operator fusion based on fusion mode 1, fusion algorithm 1 needs to be designed to find, among the many operators included in the neural network, a structure matching operator 1-operator 2-operator 3 of fusion mode 1, and to fuse it. In the related art, the fusion algorithm is usually implemented by subgraph matching, that is, designing a subgraph corresponding to the fusion mode and traversing the operator structure graph of the neural network to find the corresponding subgraph structure. It can be understood that the fusion modes shown in FIG. 1B are all defined in units of individual operators.
Typically, for a neural network with a relatively complex functional structure, a dozen or even several dozen fusion modes may need to be defined, so the operator fusion step consumes considerable computing resources and debugging cost. As can be seen from FIG. 1B, the defined fusion modes are static with respect to the fusion algorithms, and each fusion algorithm is directly tied to its fusion mode; for example, one fusion mode corresponds to one fusion algorithm. If a fusion mode changes, the fusion algorithm must also be changed to support it, so the extension and customization of fusion modes are limited, general applicability is lost, and the fusion algorithms must be continually adjusted. In addition, an efficient fusion mode must also be determined in view of the architectural characteristics of the hardware platform running the neural network; switching hardware platforms therefore means that the fusion modes will change as well, and a change in fusion mode in turn requires the corresponding fusion algorithm to be re-designed and recompiled.
The present disclosure applies to the field of efficient computation for neural network inference and training, and addresses the operator fusion problem in the graph optimization process of neural networks. Targeting generality and extensibility of fusion modes, it provides an operator fusion strategy in which the fusion modes are variable while the fusion algorithm remains fixed.
Specifically, some embodiments of the present disclosure provide an operator fusion method for performing operator fusion on various types of neural network structures. Template fusion modes with a linear structure, defined over the category labels of operators, are designed together with a corresponding operator fusion algorithm, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for neural networks, benefits the graph optimization of the neural network, and improves its computational efficiency on the hardware platform. The implementation of the operator fusion method according to some embodiments of the present disclosure is described in detail below with reference to the accompanying drawings.
FIG. 2 shows a flow chart of an operator fusion method according to some embodiments of the present disclosure. As shown in FIG. 2, the operator fusion method includes steps S101 and S102. Specifically, in step S101, at least one template fusion mode with a linear structure is set based on the category labels of operators. Embodiments of the present disclosure provide a template-type fusion mode defined over the category labels of operators; compared with the per-operator fusion modes shown in FIG. 1B, it is more adaptable, because any operator belonging to the matching category can be fused, which significantly reduces the number of fusion modes that need to be defined. Then, in step S102, according to the respective category labels of the multiple operators in the neural network, operator fusion is performed on those operators according to the at least one linear-structure template fusion mode, so as to fuse the one or more operators in the neural network that conform to the at least one linear-structure template fusion mode. In the following, how the template fusion mode of step S101 is generated is described first, followed by the corresponding fusion algorithm, i.e. the implementation of step S102.
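Steps S101 and S102 can be illustrated with a minimal sketch. The representation below is hypothetical (the disclosure does not prescribe any data structure), but it captures the key idea that a linear-structure template fusion mode is just an ordered list of category labels rather than a list of concrete operators:

```python
from typing import List

# Hypothetical representation: a linear-structure template fusion mode is an
# ordered list of category labels (step S101); concrete operator names never
# appear in the mode itself.
TemplateFusionMode = List[str]

def set_template_fusion_modes() -> List[TemplateFusionMode]:
    # Example modes; any operator whose category label matches can be fused.
    return [
        ["Matrix", "Normalize", "Softmax"],
        ["Matrix", "Normalize"],
        ["Matrix"],
    ]
```

Step S102 then only needs to compare category labels against such lists, so the fusion algorithm itself stays fixed while the modes vary.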
The operator fusion method according to some embodiments of the present disclosure further includes: classifying operators based on their functions and/or the computing-architecture characteristics of the hardware platform, and assigning category labels to the different classes of operators.
To define template fusion modes, operator classification rules must be formulated in advance, so that the various operators a neural network may include can be divided into a number of categories. In general, operator classification must take into account operator function and/or the architectural characteristics of the hardware platform. For example, different classes of computation are usually executed by different computing units, such as matrix operations, vector operations, scalar operations, and special function units (SFUs); fusing operators of such different classes together enables pipeline parallelism, while multiple operators with the same class of computation, if they do not consume additional register resources or synchronization resources, can be fused together and executed serially in the same computing unit, avoiding the time cost of data transfers and memory operations.
As an example, FIG. 3 shows a schematic diagram of operator classification according to some embodiments of the present disclosure. It should be understood that FIG. 3 only gives some examples of operator classification; the operators of a neural network may also include operators and categories not shown in FIG. 3, which are not limited here. Furthermore, in other embodiments of the present disclosure, other classification schemes may be defined for the operators in a neural network.
As shown in FIG. 3, according to some embodiments of the present disclosure, the above category labels may include a non-template category and template categories, where operators belonging to the non-template category do not consume register resources or synchronization memory resources during the computation of the neural network. For an operator of the non-template category, one sample point generally remains one sample point after computation and no additional computing resources are occupied; therefore, in some embodiments of the present disclosure, this non-template category is set apart so that its operators are not counted during template matching. As shown in FIG. 3, operators of the non-template category may be, for example, element-wise operators, including activation functions (Sigmoid, Swish), the rectified linear unit (Relu), the absolute value function (Abs), the addition function (Add), and so on, which are not exhaustively listed here. Such non-template operators are ubiquitous in neural network structures. It should also be understood that the non-template category may include operators other than element-wise operators, for which no further examples are given here. According to some embodiments of the present disclosure, the linear-structure template fusion mode described above is composed of one or more linearly connected template categories; that is, a template fusion mode according to embodiments of the present disclosure includes only operators of the template categories and excludes the above non-template operators. Thus, the influence of operators that occupy no additional computing resources is eliminated from the template fusion modes designed according to embodiments of the present disclosure, which helps to further reduce the complexity of the template fusion modes and makes them more universally applicable.
Next, as shown in FIG. 3, the template categories according to embodiments of the present disclosure may include one or more of the following: a matrix category (Matrix), a normalization category (Normalize), a pooling category (Pooling), a data rearrangement category (Reorder), a data reduction category (Reduce), a regression function category (Softmax), and a loss function category (Loss), among others not exhaustively listed here.
As an example, operators belonging to the matrix category (Matrix) may include: the forward convolution operator (Forward conv), the backward data convolution operator (Backward data conv), the backward filter convolution operator (Backward filter conv), the matrix multiplication operator (MatMul), and so on. Operators belonging to the normalization category (Normalize) may include: the batch normalization operator (Batch norm), the layer normalization operator (Layer norm), and so on. Operators belonging to the pooling category (Pooling) may include: the max pooling operator (Max pooling), the average pooling operator (Average pooling), the global average pooling operator (Global average pooling), and so on. Operators belonging to the data rearrangement category (Reorder) may include: the concatenation operator (Concat), the permutation operator (Permute), the reshaping operator (Reshape), the slicing operator (Slice), and so on. Operators belonging to the data reduction category (Reduce) may include: the maximum function (Max), the minimum function (Min), the average function (Average), the sum function (Sum), and so on. Operators belonging to the regression function category (Softmax) may include: the per-sample softmax (Softmax on sample), the per-channel softmax (Softmax on channel), and so on. Operators belonging to the loss function category (Loss) may include: the mean square error function (MSE), the cross-entropy function (Cross-entropy), and so on.
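The classification above can be sketched as a simple lookup table. The table below is hypothetical and covers only the examples named in the text (a real system would include many more operators), with `None` marking the non-template (element-wise) category:

```python
# Hypothetical category-label table based on the examples above; None marks
# non-template (element-wise) operators, which are ignored during matching.
OP_CATEGORY = {
    "Forward conv": "Matrix", "Backward data conv": "Matrix",
    "Backward filter conv": "Matrix", "MatMul": "Matrix",
    "Batch norm": "Normalize", "Layer norm": "Normalize",
    "Max pooling": "Pooling", "Average pooling": "Pooling",
    "Concat": "Reorder", "Permute": "Reorder", "Reshape": "Reorder",
    "Max": "Reduce", "Sum": "Reduce",
    "Softmax on sample": "Softmax", "Cross-entropy": "Loss",
    "Relu": None, "Sigmoid": None, "Abs": None, "Add": None,
}

def category(op_name: str):
    """Return the template category label of an operator, or None if the
    operator belongs to the non-template category."""
    return OP_CATEGORY.get(op_name)
```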
It should be understood that in embodiments of the present disclosure, the operators of the neural network are first classified, and the template fusion modes are then defined in units of categories rather than over the operators themselves, which helps to reduce the number of fusion modes a neural network requires (as will become apparent in the description below). In addition, the template fusion modes defined in units of categories also exclude the influence of non-template operators that occupy no additional computing resources, i.e., such operators are not included in the defined template fusion modes, which further reduces the complexity of the template fusion modes and makes them more universally applicable.
As an example, FIG. 4 shows a schematic diagram of a template fusion mode according to some embodiments of the present disclosure. On the left of FIG. 4, four groups of example operator graphs are schematically shown; in the operator fusion method provided by embodiments of the present disclosure, all four groups can be characterized by the template fusion mode shown on the right of FIG. 4, namely a matrix-category (Matrix) operator connected to a normalization-category (Normalize) operator. That is, once the template fusion mode on the right is defined, all four example operator graphs on the left, wherever they appear in the neural network, conform to that mode and are fused accordingly. It should further be understood that this template fusion mode also covers operator connection forms other than those shown on the left of FIG. 4: during operator fusion, any operators satisfying the template fusion mode are fused to form a fused operator, i.e., a fusion layer. This fully reflects the advantage in general applicability of the template fusion mode provided by embodiments of the present disclosure, i.e., it has the properties of a template, thereby reducing the number of fusion modes a neural network requires. By comparison, in the related art, fusing the four operator connection forms on the left of FIG. 4 would require four fusion modes and four corresponding fusion algorithms to be defined separately.
The operator fusion method according to some embodiments of the present disclosure may further include: generating a directed acyclic graph based on the network structure of the neural network, where the directed acyclic graph includes the operators and the edges between operators, the edges representing the data dependencies and data flow between operators.
A neural network can be regarded as a directed acyclic graph (DAG) composed of many operators (also called compute nodes), where each node of the DAG corresponds to one operator of the neural network. It should be understood that the network structure can be abstracted into a DAG in any known manner, which is not described in detail here. Specifically, FIG. 5A shows a schematic diagram of the directed acyclic graph of a neural network: a DAG formed by a neural network comprising 13 operators, in which the operators are connected by edges that represent the data dependencies and data flow between them. For example, the output of operator 1 flows to operator 2, and the output of operator 2 flows to operator 3 and operator 6, and so on; it can thus be determined that operator 1 has a data dependency with operator 2, and operator 2 has data dependencies with operator 3 and operator 6. It should be understood that the network structure shown in FIG. 5A is only illustrative, and the operator fusion method according to embodiments of the present disclosure can be applied to various types of neural network structures.
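A DAG like the one in FIG. 5A can be held in an adjacency list. The fragment below is a hypothetical sketch using just the first few edges described above (operator 1 → operator 2, operator 2 → operators 3 and 6):

```python
# Hypothetical adjacency-list DAG: node -> list of successors; each edge
# encodes a data dependency and the direction of data flow.
dag = {
    "op1": ["op2"],
    "op2": ["op3", "op6"],
    "op3": [],
    "op6": [],
}

def successors(node):
    return dag[node]

def predecessors(node):
    return [n for n, outs in dag.items() if node in outs]
```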
According to embodiments of the present disclosure, after a linear-structure template fusion mode has been defined by the process described above (for example, the Matrix-Normalize mode shown on the right of FIG. 4), the operator fusion process can be carried out on the directed acyclic graph of the neural network.
According to some embodiments of the present disclosure, the at least one linear-structure template fusion mode is composed of one or more linearly connected template categories, and setting the at least one linear-structure template fusion mode based on the category labels of operators includes: setting the at least one linear-structure template fusion mode based on the category labels of the template categories while ignoring the category labels of the non-template category.
According to embodiments of the present disclosure, performing operator fusion on the multiple operators of the neural network according to the linear-structure template fusion mode in step S102 includes: according to the respective category labels of the multiple operators, traversing the operators in the directed acyclic graph of the neural network according to the at least one linear-structure template fusion mode, matching one or more operators to a corresponding linear-structure template fusion mode, and fusing the matched operators of the neural network.
Specifically, FIG. 5B shows a schematic diagram of performing operator fusion on the directed acyclic graph of FIG. 5A according to the template fusion mode of FIG. 4. In embodiments of the present disclosure, since the defined template fusion mode operates in units of operator categories, fusion must likewise proceed according to the respective category labels of the multiple operators in the neural network. As an example, and for ease of description, assume that the information of operators 1-13 shown in FIG. 5A is as listed in Table 1 below:
Table 1 (category labels of operators 1-13; table contents not reproduced here)
Referring to the information in Table 1 above, three fused operators can be matched in the directed acyclic graph of FIG. 5A; that is, three operator fusions can be performed according to the Matrix-Normalize template fusion mode. In other words, in FIG. 5B, fused operators A, B and C all conform to the Matrix-Normalize template fusion mode and are therefore fused; FIG. 5C shows the resulting fused graph representation. It should be understood that the operators listed in Table 1 are only illustrative and serve to describe the process of fusing operators in a directed acyclic graph according to a defined template fusion mode.
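The matching process just described can be sketched as a greedy walk over linear chains of the DAG that compares category labels against the mode and absorbs non-template operators along the way. This is a simplified, hypothetical illustration (for instance, it follows only single-successor edges), not the patent's actual algorithm:

```python
def match_linear(dag, cat, mode):
    """Greedily collect non-overlapping operator groups matching `mode`.

    dag:  node -> list of successor nodes (directed acyclic)
    cat:  node -> template category label, or None for non-template
          (element-wise) operators, which are skipped during matching
    mode: ordered list of category labels, e.g. ["Matrix", "Normalize"]
    """
    fused, used = [], set()
    for start in dag:                 # try every node as the head of a chain
        group, node, idx = [], start, 0
        while node is not None and idx < len(mode):
            if node in used:
                break
            if cat[node] is None:     # absorb non-template operators
                group.append(node)
            elif cat[node] == mode[idx]:
                group.append(node)
                idx += 1
            else:
                break
            succ = dag[node]
            node = succ[0] if len(succ) == 1 else None  # stay linear
        if idx == len(mode):          # whole mode matched -> one fusion layer
            fused.append(group)
            used.update(group)
    return fused

# Chain: Matrix -> Normalize -> Relu -> Matrix -> Normalize
chain = {"op1": ["op2"], "op2": ["op3"], "op3": ["op4"],
         "op4": ["op5"], "op5": []}
labels = {"op1": "Matrix", "op2": "Normalize", "op3": None,
          "op4": "Matrix", "op5": "Normalize"}
groups = match_linear(chain, labels, ["Matrix", "Normalize"])
# groups == [["op1", "op2"], ["op3", "op4", "op5"]]
```

Note how the element-wise `op3` (Relu) does not block the second match: it is simply carried into the fusion layer, mirroring the rule that non-template operators are ignored during matching.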
In the operator fusion method provided by embodiments of the present disclosure, linear-structure template fusion modes and a corresponding operator fusion algorithm can be defined based on the category labels of operators, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories, thereby providing an efficient operator fusion process for neural networks. This fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion modes.
According to some embodiments of the present disclosure, setting linear-structure template fusion modes based on the category labels of operators includes: setting a first template fusion mode and a second template fusion mode, each with a linear structure, based on the category labels of operators. In some implementations, not just one but multiple template fusion modes may need to be defined for a neural network. In this case, when the first template fusion mode and the second template fusion mode share a common part of template categories and the first template fusion mode includes further template categories beyond the common part, operator fusion is first performed on the multiple operators of the neural network according to the first template fusion mode, and then according to the second template fusion mode.
FIG. 6 shows a schematic diagram of the fusion priority of template fusion modes according to some embodiments of the present disclosure, in which template fusion mode 1, template fusion mode 2 and template fusion mode 3 are schematically shown. For ease of description, the case in which the first and second template fusion modes share a common part of template categories and the first mode includes further template categories beyond the common part can also be expressed as the first template fusion mode containing the second. FIG. 6 shows the containment relationships between template fusion modes: specifically, template fusion mode 1 contains template fusion modes 2 and 3. For example, template fusion modes 1 and 2 share a common part of template categories (denoted "Matrix-Normalize"), and template fusion mode 1 additionally includes another template category ("Softmax") beyond the common part; specifically, the common part corresponds to template fusion mode 2, which includes no other template categories.
Furthermore, continuing with FIG. 6, template fusion mode 2 can likewise be said to contain template fusion mode 3. That is, template fusion modes 2 and 3 share a common part of template categories (denoted "Matrix"), and template fusion mode 2 additionally includes another template category ("Normalize"); specifically, the common part corresponds to template fusion mode 3, which includes no other template categories. For the situation shown in FIG. 6, fusion priorities are set for template fusion modes with containment relationships: specifically, template fusion mode 1 is matched and fused before template fusion modes 2 and 3, and template fusion mode 2 before template fusion mode 3. In other words, the more template categories a mode includes, the higher its priority; this priority setting allows as many operators as possible to be fused into a single fusion layer. Otherwise, without such priorities, fusion matching on the directed acyclic graph of the neural network might directly fuse operators conforming to template fusion mode 3, and matches conforming to template fusion mode 1 could not be guaranteed.
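Under the hypothetical list-of-labels representation, the priority rule above (a containing mode matches before the modes it contains) can be approximated by simply trying the modes in order of decreasing length:

```python
# Hypothetical sketch: modes with more template categories get higher
# priority, so longer (containing) modes are matched first.
modes = [
    ["Matrix"],                          # mode 3
    ["Matrix", "Normalize", "Softmax"],  # mode 1, contains modes 2 and 3
    ["Matrix", "Normalize"],             # mode 2, contains mode 3
]

by_priority = sorted(modes, key=len, reverse=True)
# by_priority == [["Matrix", "Normalize", "Softmax"],
#                 ["Matrix", "Normalize"], ["Matrix"]]
```

Length ordering suffices here only because these three modes form a containment chain; unrelated modes of equal length would simply keep their configured order.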
According to some embodiments of the present disclosure, setting linear-structure template fusion modes based on the category labels of operators may include: setting the linear-structure template fusion modes using a configurable file. As an example, the format of the configurable file may be json or yaml. That is, in the operator fusion method according to embodiments of the present disclosure, fusion modes can be defined through a configurable file, for example defining multiple template fusion modes as shown in FIG. 6; this can be done dynamically and configurably, so as to adapt to different neural network structures and to adjustments of the hardware computing platform, increasing the flexibility of the operator fusion process. It should be understood that the configurable file format is not limited to the two formats listed above. In embodiments of the present disclosure, the template fusion modes are configurable, for example via a file: multiple template fusion modes can form a set that is input to the computing apparatus through file configuration for use in the operator fusion process of the neural network.
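A configuration file of the kind described might look as follows. The key name and layout are hypothetical, since the disclosure only fixes that the file format can be json or yaml:

```python
import json

# Hypothetical json configuration: each template fusion mode is an ordered
# list of category labels, and the whole set is loaded at run time so the
# modes can change without recompiling the fixed fusion algorithm.
config_text = """
{
  "template_fusion_modes": [
    ["Matrix", "Normalize", "Softmax"],
    ["Matrix", "Normalize"],
    ["Matrix"]
  ]
}
"""

loaded_modes = json.loads(config_text)["template_fusion_modes"]
```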
In the linear-structure template fusion modes described above, the operators are connected in series, which means that operator fusion takes place within a single branch. In some application scenarios, the network structure of a neural network may be relatively complex, so in addition to linear-structure template fusion modes, other, more complex types of fusion modes may also need to be defined, for example fusion modes with a subgraph structure. Such subgraph-structure fusion modes require operator fusion across multiple parallel branches to obtain higher computational efficiency.
The operator fusion method provided by some embodiments of the present disclosure may further include: setting subgraph-structure fusion modes in units of individual operators. In contrast to the linear-structure template fusion modes set in units of operator categories above, a subgraph-structure fusion mode is defined in units of operators; for example, the mode indicates that several specific operators with a particular connection relationship are themselves to be fused.
Similarly, the subgraph-matching-based operator fusion process is also carried out on the directed acyclic graph of the neural network. For example, each subgraph-structure fusion mode can be treated as a subgraph and described using the same graph intermediate representation (Graph IR) as the neural network; operator fusion based on subgraph-structure fusion modes is then a subgraph matching process, i.e., matching subgraphs conforming to that Graph IR form within the graph representation of the neural network (usually a DAG) and fusing the operators that satisfy the matching conditions.
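In its simplest form, subgraph matching can be sketched as scanning the DAG's edges for a fixed two-node, op-level pattern. This hypothetical fragment only hints at the idea; real Graph-IR subgraph matching handles arbitrary subgraph shapes:

```python
def match_edge_pattern(dag, op_name, pattern):
    """Find edges (u, v) whose concrete operator names equal `pattern`.

    dag:     node -> list of successor nodes
    op_name: node -> concrete operator name (subgraph-structure modes are
             defined per operator, not per category)
    pattern: pair of operator names, e.g. ("MatMul", "Add")
    """
    hits = []
    for u, outs in dag.items():
        for v in outs:
            if (op_name[u], op_name[v]) == pattern:
                hits.append((u, v))
    return hits

g = {"n1": ["n2"], "n2": ["n3"], "n3": []}
names = {"n1": "MatMul", "n2": "Add", "n3": "Relu"}
# match_edge_pattern(g, names, ("MatMul", "Add")) == [("n1", "n2")]
```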
According to some embodiments of the present disclosure, before performing operator fusion on the multiple operators of the neural network according to the linear-structure template fusion modes, the operator fusion method may further include: performing operator fusion on the operators in the directed acyclic graph of the neural network according to the subgraph-structure fusion modes, obtaining a graph representation after subgraph-structure operator fusion. In these embodiments, performing operator fusion according to the at least one linear-structure template fusion mode includes: according to the respective category labels of the multiple operators in the neural network, traversing the operators in the graph representation after subgraph-structure operator fusion according to the at least one linear-structure template fusion mode, matching one or more operators to a corresponding linear-structure template fusion mode, and fusing the matched operators, where operators of the neural network belonging to the non-template category are ignored during the fusion process.
Subgraph-matching-based operator fusion and linear-template-mode-based operator fusion can be two mutually independent fusion processes that complement each other functionally: the former handles computation-pattern matching for complex structures, the latter for linear structures. In the overall operator fusion process of a neural network, subgraph-matching-based fusion can be executed first, followed by linear-template-mode-based fusion; the latter can operate on the fusion result of the former and perform further fusion on that basis.
FIG. 7 shows a schematic block diagram of an operator fusion scheme according to some embodiments of the present disclosure, and FIGS. 8A-8B show schematic flowcharts of an operator fusion scheme according to some embodiments of the present disclosure. The operator fusion scheme according to some embodiments of the present disclosure is described as a whole below with reference to FIG. 7, FIG. 8A and FIG. 8B.
As shown in FIG. 7, the operator fusion scheme can be divided into two parts: defining the fusion modes, and executing the fusion algorithm based on the defined fusion modes to obtain the operator-fused graph representation of the neural network. Specifically, in the fusion-mode definition stage, a configurable file (for example, in JSON or YAML format) can be used to set the template fusion modes of linear structure and the fusion modes of subgraph structure; the present disclosure places no limit on the number of fusion modes or their specific forms.
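A hypothetical example of such a configurable file is sketched below in JSON (parsed with Python's standard `json` module). The keys, category labels, and operator names are illustrative assumptions, not the patent's actual schema: each linear template mode is an ordered list of category labels, and each subgraph mode lists concrete operators and their connections.

```python
import json

# Illustrative configuration text; a real deployment might keep this in a
# .json or .yaml file loaded at startup.
CONFIG_TEXT = """
{
  "linear_template_modes": [
    ["Matrix", "Normalize", "Pooling"],
    ["Matrix", "Normalize"]
  ],
  "subgraph_modes": [
    {
      "name": "fused_block_P",
      "operators": ["op6", "op7", "op8", "op9"],
      "edges": [["op6", "op7"], ["op6", "op8"], ["op7", "op9"], ["op8", "op9"]]
    }
  ]
}
"""

def load_fusion_config(text):
    """Parse the fusion-mode configuration from a JSON string and do a
    minimal sanity check: every linear mode must be non-empty."""
    cfg = json.loads(text)
    assert all(mode for mode in cfg["linear_template_modes"])
    return cfg

config = load_fusion_config(CONFIG_TEXT)
print(len(config["linear_template_modes"]))  # 2
```

Keeping the modes in a plain data file rather than in code is what makes the fusion modes variable while the fusion algorithm stays fixed.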
In the fusion-algorithm execution stage, for the directed acyclic graph (DAG) of the neural network, subgraph-based operator fusion is first performed according to the defined fusion modes of subgraph structure, for example through strict subgraph matching, yielding a graph representation after subgraph-structure operator fusion. Then, according to the respective category labels of the operators and the one or more linearly connected template categories in each template fusion mode of linear structure, operator fusion is performed on the operators in that graph representation, ignoring operators belonging to the non-template category during the fusion process, to obtain a graph representation after linear operator fusion.
As an example, as shown in FIG. 8A, for the illustrated fusion mode of subgraph structure (comprising operator 6, operator 7, operator 8 and operator 9), strict subgraph matching is first performed in the DAG to find operators conforming to this mode, and these operators are fused to obtain a graph representation after subgraph-structure operator fusion, in which the fused layer is represented by the ellipse P and shown as fusion layer P in FIG. 8B. The template category corresponding to this fusion layer P can be defined directly; for example, for subgraph-matching-based operator fusion a unique category label such as category X can be defined, distinguishing it from operators of categories such as Matrix and Normalize shown in FIG. 3, for use in the subsequent linear template-matching process. Next, as shown in FIG. 8B, the graph representation after subgraph-structure operator fusion undergoes the operator fusion process based on the template fusion modes of linear structure: the operators conforming to the templates are fused, denoted fusion layer A, fusion layer B, fusion layer C, fusion layer D and fusion layer E respectively, and the graph representation after linear operator fusion is finally obtained.
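The handoff between the two passes can be sketched as follows (all operator names and the reserved label "X" are illustrative): the subgraph-fused node P is given its own category label so that the linear pass treats it like any template-category operator.

```python
# After the subgraph pass, the fused node P carries the reserved label "X",
# so a linear template fusion mode may reference it by that label.
labels = {"op1": "Matrix", "P": "X", "op10": "Normalize"}
chain = ["op1", "P", "op10"]       # linear chain after subgraph fusion
mode = ["Matrix", "X"]             # a linear mode that includes label X

seq = [labels[op] for op in chain]  # category sequence of the chain
assert seq[:len(mode)] == mode      # the mode matches the chain's prefix
print(seq)  # ['Matrix', 'X', 'Normalize']
```

This is why assigning a unique label to the subgraph-fused layer matters: without it, the linear pass would have no category under which to match the fused node.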
With the operator fusion method of some embodiments of the present disclosure, operator fusion can be performed for various types of neural network structures. Template fusion modes of linear structure, set on the basis of operator category labels, are provided together with a corresponding operator fusion algorithm, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories. This provides an efficient operator fusion process for the neural network, facilitates graph optimization of the neural network, and improves the computational efficiency of the neural network on the hardware platform.
According to another aspect of the present disclosure, a computing apparatus is further provided for performing operator fusion on a neural network, where the neural network includes multiple operators. The computing apparatus according to embodiments of the present disclosure can be applied in the field of efficient computation for neural network inference or training, and solves the operator fusion problem in the graph optimization process of the neural network, taking the generality and extensibility of fusion modes as its goal and providing an operator fusion strategy with variable fusion modes and a fixed fusion algorithm. Specifically, the computing apparatus according to some embodiments of the present disclosure can perform operator fusion for various types of neural network structures, with template fusion modes of linear structure designed on the basis of operator category labels and a corresponding operator fusion algorithm, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories, thereby providing an efficient operator fusion process for the neural network, facilitating graph optimization of the neural network, and improving the computational efficiency of the neural network on the hardware platform.
FIG. 9 shows a schematic block diagram of a computing apparatus according to some embodiments of the present disclosure. As shown in FIG. 9, the computing apparatus 1000 according to an embodiment of the present disclosure includes a fusion mode configuration unit 1010 and a fusion unit 1020.
Specifically, the fusion mode configuration unit 1010 can be configured to set at least one template fusion mode of linear structure based on operator category labels. It should be understood that the term "at least one" in the present disclosure means one or more than one, i.e., one, two or more, without limitation; the number of template fusion modes is not further restricted below and can similarly be interpreted as one or more template fusion modes.
In embodiments according to the present disclosure, a template-type fusion mode is defined, and it is defined in terms of operator category labels. Compared with the per-operator fusion modes shown in FIG. 1B, it is more adaptable: any operator conforming to the category can be fused, which can significantly reduce the number of fusion modes that need to be designed.
The fusion unit 1020 can be configured to perform operator fusion on the multiple operators in the neural network according to their respective category labels and the template fusion modes of linear structure, so as to fuse the one or more operators in the neural network that conform to a template fusion mode of linear structure.
The computing apparatus according to some embodiments of the present disclosure further includes a classification unit 1030, configured to classify operators based on the operators' functions and/or the computing-architecture characteristics of the hardware platform, and to assign category labels to operators of different categories. It should be understood that the term "and/or" in the present disclosure covers three cases: based on the operators' functions, based on the computing-architecture characteristics of the hardware platform, or based on both. According to some embodiments of the present disclosure, the category labels include a non-template category and template categories, and the template categories include one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
To define template fusion modes, the various operators required by the neural network must first be classified, dividing the many operators into a number of categories. In general, operator classification needs to take into account both operator function and the architectural characteristics of the hardware platform. For example, different categories of computation are usually executed by different computing units (matrix operations, vector operations, scalar operations, special function units, and so on); multiple operators of different categories fused together can be executed in pipeline parallel, while multiple operators of the same computational category, if they do not need to consume additional register or synchronization resources, can be fused together and executed serially in a computing unit, avoiding the time cost of data transfers and memory operations.
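One simple realization of such a classification is a lookup table from operator name to category label. The table below is an illustrative sketch (the operator set of a real platform would be larger, and the groupings follow the categories discussed in the text):

```python
# Illustrative category-label table: operator name -> category label.
CATEGORY_LABELS = {
    # Matrix category: typically executed on matrix-computation units.
    "conv2d_fwd": "Matrix", "matmul": "Matrix",
    # Normalization category.
    "batch_norm": "Normalize", "layer_norm": "Normalize",
    # Pooling category.
    "max_pool": "Pooling", "avg_pool": "Pooling",
    # Element-wise operators: non-template category; they consume no extra
    # register or synchronization resources during computation.
    "relu": "ElementWise", "add": "ElementWise", "sigmoid": "ElementWise",
}

def label_of(op_name):
    """Return the category label assigned to an operator."""
    return CATEGORY_LABELS[op_name]

print(label_of("matmul"))  # Matrix
```

A classification unit like unit 1030 would populate such a table once, after which every fusion pass works purely with category labels rather than concrete operators.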
According to the computing apparatus of some embodiments of the present disclosure, the operators belonging to the non-template category include activation functions, linear rectification functions, absolute value functions and addition functions. Specifically, operators belonging to the non-template category do not consume register resources or synchronization memory resources during the computation of the neural network: a sample point remains a sample point after such computation, occupying no additional resources. Therefore, in some embodiments of the present disclosure, this category of operators is set apart so that operators of this category are not counted during template matching. As shown in FIG. 3, operators belonging to the non-template category may be, for example, element-wise operators, which may include activation functions (Sigmoid, Swish), linear rectification functions (ReLU), absolute value functions (Abs), addition functions (Add), and so on, not enumerated one by one here. Such non-template-category operators are ubiquitous in neural network structures. For example, the template fusion mode of linear structure described above is composed of one or more linearly connected template categories; that is, the template fusion modes according to embodiments of the present disclosure include only template-category operators and exclude the non-template-category operators listed above. Thus, in the template fusion modes designed according to embodiments of the present disclosure, the influence on the fusion modes of such operators, which occupy no additional computing resources, is eliminated, which helps further reduce the complexity of the template fusion modes and makes them more universally applicable.
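The exclusion of element-wise operators from template matching can be sketched as a projection step (operator and label names are illustrative): a linear chain is reduced to its template-category sequence before any mode is compared against it.

```python
NON_TEMPLATE_OPS = {"Sigmoid", "Swish", "Relu", "Abs", "Add"}  # element-wise

def template_sequence(op_chain, labels):
    """Project a linear operator chain onto its template-category sequence;
    element-wise (non-template) operators are dropped because they never
    appear in a template fusion mode."""
    return [labels[op] for op in op_chain if op not in NON_TEMPLATE_OPS]

labels = {"Conv": "Matrix", "BatchNorm": "Normalize", "MaxPool": "Pooling"}
chain = ["Conv", "Relu", "BatchNorm", "MaxPool"]
print(template_sequence(chain, labels))  # ['Matrix', 'Normalize', 'Pooling']
```

Because the interleaved `Relu` vanishes from the projected sequence, a single mode such as `["Matrix", "Normalize", "Pooling"]` covers both the chain with and without the activation, which is precisely what reduces the number of modes that must be designed.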
According to some embodiments of the present disclosure, operators belonging to the matrix category include a forward convolution operator, a backward data convolution operator, a backward filter convolution operator, and a matrix multiplication operator; operators belonging to the normalization category include a batch normalization operator and a layer normalization operator; operators belonging to the pooling category include a max pooling operator, an average pooling operator, and a global average pooling operator; operators belonging to the data rearrangement category include a concatenation operator and a transpose operator; operators belonging to the data reduction category include a maximum function, a minimum function, and an average function; operators belonging to the regression function category include a per-sample regression function and a per-channel regression function; and operators belonging to the loss function category include a mean squared error function and a cross-entropy function.
It should be understood that in embodiments according to the present disclosure, the operators of the neural network are first classified, and the template fusion modes are then defined in units of categories rather than of the operators themselves, which helps reduce the number of fusion modes the neural network requires (as will be shown below). Furthermore, in the template fusion modes defined above in units of categories, the influence on the fusion modes of non-template-category operators, which occupy no additional computing resources, is also removed: such operators are not included in the defined template fusion modes, which helps further reduce the complexity of the template fusion modes and makes them more universally applicable.
According to some embodiments of the present disclosure, the at least one template fusion mode of linear structure is composed of one or more linearly connected template categories, and setting the at least one template fusion mode of linear structure based on operator category labels includes: setting the at least one template fusion mode of linear structure based on the category labels of the template categories while ignoring the category label of the non-template category. As an example, FIG. 4 shows a schematic diagram of template fusion modes according to some embodiments of the present disclosure; for the specific structure, reference may be made to the description given above with reference to FIG. 4.
According to some embodiments of the present disclosure, as shown in FIG. 9, the computing apparatus further includes a generation unit 1040 configured to generate a directed acyclic graph based on the network structure of the neural network, where the directed acyclic graph includes operators and the connections between them, the connections representing the data dependencies and data flow between operators. For an example of a directed acyclic graph, reference may be made to FIG. 5A, which is not repeated here.
According to some embodiments of the present disclosure, the fusion unit 1020 performing operator fusion on the multiple operators in the neural network according to the at least one template fusion mode of linear structure includes: traversing the operators in the directed acyclic graph of the neural network according to the respective category labels of the multiple operators and the at least one template fusion mode of linear structure, matching one or more operators to the corresponding template fusion mode of linear structure, and fusing the multiple operators in the neural network.
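One possible shape for this traversal, restricted to a linear chain of operators for clarity, is a greedy scan: starting at each position, consume operators while their labels follow the mode, absorbing element-wise operators along the way. This is an illustrative sketch, not the patent's algorithm, and all names are assumptions.

```python
def match_linear_mode(chain, labels, mode, non_template):
    """Greedily scan a linear operator chain, collecting groups of operators
    whose template-category sequence equals `mode`; element-wise
    (non-template) operators inside a run are absorbed into the group."""
    fused, i = [], 0
    while i < len(chain):
        group, j, k = [], i, 0
        while j < len(chain) and k < len(mode):
            op = chain[j]
            if labels[op] in non_template:
                group.append(op)          # absorbed without consuming a slot
                j += 1
            elif labels[op] == mode[k]:
                group.append(op)          # consumes the next template slot
                j += 1
                k += 1
            else:
                break
        if k == len(mode):                # the whole mode matched
            fused.append(group)
            i = j
        else:
            i += 1
    return fused

labels = {"conv1": "Matrix", "relu1": "ElementWise", "bn1": "Normalize",
          "pool1": "Pooling", "conv2": "Matrix", "bn2": "Normalize"}
chain = ["conv1", "relu1", "bn1", "pool1", "conv2", "bn2"]
groups = match_linear_mode(chain, labels, ["Matrix", "Normalize"],
                           {"ElementWise"})
print(groups)  # [['conv1', 'relu1', 'bn1'], ['conv2', 'bn2']]
```

Each returned group would become one fused layer (such as the fusion layers A through E of FIG. 8B); extending this scan from a single chain to a full DAG would require following the graph's data-flow edges rather than list order.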
According to some embodiments of the present disclosure, the fusion mode configuration unit 1010 setting at least one template fusion mode of linear structure based on operator category labels includes: setting a first template fusion mode and a second template fusion mode of linear structure based on operator category labels, where, when the first template fusion mode and the second template fusion mode share a common part of template categories and the first template fusion mode includes further template categories beyond that common part, the fusion unit 1020 first performs operator fusion on the multiple operators in the neural network according to the first template fusion mode, and then performs operator fusion on the multiple operators in the neural network according to the second template fusion mode.
In the operator fusion method provided according to embodiments of the present disclosure, the template fusion modes of linear structure and the corresponding operator fusion algorithm are defined based on operator category labels, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories, thereby providing an efficient operator fusion process for the neural network. This fusion process can be expressed as an operator fusion algorithm corresponding to the template fusion modes.
According to some embodiments of the present disclosure, setting template fusion modes of linear structure based on operator category labels includes: setting a first template fusion mode and a second template fusion mode of linear structure based on operator category labels. In some implementations, not just one but multiple template fusion modes may be defined for a neural network. When the first template fusion mode and the second template fusion mode share a common part of template categories and the first template fusion mode includes further template categories beyond that common part, operator fusion is first performed on the multiple operators in the neural network according to the first template fusion mode, and then according to the second template fusion mode.
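This first/second ordering amounts to trying the more specific (longer) mode before the shorter one that shares its common part, so the shorter mode cannot pre-empt a larger fusion opportunity. A minimal sketch of that priority rule, with illustrative mode contents:

```python
def order_modes(modes):
    """Apply longer template fusion modes before shorter ones, so that a
    second mode sharing a common part with a larger first mode does not
    pre-empt the larger fusion."""
    return sorted(modes, key=len, reverse=True)

modes = [
    ["Matrix", "Normalize"],             # second mode
    ["Matrix", "Normalize", "Pooling"],  # first mode: superset of the above
]
print(order_modes(modes))
# [['Matrix', 'Normalize', 'Pooling'], ['Matrix', 'Normalize']]
```

Sorting by length is one sufficient heuristic for the prefix-sharing case described here; a fuller implementation might order modes by an explicit priority field in the configuration file instead.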
According to some embodiments of the present disclosure, the fusion mode configuration unit 1010 setting the template fusion modes of linear structure based on operator category labels includes: setting the template fusion modes of linear structure using a configurable file.
According to some embodiments of the present disclosure, the fusion mode configuration unit 1010 is further configured to set fusion modes of subgraph structure in units of operators. The computing apparatus according to some embodiments of the present disclosure further includes a generation unit 1040 configured to generate a directed acyclic graph based on the network structure of the neural network, where the directed acyclic graph includes operators and the connections between them, the connections representing the data dependencies and data flow between operators.
According to some embodiments of the present disclosure, before operator fusion is performed on the multiple operators in the neural network according to the at least one template fusion mode of linear structure, the fusion unit 1020 is further configured to perform operator fusion on the operators in the directed acyclic graph of the neural network according to the fusion mode of subgraph structure, obtaining a graph representation after subgraph-structure operator fusion. The fusion unit 1020 performing operator fusion on the multiple operators in the neural network according to the template fusion modes of linear structure then includes: traversing, according to the respective category labels of the multiple operators, the operators in the graph representation after subgraph-structure operator fusion in accordance with the at least one template fusion mode of linear structure, matching one or more operators to the corresponding template fusion mode of linear structure, and fusing the operators in the graph representation.
For the specific implementation of operator fusion by the computing apparatus according to embodiments of the present disclosure, reference may be made to the operator fusion method according to some embodiments of the present disclosure described above with reference to FIG. 2 to FIG. 8B, which is not repeated here. A computing apparatus according to embodiments of the present disclosure can carry out a similar operator fusion process and achieve similar technical effects.
According to yet another aspect of the present disclosure, a computing device is further provided. FIG. 10 shows a schematic block diagram of a computing device according to an embodiment of the present disclosure.
As shown in FIG. 10, the computing device 2000 may include a processor 2010 and a memory 2020. According to an embodiment of the present disclosure, the memory 2020 stores computer-readable code which, when executed by the processor 2010, can perform the operator fusion method described above.
The processor 2010 can perform various actions and processing according to programs stored in the memory 2020. Specifically, the processor 2010 can be an integrated circuit with signal processing capability. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. For example, the processor here may refer to a computing device capable of performing neural network computation.
The memory 2020 stores computer-executable instruction code which, when executed by the processor 2010, implements the operator fusion method according to embodiments of the present disclosure. The memory 2020 may be volatile memory or non-volatile memory, or may include both. It should be noted that the memory described in the present disclosure may be of any suitable type. As an example, by executing the computer-executable instruction code in the memory 2020, a processor such as a CPU can implement the operator fusion method for inter-layer synchronization of the neural network.
The operator fusion method or computing device according to embodiments of the present disclosure may also be implemented by means of the architecture of the computing device 3000 shown in FIG. 11. As shown in FIG. 11, the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, input/output components 3060, a hard disk 3070, and the like. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, can store various data or files used in the processing and/or communication of the operator fusion method provided by the present disclosure, as well as program instructions executed by the CPU. The computing device 3000 may also include a user interface 3080. Of course, the architecture shown in FIG. 11 is merely exemplary; when implementing different devices, one or more components of the computing device shown in FIG. 11 may be omitted according to actual needs.
According to yet another aspect of the present disclosure, a non-transitory computer-readable storage medium is further provided. FIG. 12 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
As shown in FIG. 12, the computer storage medium 4020 stores computer-readable instructions 4010. When the computer-readable instructions 4010 are executed by a processor, the operator fusion method described with reference to the above figures can be performed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache; non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. For example, the computer storage medium 4020 may be connected to a computing device such as a computer; when the computing device runs the computer-readable instructions 4010 stored on the computer storage medium 4020, the operator fusion method provided according to embodiments of the present disclosure as described above can be performed.
In summary, some embodiments of the present disclosure provide an operator fusion method, a computing apparatus, a computing device and a storage medium for providing operator fusion solutions for various types of neural network structures, especially neural networks with relatively complex structures. More specifically, for performing operator fusion on various types of neural network structures, template fusion modes of linear structure set based on operator category labels and a corresponding operator fusion algorithm are designed, so that the designed template fusion modes are general and extensible and fusion can be performed in units of operator categories, thereby providing an efficient operator fusion process for the neural network, facilitating graph optimization of the neural network, and improving the computational efficiency of the neural network on the hardware platform.
Those skilled in the art will appreciate that various variations and improvements may be made to the content disclosed herein. For example, the various devices or components described above may be implemented in hardware, or in software, firmware, or a combination of some or all of the three.
In addition, although the present disclosure makes various references to certain units in systems according to embodiments of the present disclosure, any number of different units may be used and run on a client and/or server. The units are merely illustrative, and different aspects of the systems and methods may use different units.
Flowcharts are used in the present disclosure to illustrate the steps of methods according to embodiments of the present disclosure. It should be understood that the preceding or following steps are not necessarily performed in exact order; rather, various steps may be processed in reverse order or simultaneously, and other operations may be added to these processes.
Those of ordinary skill in the art will understand that all or some of the steps of the above methods may be completed by a computer program instructing related hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, the modules/units in the above embodiments may be implemented in the form of hardware or in the form of software functional modules. The present disclosure is not limited to any particular combination of hardware and software.
Unless otherwise defined, all terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present disclosure belongs. It should also be understood that terms such as those defined in ordinary dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The above is a description of the present disclosure and should not be regarded as limiting it. Although several exemplary embodiments of the present disclosure have been described, those skilled in the art will readily appreciate that many modifications may be made to the exemplary embodiments without departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of the present disclosure as defined by the claims. It should be understood that the above is a description of the present disclosure and should not be regarded as limited to the particular embodiments disclosed, and that modifications to the disclosed embodiments and other embodiments are intended to be included within the scope of the appended claims. The present disclosure is defined by the claims and their equivalents.

Claims (20)

  1. An operator fusion method applied to a neural network comprising a plurality of operators, the method comprising:
    setting at least one template fusion mode of a linear structure based on category labels of operators; and
    performing, according to the respective category labels of the plurality of operators in the neural network, operator fusion on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure, so as to fuse one or more operators in the neural network that conform to the at least one template fusion mode of the linear structure.
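The linear-structure template fusion mode of claim 1 can be sketched in a few lines of Python. This is an illustrative assumption, not the patent's implementation; `Op` and `fuse_linear` are invented names:

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    label: str                                  # category label, e.g. "matrix"
    fused: list = field(default_factory=list)   # names absorbed by fusion

def fuse_linear(ops, pattern):
    """Greedily replace each run of ops whose labels match `pattern`
    with a single fused operator."""
    out, i = [], 0
    n = len(pattern)
    while i < len(ops):
        if [op.label for op in ops[i:i + n]] == list(pattern):
            group = ops[i:i + n]
            out.append(Op("+".join(o.name for o in group), "fused",
                          [o.name for o in group]))
            i += n
        else:
            out.append(ops[i])
            i += 1
    return out

# A conv -> batchnorm -> maxpool chain matches the mode
# ("matrix", "normalization", "pooling") and collapses into one operator.
chain = [Op("conv", "matrix"), Op("bn", "normalization"),
         Op("maxpool", "pooling"), Op("fc", "matrix")]
fused = fuse_linear(chain, ("matrix", "normalization", "pooling"))
```

Running this leaves two operators: the fused `conv+bn+maxpool` and the unmatched `fc`.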
  2. The operator fusion method according to claim 1, further comprising: classifying operators based on operator functionality and/or the computing-architecture characteristics of the hardware platform, and assigning category labels to operators of different categories.
  3. The operator fusion method according to claim 1 or 2, wherein the category labels comprise a non-template category and template categories, the template categories comprising one or more of the following:
    a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
  4. The operator fusion method according to claim 3, wherein operators belonging to the non-template category consume no register resources or synchronization memory resources during computation of the neural network.
  5. The operator fusion method according to claim 3 or 4, wherein the at least one template fusion mode of the linear structure consists of one or more linearly connected template categories, and setting the at least one template fusion mode of the linear structure based on category labels of operators comprises: setting the at least one template fusion mode of the linear structure based on the category labels of the template categories while ignoring the category labels of the non-template category.
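The matching rule of claim 5 can be sketched as follows (an illustrative assumption about the rule, not the patent's code): operators labeled with the non-template category are invisible to pattern matching, so a conv → relu → bn chain still matches the mode ("matrix", "normalization") even though a non-template relu sits between them.

```python
def matches_ignoring_non_template(labels, pattern):
    """True if the template-category labels, taken in order, equal `pattern`."""
    template_only = [lbl for lbl in labels if lbl != "non_template"]
    return template_only == list(pattern)

hit = matches_ignoring_non_template(
    ["matrix", "non_template", "normalization"],   # conv, relu, bn
    ("matrix", "normalization"))
miss = matches_ignoring_non_template(
    ["matrix", "pooling"], ("matrix", "normalization"))
```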
  6. The operator fusion method according to claim 5, further comprising: generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph comprises operators and edges between the operators, and the edges between the operators represent data dependencies and data flow between the operators,
    wherein performing operator fusion on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure comprises:
    traversing the operators in the directed acyclic graph of the neural network according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one template fusion mode of the linear structure, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the plurality of operators in the neural network.
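The traversal in claim 6 can be sketched like this (the data structures are assumptions): the DAG is an adjacency list whose edges encode data dependence; we walk maximal single-in/single-out chains and read off their category labels, which can then be tested against a linear template fusion mode.

```python
def linear_chains(succ, labels):
    """Yield the label sequence of each maximal single-in/single-out chain."""
    pred_count = {node: 0 for node in labels}
    for outs in succ.values():
        for dst in outs:
            pred_count[dst] += 1
    # A chain starts at any node that is not the unique successor of another.
    for head in (n for n in labels if pred_count[n] != 1):
        chain = [head]
        while len(succ.get(chain[-1], [])) == 1:
            nxt = succ[chain[-1]][0]
            if pred_count[nxt] != 1:
                break
            chain.append(nxt)
        yield [labels[n] for n in chain]

# conv -> bn -> pool forms one linear chain in the DAG.
succ = {"conv": ["bn"], "bn": ["pool"], "pool": []}
labels = {"conv": "matrix", "bn": "normalization", "pool": "pooling"}
chains = list(linear_chains(succ, labels))
```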
  7. The operator fusion method according to claim 5 or 6, wherein setting at least one template fusion mode of a linear structure based on category labels of operators further comprises: setting a first template fusion mode and a second template fusion mode of a linear structure based on category labels of operators, wherein, when the first template fusion mode and the second template fusion mode share a common portion of template categories and the first template fusion mode further comprises template categories beyond the common portion, operator fusion is first performed on the plurality of operators in the neural network in accordance with the first template fusion mode, and then performed in accordance with the second template fusion mode.
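The ordering rule of claim 7 can be shown concretely (a sketch under assumed names): when the first mode extends the second with extra template categories, the longer mode must be tried first; otherwise the shorter mode consumes the shared prefix and the larger fusion opportunity is lost.

```python
def fuse_once(labels, pattern):
    """Replace the first run matching `pattern` with one 'fused' label."""
    p = list(pattern)
    for i in range(len(labels) - len(p) + 1):
        if labels[i:i + len(p)] == p:
            return labels[:i] + ["fused"] + labels[i + len(p):]
    return labels

seq = ["matrix", "normalization", "pooling"]
first = ("matrix", "normalization", "pooling")   # longer (first) mode
second = ("matrix", "normalization")             # shares the prefix

right = fuse_once(fuse_once(seq, first), second)   # longer mode applied first
wrong = fuse_once(fuse_once(seq, second), first)   # shorter mode applied first
```

With the claimed order the whole chain fuses into one operator; with the reverse order the pooling operator is stranded.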
  8. The operator fusion method according to claim 1, wherein setting at least one template fusion mode of a linear structure based on category labels of operators comprises:
    setting the at least one template fusion mode of the linear structure by means of a configurable file.
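Claim 8 treats the template fusion modes as data rather than code, so they can live in a configurable file and be changed without rebuilding anything. JSON is an assumption here; the claim does not fix a file format.

```python
import json

# A hypothetical configuration file listing two linear template fusion modes.
config_text = """
{
  "template_fusion_modes": [
    ["matrix", "normalization", "pooling"],
    ["matrix", "normalization"]
  ]
}
"""

modes = [tuple(m) for m in json.loads(config_text)["template_fusion_modes"]]
```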
  9. The operator fusion method according to claim 1, further comprising:
    setting a fusion mode of a subgraph structure in units of operators; and
    generating a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph comprises operators and edges between the operators, and the edges between the operators represent data dependencies and data flow between the operators.
  10. The operator fusion method according to claim 9, wherein, before performing operator fusion on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure, the method further comprises:
    performing operator fusion on the operators in the directed acyclic graph of the neural network in accordance with the fusion mode of the subgraph structure, to obtain a graph expression after subgraph-structure operator fusion.
  11. The operator fusion method according to claim 10, wherein performing operator fusion on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure comprises:
    traversing the operators in the graph expression after subgraph-structure operator fusion according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one template fusion mode of the linear structure, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
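The two-stage pipeline of claims 9–11 can be sketched as follows (all names and patterns are illustrative assumptions): stage 1 fuses an operator-level subgraph pattern matched by exact operator name; stage 2 then runs the category-label template fusion mode over the resulting graph expression.

```python
SUBGRAPH = ["conv", "bias_add", "relu"]     # stage 1: matched by operator name
TEMPLATE = ["matrix", "normalization"]      # stage 2: matched by category label
LABEL = {"conv_bias_relu": "matrix", "bn": "normalization",
         "conv": "matrix", "bias_add": "non_template", "relu": "non_template"}

def fuse_subgraph(ops):
    """Stage 1: collapse the exact-name subgraph pattern into one operator."""
    n = len(SUBGRAPH)
    for i in range(len(ops) - n + 1):
        if ops[i:i + n] == SUBGRAPH:
            return ops[:i] + ["conv_bias_relu"] + ops[i + n:]
    return ops

def fuse_template(ops):
    """Stage 2: collapse a run whose category labels match the template mode."""
    n = len(TEMPLATE)
    for i in range(len(ops) - n + 1):
        if [LABEL[o] for o in ops[i:i + n]] == TEMPLATE:
            return ops[:i] + ["fused"] + ops[i + n:]
    return ops

graph = ["conv", "bias_add", "relu", "bn"]
stage1 = fuse_subgraph(graph)     # subgraph-structure fusion first
stage2 = fuse_template(stage1)    # then linear template fusion
```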
  12. The operator fusion method according to any one of claims 3-7, wherein:
    the operators belonging to the non-template category include: an activation function, a rectified linear unit function, an absolute value function, and an addition function;
    the operators belonging to the matrix category include: a forward convolution operator, a backward data convolution operator, a backward filter convolution operator, and a matrix multiplication operator;
    the operators belonging to the normalization category include: a batch normalization operator and a layer normalization operator;
    the operators belonging to the pooling category include: a max pooling layer operator, an average pooling layer operator, and a global average pooling layer operator;
    the operators belonging to the data rearrangement category include: a concatenation operator and a transpose operator;
    the operators belonging to the data reduction category include: a maximum function, a minimum function, and an average function;
    the operators belonging to the regression function category include: a per-sample regression function and a per-channel regression function; and
    the operators belonging to the loss function category include: a mean square error function and a cross entropy function.
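The taxonomy of claim 12 can be written out as a lookup table (operator names are rendered in common English; mapping the claim's "regression function" to softmax-style operators is an interpretive assumption):

```python
CATEGORY_OF = {
    # non-template: cheap element-wise ops, transparent to pattern matching
    "activation": "non_template", "relu": "non_template",
    "abs": "non_template", "add": "non_template",
    # matrix
    "conv_fwd": "matrix", "conv_bwd_data": "matrix",
    "conv_bwd_filter": "matrix", "matmul": "matrix",
    # normalization
    "batch_norm": "normalization", "layer_norm": "normalization",
    # pooling
    "max_pool": "pooling", "avg_pool": "pooling", "global_avg_pool": "pooling",
    # data rearrangement and data reduction
    "concat": "data_rearrange", "transpose": "data_rearrange",
    "reduce_max": "data_reduce", "reduce_min": "data_reduce",
    "reduce_mean": "data_reduce",
    # regression and loss functions
    "softmax_per_sample": "regression", "softmax_per_channel": "regression",
    "mse_loss": "loss", "cross_entropy_loss": "loss",
}
```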
  13. A computing apparatus for performing operator fusion on a neural network, wherein the neural network comprises a plurality of operators, the computing apparatus comprising:
    a fusion mode configuration unit configured to set at least one template fusion mode of a linear structure based on category labels of operators; and
    a fusion unit configured to perform, according to the respective category labels of the plurality of operators in the neural network, operator fusion on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure, so as to fuse one or more operators in the neural network that conform to the at least one template fusion mode of the linear structure.
  14. The computing apparatus according to claim 13, further comprising a classification unit configured to classify operators based on operator functionality and/or the computing-architecture characteristics of the hardware platform, and to assign category labels to operators of different categories, wherein the category labels comprise a non-template category and template categories, the template categories comprising one or more of the following: a matrix category, a normalization category, a pooling category, a data rearrangement category, a data reduction category, a regression function category, and a loss function category.
  15. The computing apparatus according to claim 14, wherein the at least one template fusion mode of the linear structure consists of one or more linearly connected template categories, and setting the at least one template fusion mode of the linear structure based on category labels of operators comprises: setting the at least one template fusion mode of the linear structure based on the category labels of the template categories while ignoring the category labels of the non-template category,
    wherein the computing apparatus further comprises a generation unit configured to generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph comprises operators and edges between the operators, and the edges between the operators represent data dependencies and data flow between the operators,
    wherein the fusion unit performing operator fusion on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure comprises:
    traversing the operators in the directed acyclic graph of the neural network according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one template fusion mode of the linear structure, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the plurality of operators in the neural network.
  16. The computing apparatus according to claim 15, wherein the fusion mode configuration unit setting at least one template fusion mode of a linear structure based on category labels of operators further comprises: setting a first template fusion mode and a second template fusion mode of a linear structure based on category labels of operators, wherein, when the first template fusion mode and the second template fusion mode share a common portion of template categories and the first template fusion mode further comprises template categories beyond the common portion, the fusion unit first performs operator fusion on the plurality of operators in the neural network in accordance with the first template fusion mode, and then performs operator fusion on the plurality of operators in the neural network in accordance with the second template fusion mode.
  17. The computing apparatus according to claim 13, wherein the fusion mode configuration unit is further configured to set a fusion mode of a subgraph structure in units of operators, and wherein the computing apparatus further comprises a generation unit configured to generate a directed acyclic graph based on the network structure of the neural network, wherein the directed acyclic graph comprises operators and edges between the operators, and the edges between the operators represent data dependencies and data flow between the operators.
  18. The computing apparatus according to claim 17, wherein, before operator fusion is performed on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure, the fusion unit is further configured to:
    perform operator fusion on the operators in the directed acyclic graph of the neural network in accordance with the fusion mode of the subgraph structure, to obtain a graph expression after subgraph-structure operator fusion,
    wherein the fusion unit performing operator fusion on the plurality of operators in the neural network in accordance with the at least one template fusion mode of the linear structure comprises:
    traversing the operators in the graph expression after subgraph-structure operator fusion according to the respective category labels of the plurality of operators in the neural network and in accordance with the at least one template fusion mode of the linear structure, matching one or more operators to the corresponding template fusion mode of the linear structure, and performing operator fusion on the operators in the graph expression.
  19. A computing device, comprising:
    a processor; and
    a memory, wherein computer-readable code is stored in the memory, and the computer-readable code, when run by the processor, performs the operator fusion method according to any one of claims 1-12.
  20. A non-transitory computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the operator fusion method according to any one of claims 1-12.
PCT/CN2023/083784 2022-10-17 2023-03-24 Operator fusion method, computing apparatus, computing device and readable storage medium WO2024082551A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211268055.1A CN115563581A (en) 2022-10-17 2022-10-17 Operator fusion method, computing device, computing equipment and readable storage medium
CN202211268055.1 2022-10-17

Publications (1)

Publication Number Publication Date
WO2024082551A1 true WO2024082551A1 (en) 2024-04-25

Family

ID=84747342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083784 WO2024082551A1 (en) 2022-10-17 2023-03-24 Operator fusion method, computing apparatus, computing device and readable storage medium

Country Status (2)

Country Link
CN (1) CN115563581A (en)
WO (1) WO2024082551A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563581A (en) * 2022-10-17 2023-01-03 上海壁仞智能科技有限公司 Operator fusion method, computing device, computing equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579063A (en) * 2021-03-01 2021-03-30 之江实验室 Acceleration method for exploring optimization space in deep learning compiler
CN113342345A (en) * 2021-05-17 2021-09-03 北京百度网讯科技有限公司 Operator fusion method and device of deep learning framework
US20210350234A1 (en) * 2019-01-28 2021-11-11 Intel Corporation Techniques to detect fusible operators with machine learning
CN115563581A (en) * 2022-10-17 2023-01-03 上海壁仞智能科技有限公司 Operator fusion method, computing device, computing equipment and readable storage medium

Also Published As

Publication number Publication date
CN115563581A (en) 2023-01-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23878553

Country of ref document: EP

Kind code of ref document: A1