CN117492722A - Code generation method, device, computer equipment and storage medium - Google Patents

Code generation method, device, computer equipment and storage medium

Info

Publication number
CN117492722A
Authority
CN
China
Prior art keywords
combination
fusion
fusible
abstract syntax
operations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210853361.5A
Other languages
Chinese (zh)
Inventor
杨晶
武小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glenfly Tech Co Ltd
Original Assignee
Glenfly Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glenfly Tech Co Ltd filed Critical Glenfly Tech Co Ltd
Priority to CN202210853361.5A priority Critical patent/CN117492722A/en
Publication of CN117492722A publication Critical patent/CN117492722A/en
Pending legal-status Critical Current

Classifications

    • G06F 8/30 Creation or generation of source code (Arrangements for software engineering)
    • G06F 8/425 Lexical analysis (Transformation of program code; Compilation; Syntactic analysis)
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections (Computing arrangements based on biological models; Neural networks)
    • G06N 3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to a code generation method, apparatus, computer device and storage medium. The method comprises the following steps: extracting at least two fusible network layers from a neural network; acquiring a plurality of operations contained in each of the at least two fusible network layers; combining the plurality of operations contained in each of the at least two fusible network layers to obtain fusion nodes; and acquiring the quantization implementation code of each fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware. By replacing manual code-writing with automatic generation, the method reduces labor cost and lowers the error rate.

Description

Code generation method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a code generating method, apparatus, computer device, and storage medium.
Background
At present, fusing a neural network produces many new nodes. Common fusible network layers are the Eltwise layer and the activation layer. When the operations contained in the Eltwise layer and the operations contained in the activation layer are combined, multiple fusion nodes are obtained, and these nodes can be deployed to back-end hardware by adding quantization implementation code for them.
In the prior art, quantization implementation code can be added for each fusion node by hand-writing a kernel. However, as new neural network layers are added, the number of fusion nodes grows exponentially; adding code manually inevitably increases labor cost, and the error rate is also higher.
Disclosure of Invention
In view of the above, it is desirable to provide a code generation method, apparatus, computer device, and storage medium that can reduce labor costs.
In a first aspect, the present application provides a code generation method, the method including:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
In one embodiment, before the acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer, and the type of the back-end hardware, the method further includes:
and for each of the at least two fusible network layers, acquiring the code of each of a plurality of operations contained in the fusible network layer, and disassembling the code to obtain abstract syntax trees corresponding to the operations, wherein the abstract syntax trees corresponding to the plurality of operations contained in the fusible network layer form the abstract syntax tree of that fusible network layer.
In one embodiment, the acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer, and the type of the back-end hardware includes:
acquiring, according to the operation combination corresponding to the fusion node, the abstract syntax trees of the operations contained in the operation combination from the abstract syntax tree of each fusible network layer;
analyzing the abstract syntax trees of the operations contained in the operation combination to obtain the code corresponding to the operation combination;
and acquiring the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the back-end hardware.
In one embodiment, the obtaining the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the backend hardware includes:
and converting the code corresponding to the operation combination into a form adapted to the type of the back-end hardware, and taking the conversion result as the quantization implementation code.
In one embodiment, the at least two fusible network layers include: an Eltwise layer and an activation layer; the plurality of operations contained in the Eltwise layer are: dot product prod, addition/subtraction sum and taking the maximum max; the plurality of operations contained in the activation layer are: tanh, sigmoid, relu and prelu; the operation combination corresponding to the fusion node is one of the following combinations: a combination of dot product and tanh, a combination of dot product and sigmoid, a combination of dot product and relu, a combination of dot product and prelu, a combination of sum and tanh, a combination of sum and sigmoid, a combination of sum and relu, a combination of sum and prelu, a combination of max and tanh, a combination of max and sigmoid, a combination of max and relu, or a combination of max and prelu.
In one embodiment, the type of backend hardware is one of the following types: a central processing unit CPU, a graphics processing unit GPU or a data processing unit DPU.
In a second aspect, the present application further provides a code generating apparatus. The device comprises:
the extraction module is used for extracting at least two fusible network layers from the neural network;
the acquisition module is used for acquiring a plurality of operations contained in each of the at least two fusible network layers;
the fusion module is used for combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and the generation module is used for acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
The code generation method, apparatus, computer device and storage medium above extract at least two fusible network layers from the neural network; acquire a plurality of operations contained in each of the at least two fusible network layers; combine the plurality of operations contained in each of the at least two fusible network layers to obtain fusion nodes; and acquire the quantization implementation code of each fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware. By replacing manual code-writing with automatic generation, the method reduces labor cost and lowers the error rate.
Drawings
FIG. 1 is a schematic diagram of a fusion node in one embodiment;
FIG. 2 is a flow diagram of a code generation method in one embodiment;
FIG. 3 is a schematic diagram of a neural network in one embodiment;
FIG. 4 is a flow chart of a code generation method according to another embodiment;
FIG. 5 is a flow chart of a code generation method according to another embodiment;
FIG. 6 is a block diagram of a code generation apparatus in one embodiment;
fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
When deploying a neural network to back-end hardware, combining two operations and processing the combination with a single compute unit saves computing resources compared with processing the two operations separately with two compute units. A fusible network layer in the neural network refers to the following: if the operations contained in one network layer can be combined with the operations contained in another network layer, both network layers are fusible network layers.
Common fusible network layers are the Eltwise layer and the activation layer. Referring to fig. 1, the Eltwise layer contains the following operations: dot product prod, addition/subtraction sum and taking the maximum max; the activation layer contains the following operations: tanh, sigmoid, relu and prelu. When the operations contained in the Eltwise layer and the operations contained in the activation layer are combined, 12 fusion nodes can be obtained, whose operation combinations are respectively: dot product and tanh, dot product and sigmoid, dot product and relu, dot product and prelu, sum and tanh, sum and sigmoid, sum and relu, sum and prelu, max and tanh, max and sigmoid, max and relu, and max and prelu. Quantization implementation code needs to be added for these nodes in order to deploy the neural network to the back-end hardware.
It should be noted that the fusible network layers may also be other network layers; the Eltwise layer and activation layer above are only examples and do not limit the present application. Likewise, the activation layer may contain operations other than tanh, sigmoid, relu and prelu, such as LeakyReLU and RRelu; the embodiments of the present application use tanh, sigmoid, relu and prelu only as examples, which do not limit the present application.
In some embodiments, quantization implementation code can be added for each fusion node by hand-writing a kernel. However, as new neural network layers are added, the number of fusion nodes grows exponentially; manually adding code inevitably increases labor cost, and the error rate is also higher.
Therefore, the embodiments of the application provide a code generation method: an abstract syntax tree is pre-established for each fusible network layer; when the neural network needs to be deployed to a back-end hardware device, the fusible network layers are extracted from the neural network and the plurality of operations contained in each fusible network layer are combined to obtain fusion nodes; the quantization implementation code is then generated automatically based on the pre-established abstract syntax trees and the type of the back-end hardware.
The code generation method provided by the embodiment of the application can be applied to any device with corresponding processing capability, such as: personal computers, notebook computers, smart phones, tablet computers, internet of things devices or servers, and the like, which are not limited in this embodiment. The above code generation method is described below with reference to specific embodiments.
In one embodiment, a code generation method is provided, see fig. 2, comprising:
s102, extracting at least two fusible network layers from the neural network.
Alternatively, the neural network in the embodiments of the present application may be a deep neural network (Deep Neural Networks, abbreviated as DNN), a recurrent neural network (Recurrent Neural Network, abbreviated as RNN), or a convolutional neural network (Convolutional Neural Network, abbreviated as CNN). The embodiment of the application does not limit the type of the neural network.
Optionally, the fusible network layer in the embodiment of the present application refers to: if multiple operations contained in one network layer can be combined with multiple operations contained in another network layer, both network layers are fusible network layers.
Optionally, common fusible network layers are the Eltwise layer and the activation layer. The network layers contained in the neural network can be traversed, and if the neural network contains an Eltwise layer and an activation layer, these two layers can be taken as the fusible network layers.
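By way of illustration only (the description does not give code for this step), the traversal might look like the following sketch, in which the dict-based layer representation and the layer type names are assumptions:

FUSIBLE_LAYER_TYPES = {"Eltwise", "Activation"}

def extract_fusible_layers(network_layers):
    # keep only layers whose operations can be combined with another layer's
    return [layer for layer in network_layers
            if layer["type"] in FUSIBLE_LAYER_TYPES]

layers = [{"type": "Conv"}, {"type": "Eltwise"}, {"type": "Activation"}]
print(extract_fusible_layers(layers))  # [{'type': 'Eltwise'}, {'type': 'Activation'}]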
S104, acquiring a plurality of operations contained in each of at least two fusible network layers.
Optionally, each network layer in the neural network has a different function, and the operations contained in each network layer differ accordingly. After the fusible network layers are extracted, each fusible network layer is analyzed to determine the plurality of operations it contains. If the fusible network layer is an Eltwise layer, its operations can be determined as: dot product prod, addition/subtraction sum and taking the maximum max; if the fusible network layer is an activation layer, its operations can be determined as: tanh, sigmoid, relu and prelu.
S106, combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node.
Optionally, after the operations contained in each fusible network layer are determined, the operations contained in the respective fusible network layers are combined to obtain a plurality of fusion nodes. By way of example, when the operations contained in the Eltwise layer and the operations contained in the activation layer are combined, 12 fusion nodes can be obtained, whose operation combinations are respectively: dot product and tanh, dot product and sigmoid, dot product and relu, dot product and prelu, sum and tanh, sum and sigmoid, sum and relu, sum and prelu, max and tanh, max and sigmoid, max and relu, and max and prelu.
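As an illustrative sketch of this step (the 3 x 4 pairing follows from the operation lists above; the short string names are assumptions), the combination amounts to a Cartesian product of the two operation lists:

from itertools import product

eltwise_ops = ["prod", "sum", "max"]                  # dot product, sum, max
activation_ops = ["tanh", "sigmoid", "relu", "prelu"]

fusion_nodes = list(product(eltwise_ops, activation_ops))
print(len(fusion_nodes))  # 12
print(fusion_nodes[0])    # ('prod', 'tanh'), i.e. the combination of dot product and tanh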
S108, obtaining the quantization implementation code of each fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
Optionally, in the offline development stage, an abstract syntax tree is established for each fusible network layer. When the neural network needs to be deployed to the back-end hardware, for each generated fusion node, the abstract syntax trees of the relevant operations are extracted from the abstract syntax trees of the fusible network layers based on the operation combination corresponding to that node; the extracted abstract syntax trees are parsed to obtain the code of the fusion node, and this code is converted into a form supported by the back-end hardware to obtain the quantization implementation code.
The following is illustrative:
as shown in fig. 3, the neural network includes an Eltwise layer and an activation layer. If the operations contained in the Eltwise layer and the operations contained in the activation layer are combined, 12 fusion nodes can be obtained. Fig. 3 shows one of them, whose operation combination is: dot product and relu. This fusion node is marked as Plugin, and its quantization implementation code can be generated in the manner of the embodiments of the present application.
The code generation method provided by the embodiments of the application extracts at least two fusible network layers from the neural network; acquires a plurality of operations contained in each of the at least two fusible network layers; combines the plurality of operations contained in each of the at least two fusible network layers to obtain fusion nodes; and acquires the quantization implementation code of each fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware. By replacing manual code-writing with automatic generation, the method reduces labor cost and lowers the error rate.
As mentioned above, the quantization implementation code is generated using the abstract syntax trees of the fusible network layers. In one embodiment, referring to fig. 4, before being used, the abstract syntax tree of a fusible network layer may be obtained as follows:
s107, for each of at least two fusible network layers, acquiring codes of each of a plurality of operations contained in the fusible network layer, and disassembling the codes to obtain abstract syntax trees corresponding to the operations, wherein the abstract syntax trees corresponding to the plurality of operations contained in the fusible network layer form the abstract syntax tree of the fusible network layer.
The following is illustrative:
assuming that the fusible network layer is an Eltwise layer, the operations contained in the Eltwise layer are: dot product prod, addition/subtraction sum and taking the maximum max. Taking the addition/subtraction sum as an example, its pseudo code is as follows:
for (i = 0; i < size; i++) {
    vector_c[i] = vector_a[i] + vector_b[i];
}
this code can be broken down into: ForExprAST (for-loop expression), AssignExprAST (assignment operation), SubscriptExprAST (variable index, such as vector_a[i]), BinaryExprAST (binary operation, here the addition), and so on. Together, the ForExprAST, AssignExprAST, SubscriptExprAST and BinaryExprAST are called the abstract syntax tree corresponding to the addition/subtraction sum. In the same way, the abstract syntax trees corresponding to the dot product and to the maximum max can be obtained; the abstract syntax trees corresponding to the dot product, the addition/subtraction sum and the maximum max form the abstract syntax tree of the Eltwise layer. Similarly, the abstract syntax tree of the activation layer can be obtained.
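The node names above are the only implementation detail the description gives. A minimal sketch of what such a disassembly result could look like, where the class layout is an assumption for illustration only, is:

from dataclasses import dataclass

@dataclass
class SubscriptExprAST:   # variable index, e.g. vector_a[i]
    array: str
    index: str

@dataclass
class BinaryExprAST:      # binary operation, here the addition
    op: str
    lhs: SubscriptExprAST
    rhs: SubscriptExprAST

@dataclass
class AssignExprAST:      # assignment operation, e.g. vector_c[i] = ...
    target: SubscriptExprAST
    value: BinaryExprAST

@dataclass
class ForExprAST:         # for-loop expression over [0, size)
    var: str
    bound: str
    body: AssignExprAST

# AST of: for (i = 0; i < size; i++) { vector_c[i] = vector_a[i] + vector_b[i]; }
sum_ast = ForExprAST(
    var="i", bound="size",
    body=AssignExprAST(
        target=SubscriptExprAST("vector_c", "i"),
        value=BinaryExprAST("+",
                            SubscriptExprAST("vector_a", "i"),
                            SubscriptExprAST("vector_b", "i"))))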
After the abstract syntax tree of each fusible network layer is obtained in the offline development stage, the quantization implementation code can be generated automatically on the basis of these abstract syntax trees. In one embodiment, referring to fig. 5, the quantization implementation code may be generated as follows:
s501, according to the operation combination corresponding to the fusion node, the abstract syntax tree of the operation contained in the operation combination is obtained from the abstract syntax tree of each fusion network layer.
Optionally, as described above, the abstract syntax tree of a fusible network layer is formed by the abstract syntax trees corresponding to the operations contained in that layer. Therefore, after a fusion node is obtained, it can be determined which combination of operations the node corresponds to, and the abstract syntax trees corresponding to the relevant operations can then be looked up in the abstract syntax tree of each fusible network layer.
The following is illustrative:
as described above, when the operations contained in the Eltwise layer and the operations contained in the activation layer are combined, 12 fusion nodes can be obtained, whose operation combinations are respectively: dot product and tanh, dot product and sigmoid, dot product and relu, dot product and prelu, sum and tanh, sum and sigmoid, sum and relu, sum and prelu, max and tanh, max and sigmoid, max and relu, and max and prelu. Taking the combination of dot product and relu as an example, the abstract syntax tree of the dot product operation can be looked up in the abstract syntax tree of the Eltwise layer, and the abstract syntax tree of the relu operation can be looked up in the abstract syntax tree of the activation layer.
S502, analyzing the abstract syntax trees of the operations contained in the operation combination to obtain the code corresponding to the operation combination.
Optionally, the parsing here may be the reverse of the disassembly process described above. Assuming the operation combination contains the dot product and relu operations, the abstract syntax trees corresponding to the dot product and to relu can be parsed respectively to obtain the code corresponding to the operation combination.
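Continuing the AST sketch above (still an assumption, not the patent's actual implementation), the reverse step of turning an abstract syntax tree back into source text might look like:

def unparse(node):
    # turn an AST node from the sketch above back into C-like source text
    if isinstance(node, SubscriptExprAST):
        return f"{node.array}[{node.index}]"
    if isinstance(node, BinaryExprAST):
        return f"{unparse(node.lhs)} {node.op} {unparse(node.rhs)}"
    if isinstance(node, AssignExprAST):
        return f"{unparse(node.target)} = {unparse(node.value)};"
    if isinstance(node, ForExprAST):
        return (f"for ({node.var} = 0; {node.var} < {node.bound}; {node.var}++) "
                f"{{ {unparse(node.body)} }}")
    raise TypeError(f"unknown AST node: {type(node)}")

print(unparse(sum_ast))
# for (i = 0; i < size; i++) { vector_c[i] = vector_a[i] + vector_b[i]; }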
S503, obtaining the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the back-end hardware.
Optionally, the type of backend hardware is one of the following types: a central processing unit CPU, a graphics processing unit GPU or a data processing unit DPU.
Optionally, the code corresponding to the operation combination can be converted into a form adapted to the type of the back-end hardware, and the conversion result is used as the quantization implementation code of the fusion node.
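A minimal sketch of such a conversion, assuming simple kernel templates per back-end type (the templates and function names are illustrative assumptions, not the patent's actual code-generation rules):

def adapt_to_backend(fused_body, backend):
    # wrap a fused per-element statement in a form the back end accepts
    if backend == "CPU":
        return ("void plugin_kernel(const float* a, const float* b, float* c, int size) {\n"
                f"    for (int i = 0; i < size; i++) {{ {fused_body} }}\n"
                "}")
    if backend == "GPU":  # CUDA style: one thread per element, no explicit loop
        return ("__global__ void plugin_kernel(const float* a, const float* b, float* c, int size) {\n"
                "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
                f"    if (i < size) {{ {fused_body} }}\n"
                "}")
    raise ValueError(f"unsupported back-end type: {backend}")

# e.g. the dot product + relu fusion node (the Plugin node of fig. 3):
print(adapt_to_backend("c[i] = fmaxf(a[i] * b[i], 0.0f);", "GPU"))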
The code generation method provided by the embodiments of the application gives a detailed process for automatically generating the quantization implementation code based on abstract syntax trees. Compared with the prior-art approach of adding code manually, it improves efficiency, reduces labor cost and lowers the error rate.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps, sub-steps or stages.
Based on the same inventive concept, the embodiments of the present application also provide a code generation apparatus for implementing the above-mentioned related code generation method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more code generating device embodiments provided below may refer to the limitation of the code generating method hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 6, there is provided a code generating apparatus including:
an extracting module 601, configured to extract at least two fusible network layers from the neural network;
an obtaining module 602, configured to obtain a plurality of operations included in each of the at least two fusible network layers;
a fusion module 603, configured to combine a plurality of operations included in each of the at least two fusible network layers to obtain a fusion node;
and the generating module 604 is configured to obtain a quantized implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer, and the type of the back-end hardware.
Optionally, the obtaining module 602 is further configured to:
and for each of the at least two fusible network layers, acquiring the code of each of a plurality of operations contained in the fusible network layer, and disassembling the code to obtain abstract syntax trees corresponding to the operations, wherein the abstract syntax trees corresponding to the plurality of operations contained in the fusible network layer form the abstract syntax tree of that fusible network layer.
Optionally, the generating module 604 is specifically configured to:
acquiring, according to the operation combination corresponding to the fusion node, the abstract syntax trees of the operations contained in the operation combination from the abstract syntax tree of each fusible network layer;
analyzing the abstract syntax trees of the operations contained in the operation combination to obtain the code corresponding to the operation combination;
and acquiring the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the back-end hardware.
Optionally, the generating module 604 is specifically configured to:
and converting the code corresponding to the operation combination into a form adapted to the type of the back-end hardware, and taking the conversion result as the quantization implementation code.
Optionally, the at least two fusible network layers include: an Eltwise layer and an activation layer; the plurality of operations contained in the Eltwise layer are: dot product prod, addition/subtraction sum and taking the maximum max; the plurality of operations contained in the activation layer are: tanh, sigmoid, relu and prelu; the operation combination corresponding to the fusion node is one of the following combinations: a combination of dot product and tanh, a combination of dot product and sigmoid, a combination of dot product and relu, a combination of dot product and prelu, a combination of sum and tanh, a combination of sum and sigmoid, a combination of sum and relu, a combination of sum and prelu, a combination of max and tanh, a combination of max and sigmoid, a combination of max and relu, or a combination of max and prelu.
Optionally, the type of the backend hardware is one of the following types: a central processing unit CPU, a graphics processing unit GPU or a data processing unit DPU.
Each of the modules in the code generating apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data such as abstract syntax trees. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a code generation method.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
In one embodiment, the processor when executing the computer program further performs the steps of:
and for each of the at least two fusible network layers, acquiring the code of each of a plurality of operations contained in the fusible network layer, and disassembling the code to obtain abstract syntax trees corresponding to the operations, wherein the abstract syntax trees corresponding to the plurality of operations contained in the fusible network layer form the abstract syntax tree of that fusible network layer.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring, according to the operation combination corresponding to the fusion node, the abstract syntax trees of the operations contained in the operation combination from the abstract syntax tree of each fusible network layer;
analyzing the abstract syntax trees of the operations contained in the operation combination to obtain the code corresponding to the operation combination;
and acquiring the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the back-end hardware.
In one embodiment, the processor when executing the computer program further performs the steps of:
and converting the code corresponding to the operation combination into a form adapted to the type of the back-end hardware, and taking the conversion result as the quantization implementation code.
In one embodiment, the at least two fusible network layers include: an Eltwise layer and an activation layer; the plurality of operations contained in the Eltwise layer are: dot product prod, addition/subtraction sum and taking the maximum max; the plurality of operations contained in the activation layer are: tanh, sigmoid, relu and prelu; the operation combination corresponding to the fusion node is one of the following combinations: a combination of dot product and tanh, a combination of dot product and sigmoid, a combination of dot product and relu, a combination of dot product and prelu, a combination of sum and tanh, a combination of sum and sigmoid, a combination of sum and relu, a combination of sum and prelu, a combination of max and tanh, a combination of max and sigmoid, a combination of max and relu, or a combination of max and prelu.
In one embodiment, the type of backend hardware is one of the following types: a central processing unit CPU, a graphics processing unit GPU or a data processing unit DPU.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and for each of the at least two fusible network layers, acquiring the code of each of a plurality of operations contained in the fusible network layer, and disassembling the code to obtain abstract syntax trees corresponding to the operations, wherein the abstract syntax trees corresponding to the plurality of operations contained in the fusible network layer form the abstract syntax tree of that fusible network layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring, according to the operation combination corresponding to the fusion node, the abstract syntax trees of the operations contained in the operation combination from the abstract syntax tree of each fusible network layer;
analyzing the abstract syntax trees of the operations contained in the operation combination to obtain the code corresponding to the operation combination;
and acquiring the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the back-end hardware.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and converting the code corresponding to the operation combination into a form adapted to the type of the back-end hardware, and taking the conversion result as the quantization implementation code.
In one embodiment, the at least two fusible network layers include: an Eltwise layer and an activation layer; the plurality of operations contained in the Eltwise layer are: dot product prod, addition/subtraction sum and taking the maximum max; the plurality of operations contained in the activation layer are: tanh, sigmoid, relu and prelu; the operation combination corresponding to the fusion node is one of the following combinations: a combination of dot product and tanh, a combination of dot product and sigmoid, a combination of dot product and relu, a combination of dot product and prelu, a combination of sum and tanh, a combination of sum and sigmoid, a combination of sum and relu, a combination of sum and prelu, a combination of max and tanh, a combination of max and sigmoid, a combination of max and relu, or a combination of max and prelu.
In one embodiment, the type of backend hardware is one of the following types: a central processing unit CPU, a graphics processing unit GPU or a data processing unit DPU.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and for each of the at least two fusible network layers, acquiring the code of each of a plurality of operations contained in the fusible network layer, and disassembling the code to obtain abstract syntax trees corresponding to the operations, wherein the abstract syntax trees corresponding to the plurality of operations contained in the fusible network layer form the abstract syntax tree of that fusible network layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring, according to the operation combination corresponding to the fusion node, the abstract syntax trees of the operations contained in the operation combination from the abstract syntax tree of each fusible network layer;
analyzing the abstract syntax trees of the operations contained in the operation combination to obtain the code corresponding to the operation combination;
and acquiring the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the back-end hardware.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and converting the code corresponding to the operation combination into a form adapted to the type of the back-end hardware, and taking the conversion result as the quantization implementation code.
In one embodiment, the at least two fusible network layers include: an Eltwise layer and an activation layer; the plurality of operations contained in the Eltwise layer are: dot product prod, addition/subtraction sum and taking the maximum max; the plurality of operations contained in the activation layer are: tanh, sigmoid, relu and prelu; the operation combination corresponding to the fusion node is one of the following combinations: a combination of dot product and tanh, a combination of dot product and sigmoid, a combination of dot product and relu, a combination of dot product and prelu, a combination of sum and tanh, a combination of sum and sigmoid, a combination of sum and relu, a combination of sum and prelu, a combination of max and tanh, a combination of max and sigmoid, a combination of max and relu, or a combination of max and prelu.
In one embodiment, the type of backend hardware is one of the following types: a central processing unit CPU, a graphics processing unit GPU or a data processing unit DPU.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples express only a few embodiments of the present application; their descriptions are specific and detailed, but should not therefore be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A code generation method, the method comprising:
extracting at least two fusible network layers from the neural network;
acquiring a plurality of operations contained in each of the at least two fusible network layers;
combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
2. The method according to claim 1, wherein before the obtaining the quantized implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer, and the type of the backend hardware, the method further comprises:
and for each of the at least two fusible network layers, acquiring the code of each of a plurality of operations contained in the fusible network layer, and disassembling the code to obtain abstract syntax trees corresponding to the operations, wherein the abstract syntax trees corresponding to the plurality of operations contained in the fusible network layer form the abstract syntax tree of that fusible network layer.
3. The method according to claim 1, wherein the obtaining the quantized implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer, and the type of the backend hardware includes:
acquiring, according to the operation combination corresponding to the fusion node, the abstract syntax trees of the operations contained in the operation combination from the abstract syntax tree of each fusible network layer;
analyzing the abstract syntax trees of the operations contained in the operation combination to obtain the code corresponding to the operation combination;
and acquiring the quantization implementation code of the fusion node according to the code corresponding to the operation combination and the type of the back-end hardware.
4. The method according to claim 3, wherein the obtaining the quantized implementation code of the fusion node according to the code corresponding to the operation combination and the type of the backend hardware includes:
and converting the code corresponding to the operation combination into a form adapted to the type of the back-end hardware, and taking the conversion result as the quantization implementation code.
5. The method of any of claims 1-4, wherein the at least two fusible network layers comprise: an Eltwise layer and an activation layer; the plurality of operations contained in the Eltwise layer are: dot product prod, addition/subtraction sum and taking the maximum max; the plurality of operations contained in the activation layer are: tanh, sigmoid, relu and prelu; and the operation combination corresponding to the fusion node is one of the following combinations: a combination of dot product and tanh, a combination of dot product and sigmoid, a combination of dot product and relu, a combination of dot product and prelu, a combination of sum and tanh, a combination of sum and sigmoid, a combination of sum and relu, a combination of sum and prelu, a combination of max and tanh, a combination of max and sigmoid, a combination of max and relu, or a combination of max and prelu.
6. The method of claim 5, wherein the type of backend hardware is one of the following types: a central processing unit CPU, a graphics processing unit GPU or a data processing unit DPU.
7. A code generating apparatus, the apparatus comprising:
the extraction module is used for extracting at least two fusible network layers from the neural network;
the acquisition module is used for acquiring a plurality of operations contained in each of the at least two fusible network layers;
the fusion module is used for combining a plurality of operations contained in each of the at least two fusible network layers to obtain a fusion node;
and the generation module is used for acquiring the quantization implementation code of the fusion node according to the operation combination corresponding to the fusion node, the abstract syntax tree of each fusible network layer and the type of the back-end hardware.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202210853361.5A 2022-07-20 2022-07-20 Code generation method, device, computer equipment and storage medium Pending CN117492722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210853361.5A CN117492722A (en) 2022-07-20 2022-07-20 Code generation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210853361.5A CN117492722A (en) 2022-07-20 2022-07-20 Code generation method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117492722A 2024-02-02

Family

ID=89678695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210853361.5A Pending CN117492722A (en) 2022-07-20 2022-07-20 Code generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117492722A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination