CN117313802A - Neural network model conversion method, device, equipment and storage medium
- Publication number
- CN117313802A CN117313802A CN202311271323.XA CN202311271323A CN117313802A CN 117313802 A CN117313802 A CN 117313802A CN 202311271323 A CN202311271323 A CN 202311271323A CN 117313802 A CN117313802 A CN 117313802A
- Authority
- CN
- China
- Prior art keywords
- operator
- model
- conversion
- fusion
- operators
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
Abstract
The application discloses a neural network model conversion method, which relates to the field of neural networks. The method comprises: first constructing a conversion model and importing data such as the network parameters extracted from an original network model; reading the model operators of the conversion model layer by layer, matching the determined operator arrangement order against an operator fusion list, and determining target fusion operators; fusing and converting the target fusion operators into intermediate operators according to the combination rules; and finally, generating a computation graph of the conversion model, marking dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values, and eliminating the dead code to obtain the target conversion model. The scheme can greatly reduce the size and computation requirements of the model, thereby saving storage space and lowering hardware requirements. Owing to the simplification of the model, its running efficiency and inference speed can be improved, and the cost of deploying the model is reduced.
Description
Technical Field
The present disclosure relates to the field of neural networks, and in particular, to a method, an apparatus, a device, and a storage medium for converting a neural network model.
Background
With the rapid development of artificial intelligence and machine learning, neural network models are widely used in various fields. The design and training of neural network models is critical to achieving high performance and high accuracy artificial intelligence applications.
However, in the related art, most neural network models suffer from problems such as huge model size, excessive computation, and low operating efficiency, which limits their application in resource-constrained environments, because large models place higher demands on hardware performance. In special cases, pruning can appropriately reduce the model volume and achieve the technical effect of reducing the amount of computation, but this gain comes at the expense of model accuracy. Moreover, model pruning operates only after model training is completed, and cannot fundamentally reduce the cost of the model training and inference processes.
Disclosure of Invention
The application provides a neural network model conversion method, device, equipment and storage medium, which solve the problem in traditional schemes that reducing a neural network model's volume and computation through pruning-based compression causes an excessive loss of model accuracy.
In one aspect, the present application provides a neural network model conversion method, the method including:
constructing a conversion model, and importing network parameters, model operators, weight information, operators and operands extracted from the original network model;
reading model operators of the conversion model layer by layer, and matching the determined operator arrangement order one-to-one against an operator fusion list to determine a target fusion operator, the operator fusion list containing all combination rules that satisfy operator fusion;
fusing and converting the code of the target fusion operator according to the corresponding operator combination rule and arrangement order, and integrating it into an intermediate operator; redefining the input and output of the intermediate operator according to the operator arrangement order, and storing the intermediate result between the original two adjacent operators in situ;
and generating a computation graph of the conversion model, marking dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values, and folding and eliminating the node paths of the dead code to obtain the target conversion model.
Specifically, the magic number of the conversion model is set to 0x600F0F, and the model format is thereby determined;
the importing network parameters, model operators, weight information, operators and operands extracted from the original network model includes:
determining weight information of each network parameter in the original network model, converting the weight information into a numpy type, and independently storing the weight information through a bin binary file;
network parameters, bin binaries, operators, and operands are imported into the conversion model.
Specifically, the operator fusion list includes the following combination rules:
Conv1d operator + BatchNorm1d operator;
Conv2d operator + BatchNorm2d operator;
Linear operator + BatchNorm1d operator;
MultiheadAttention operator;
Pad operator + Conv1d operator;
Pad operator + Conv2d operator;
Adjacent operator + Reshape operator;
ConvTranspose1d operator + BatchNorm1d operator;
Select operator + Unbind operator;
and when operators contained in the conversion model and their adjacent arrangement order match an operator combination rule, determining those operators as the target fusion operator.
Specifically, the integrating the codes of the target fusion operator into an intermediate operator according to the combination rule and the arrangement sequence of the corresponding operators includes:
when a single operator rule is hit, directly carrying out rewriting conversion on the single operator rule; when the double operator rule is hit, sequentially defining a front operator and a rear operator according to the operator sequence of the combination rule, wherein the front operator is fusion input, and the rear operator is fusion output;
and carrying out combined rewriting according to the sequence of the front operator and the rear operator to obtain the intermediate operator, and storing the intermediate result of the combined front operator in situ.
Specifically, the rewrite transformation or the combined rewrite of the target fusion operator includes:
reading an operand and an operation symbol of each network layer in the conversion model, determining the type, the dimension and the input/output size of each operand, and re-writing according to the type, the dimension and the input/output size of the operand to obtain the intermediate operator;
the fused intermediate operators are as follows:
the Conv1d operator + BatchNorm1d operator generate a conv1d_batchnorm1d intermediate operator;
the Conv2d operator + BatchNorm2d operator generate a conv2d_batchnorm2d intermediate operator;
the Linear operator + BatchNorm1d operator generate a linear_batchnorm1d intermediate operator;
the MultiheadAttention operator generates an optimal_attention intermediate operator;
the Pad operator + Conv1d operator generate a pad_conv1d intermediate operator;
the Pad operator + Conv2d operator generate a pad_conv2d intermediate operator;
the Adjacent operator + Reshape operator generate an adjacent_reshape intermediate operator;
the ConvTranspose1d operator + BatchNorm1d operator generate a convtranspose1d_batchnorm1d intermediate operator;
the Select operator + Unbind operator generate a select_unbind intermediate operator.
Specifically, the marking of dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values includes:
determining, through the computation graph, each node path of the model and the data-transfer dependency relationships between nodes;
when the output of a preceding node among adjacent nodes has no data-influence relation to the input of the succeeding node, marking the code of the target node and its path as dead code for elimination;
when a constant propagation node exists in a node path, folding and eliminating the constant propagation node.
Specifically, when a constant propagation node exists in the node path, the transferred constant data is moved into a common container for temporary storage; in the final stage of model inference, the constant data is extracted from the common container and the corresponding calculation operations are performed.
In another aspect, the present application provides a neural network model conversion apparatus, including:
the model creation module is used for constructing a conversion model and importing network parameters, model operators, weight information, operators and operands extracted from the original network model;
the operator determining module is used for reading the model operators of the conversion model layer by layer, and matching the determined operator arrangement order one-to-one against an operator fusion list to determine a target fusion operator, the operator fusion list containing all combination rules that satisfy operator fusion;
the operator fusion module is used for fusing and converting the code of the target fusion operator according to the corresponding operator combination rule and arrangement order, and integrating it into an intermediate operator; redefining the input and output of the intermediate operator according to the operator arrangement order, and storing the intermediate result between the original two adjacent operators in situ;
and the dead code elimination module is used for generating a computation graph of the conversion model, marking dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values, and folding and eliminating the node paths of the dead code to obtain the target conversion model.
In yet another aspect, the present application provides a computer device, wherein the computer device includes a processor and a memory, where at least one instruction, at least one program, a code set, or an instruction set is stored, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the neural network model conversion method described in any one of the above aspects.
In yet another aspect, the present application provides a computer readable storage medium, where at least one instruction, at least one program, a code set, or an instruction set is stored in the readable storage medium, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the neural network model conversion method according to any one of the above aspects.
The beneficial effects of the technical scheme provided by the embodiments of the application include at least the following. The original neural network model is converted into a brand-new model structure while data such as network parameters and weight information are migrated into the conversion model, yielding a complete conversion model. The model operators in the conversion model are subsequently identified, target fusion operators capable of operator fusion are determined according to specific fusion rules, and the fusion operation is performed, which can greatly reduce the number of operators in the conversion model and the repeated data movement and data transmission, reduce operation steps from the hardware perspective, and improve model inference efficiency. Furthermore, based on node path analysis of the computation graph, the data transmission and dependency relationships among the model nodes can be determined, the nodes that have no influence on the output and the propagation paths of constant nodes can be identified, and the corresponding nodes can be folded and eliminated, reducing unnecessary code and computation and shrinking the model volume.
Drawings
Fig. 1 is a flowchart of a neural network model conversion method provided in an embodiment of the present application;
FIG. 2 is a block diagram of a framework for model conversion provided by an embodiment of the present application;
FIG. 3 is a computational graph of the Attention model in PyTorch according to one possible implementation;
FIG. 4 is a simplified computational graph of the MultiheadAttention model after dead code elimination based on the computation graph;
FIG. 5 shows a network computation graph and code content for a convolutional CNN operation and a BatchNorm1d operation written using PyTorch;
fig. 6 is a block diagram of a neural network model conversion device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
References herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The design concept of the application derives from feature extraction and transfer learning in neural networks. By delving into various neural network structures and their behavior in different tasks, some shared features and patterns were discovered. Based on these findings, the application proposes a simple and efficient method that can convert a complex neural network model into a simpler, more compact model while still maintaining high accuracy and performance.
Fig. 1 is a flowchart of a neural network model conversion method provided in an embodiment of the present application, including the following steps:
step 101, constructing a conversion model, and importing network parameters, model operators, weight information, operators and operands extracted from an original network model.
An existing neural network model may be in any architecture format chosen by the programmer, such as PyTorch, PaddlePaddle, or TensorFlow. The conversion of the application redesigns a framework structure and converts the model into a file format with a specific suffix. For example, the rar file suffix represents a compressed package, and the docx suffix represents a Word document. In existing computer operating systems, different file types carry corresponding magic numbers that allow the computer to identify and distinguish them. The application defines the magic number of the conversion model as 0x600F0F (6295311 in decimal), which serves as the standard by which the computer identifies the type, as sketched below. Assuming the framework data format suffix is defined as SNNX, if the original neural network model is named mobileface.onnx, the converted model is denoted mobileface.snnx.
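As a concrete illustration, the following is a minimal Python sketch of how such a magic number could be stamped into and checked in a converted model file. This is not code from the patent; the 4-byte little-endian header layout is an assumption made for the example.

```python
import struct

SNNX_MAGIC = 0x600F0F  # decimal 6295311; identifies the SNNX model format

def write_snnx_header(path: str) -> None:
    """Stamp the magic number at the start of a new SNNX file (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", SNNX_MAGIC))  # 4-byte little-endian unsigned int

def is_snnx_file(path: str) -> bool:
    """Check whether a file starts with the SNNX magic number."""
    with open(path, "rb") as f:
        head = f.read(4)
    return len(head) == 4 and struct.unpack("<I", head)[0] == SNNX_MAGIC
```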
After the conversion model is built, all data of the original network model needs to be imported into it. This process extracts the network parameters, model operators, weight information, operators, operands, and the like contained in each network layer of the original model and migrates them to the new SNNX model.
The import involves migration of the weight information data. To ensure data accuracy and allow later data editing, the application determines the weight information of each network parameter in the original network model, converts it to the numpy type (a Python language type), and stores it separately in a bin binary file; the network parameters, bin binary file, operators, and operands are then imported into the conversion model, as sketched below. This process centralizes the weight information, so that if later debugging requires going back, extraction and modification are convenient, while the remaining data is migrated normally.
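The following is a minimal sketch of this weight-migration step, assuming a PyTorch source model: each parameter is converted to a numpy array and appended to one bin binary file, with a side index of name, dtype, shape, and byte offset so individual weights can be located later for debugging or modification. The index layout is an assumption; the patent only specifies numpy conversion and separate storage in a bin file.

```python
import torch

def export_weights_to_bin(model: torch.nn.Module, bin_path: str) -> dict:
    """Write all weights to one .bin file; return an index for later lookup."""
    index, offset = {}, 0
    with open(bin_path, "wb") as f:
        for name, param in model.state_dict().items():
            arr = param.detach().cpu().numpy()  # convert weight to numpy type
            f.write(arr.tobytes())              # append raw bytes to the bin file
            index[name] = (str(arr.dtype), tuple(arr.shape), offset)
            offset += arr.nbytes
    return index
```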
Step 102, reading model operators of the conversion model layer by layer, and carrying out one-to-one matching on the determined operator arrangement sequence and the operator fusion list to determine a target fusion operator.
A deep learning model is composed of computation units called model operators; for example, a convolution layer is an operator, and the weighting-and-summation process in a fully-connected layer is a model operator. A network model contains many model operators that programmers write independently and separately to ease debugging and to delimit functions. Independent writing is very convenient when debugging the network model, but in the inference stage, large amounts of data undergo copy and move operations between network layers, and these copy and move operations are the primary factors that slow down the model's processing performance and response speed. Based on this analysis, the application analyzes the newly constructed conversion model, arranges the model operators it contains in order, searches among them for target fusion operators that meet the operator conditions, and then performs fusion conversion on the target fusion operators.
The core idea of operator fusion is to combine multiple operators into one, so that intermediate results do not need to be written back to the global memory, the distribution of intermediate variables is reduced, and the performance is improved. In addition, only one kernel needs to be called after merging, and the time for calling a plurality of kernels can be reduced. The application provides an operator fusion list, wherein all model operator rules conforming to fusion conditions are recorded in the list, and the model operator rules comprise the composition form and the connection sequence of model operators.
In one possible implementation, consider the Convolution operator + ReLU operator. Under normal logic and flow, the calculation result of the Convolution operator would need to be written back to the CPU or GPU memory, then read back for the ReLU operator's computation. In practice, however, the ReLU operator itself is computationally light, so the performance bottleneck lies in the GPU's reads and writes. With operator fusion, the ReLU is applied directly after each Convolution computation, reducing the read/write time of intermediate variables and improving performance, as in the sketch below.
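A tiny PyTorch sketch of this Convolution+ReLU example (illustrative only, not the patent's code): applying the activation in place on the freshly computed convolution output avoids allocating and re-reading a second intermediate tensor.

```python
import torch
import torch.nn.functional as F

def conv_relu_fused(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    y = F.conv2d(x, weight, bias)  # convolution result
    return y.relu_()               # in-place ReLU: no extra intermediate buffer is written back
```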
Based on this logic, when matching target fusion operators, all operator arrangement orders contained in the conversion model are read, and then matched against the operator fusion list; only when the corresponding model operator types and their front-to-back arrangement satisfy a rule can a target fusion operator be determined.
In one possible implementation, the operator fusion list includes the following combination rules:
a) Conv1d operator + BatchNorm1d operator;
b) Conv2d operator + BatchNorm2d operator;
c) Linear operator + BatchNorm1d operator;
d) MultiheadAttention operator;
e) Pad operator + Conv1d operator;
f) Pad operator + Conv2d operator;
g) Adjacent operator + Reshape operator;
h) ConvTranspose1d operator + BatchNorm1d operator;
i) Select operator + Unbind operator;
Note that the above combination rules strictly specify the order of the model operators; for example, in rule a, the BatchNorm1d operator must be located directly after the adjacent Conv1d operator to meet the operator fusion condition. During matching, the system first identifies the preceding operator Conv1d and thus rule a, then checks, per the BatchNorm1d operator in rule a, whether the operator following Conv1d in the model is a BatchNorm1d operator, and determines the target fusion operator on that basis, as in the sketch below. The other rules are determined in the same way.
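As one possible illustration of the layer-by-layer matching (the data structures and names are assumptions, not taken from the patent), the operator sequence can be scanned in order and each single- or double-operator window compared against the fusion list:

```python
FUSION_RULES = [
    ("Conv1d", "BatchNorm1d"),
    ("Conv2d", "BatchNorm2d"),
    ("Linear", "BatchNorm1d"),
    ("MultiheadAttention",),            # single-operator rule d
    ("Pad", "Conv1d"),
    ("Pad", "Conv2d"),
    ("Adjacent", "Reshape"),
    ("ConvTranspose1d", "BatchNorm1d"),
    ("Select", "Unbind"),
]

def find_fusion_targets(op_sequence: list) -> list:
    """op_sequence: operator type names read from the model layer by layer."""
    targets, i = [], 0
    while i < len(op_sequence):
        hit = next((r for r in FUSION_RULES
                    if tuple(op_sequence[i:i + len(r)]) == r), None)
        if hit:
            targets.append((i, hit))  # record position and matched combination rule
            i += len(hit)             # consume the operators to be fused
        else:
            i += 1
    return targets
```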
Step 103, fusing and converting the code of the target fusion operator according to the corresponding operator combination rule and arrangement order, and integrating it into an intermediate operator.
It should be noted that rule d in the operator fusion list is a single operator: only when a MultiheadAttention operator exists in the model can it be determined as a target fusion operator and be further converted. The substantive reason a single operator can be converted is that this model operator is a compound operator: MultiheadAttention is a multi-head attention operator whose interior contains multiple refined pieces of algorithmic logic, so it can naturally undergo self-fusion conversion. The other double-operator (or even multi-operator) rules are fusions between model operators.
In the fusion conversion step, when a single-operator rule is hit, rewriting conversion is carried out on it directly. When a double-operator rule is hit, a front operator and a rear operator are defined in sequence according to the operator order of the combination rule, with the front operator as the fusion input and the rear operator as the fusion output; that is, combined rewriting is carried out according to the order of the front and rear operators to obtain the intermediate operator, and the intermediate result of the combined front operator is stored in situ.
For example, take the two operators Conv2d + BatchNorm2d. Conv2d is found first, then matching continues and BatchNorm2d is found to follow it, so the pair matches rule b. The two operators are therefore rewritten into one SNNX operator, conv2d_batchnorm2d, whose input is the Conv2d input and whose output is the BatchNorm2d result; the intermediate result is stored with an in-place operation, eliminating the secondary dump-and-read step for the intermediate data. The arithmetic of such a rewrite is sketched below.
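The patent does not spell out the rewrite arithmetic, so the following numpy sketch shows the standard Conv2d+BatchNorm2d folding as one plausible instance of what the conv2d_batchnorm2d intermediate operator can compute: the BatchNorm statistics are folded into the convolution's weights and bias so a single kernel produces the final result.

```python
import numpy as np

def fold_conv2d_batchnorm2d(w, b, gamma, beta, mean, var, eps=1e-5):
    """w: (out_c, in_c, kh, kw) conv weights; b and the BN arguments are per-channel."""
    scale = gamma / np.sqrt(var + eps)        # per-output-channel BN scale
    w_fused = w * scale[:, None, None, None]  # scale each output channel's filter
    b_fused = (b - mean) * scale + beta       # fold the BN shift into the bias
    return w_fused, b_fused                   # conv(x, w_fused, b_fused) == bn(conv(x, w, b))
```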
In connection with the combination rules in step 102, the intermediate operators after conversion are expressed as follows:
the Conv1d operator + BatchNorm1d operator generate a conv1d_batchnorm1d intermediate operator;
the Conv2d operator + BatchNorm2d operator generate a conv2d_batchnorm2d intermediate operator;
the Linear operator + BatchNorm1d operator generate a linear_batchnorm1d intermediate operator;
the MultiheadAttention operator generates an optimal_attention intermediate operator;
the Pad operator + Conv1d operator generate a pad_conv1d intermediate operator;
the Pad operator + Conv2d operator generate a pad_conv2d intermediate operator;
the Adjacent operator + Reshape operator generate an adjacent_reshape intermediate operator;
the ConvTranspose1d operator + BatchNorm1d operator generate a convtranspose1d_batchnorm1d intermediate operator;
the Select operator + Unbind operator generate a select_unbind intermediate operator.
This process can greatly simplify the model operators of a large-scale network model, thereby effectively improving the model's inference speed.
Step 104, generating a computation graph of the conversion model, marking dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values, and folding and eliminating the node paths of the dead code to obtain the target conversion model.
The fusion of model operators only improves inference and running efficiency; it does not by itself reduce the model volume. Although the redefined model file type shows no great change in file volume after the data is imported compared with the original model, the model volume still needs to be appropriately reduced in terms of code quantity to obtain the final SNNX model. The conversion framework can convert a network model in any format into the target format (SNNX) defined by the application; see the framework processing logic and sequence in FIG. 2.
According to the scheme, the node path condition of the data is analyzed through the computation graph, parts that are unimportant to the neural network are found, and that code is marked as dead code and eliminated, reducing the model volume without changing the model accuracy. The method specifically includes the following steps:
A, determining, through the computation graph, each node path of the model and the data-transfer dependency relationships between nodes;
computational graphs are important tools for analyzing network models, the hierarchical architecture and nodes of the models are learned through generating the computational graphs, and the dependency relationship of data transfer among the nodes of the models can be known through the connection relationship (path) among the nodes. A computational graph of the intent model in Pytorch is listed as fig. 3, which includes a large number of computation nodes and constituent paths.
B, when the output of a preceding node among adjacent nodes has no data-influence relation to the input of the succeeding node, marking the code of the target node and its path as dead code for elimination;
The computation graph is an important tool for a programmer's hierarchical analysis; the data-transfer dependency relationships among all nodes can be determined directly from it. For example, with a=4, b=5, and y=a*2, y is found to be related to a and unrelated to b; in the computation graph, b occupies a node by itself, so b is a useless node. Such nodes and their constituent paths can be marked as dead code and folded without affecting the precision of the model's output, and forward propagation automatically skips these node paths, reducing unnecessary computation, as in the sketch below.
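A minimal sketch of this marking step over an assumed adjacency representation (all names hypothetical): nodes reachable backwards from the graph outputs are live, and everything else, like b above, is dead code.

```python
def mark_dead_nodes(inputs_of: dict, outputs: list) -> set:
    """inputs_of maps a node to the nodes it reads from; outputs are result nodes."""
    live, stack = set(), list(outputs)
    while stack:
        node = stack.pop()
        if node not in live:
            live.add(node)
            stack.extend(inputs_of.get(node, []))  # walk data dependencies backwards
    all_nodes = set(inputs_of) | {n for deps in inputs_of.values() for n in deps}
    return all_nodes - live                        # nodes safe to mark and fold away

# The a=4, b=5, y=a*2 example: y depends only on a, so b is marked dead.
assert mark_dead_nodes({"y": ["a"], "a": [], "b": []}, outputs=["y"]) == {"b"}
```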
And C, when a constant propagation node exists in the node path, folding and eliminating the constant propagation node.
A constant propagation node is one whose input data is constant: each such node either participates only in simple arithmetic (the four basic operations) or does not participate in the network layer's computation at all and can pass its input straight to its output; for example, a constant that scales a gain up or down. Such data transfer can be moved directly into a common container for temporary storage, and in the final stage of model inference the constant data is extracted from the common container and the corresponding calculation operations are performed, as in the sketch below. This step folds or deletes the node, reducing constant intermediate data transfers and unnecessary computation. The model obtained after eliminating the dead code is the simplified target conversion model. It is simplified from the two angles of operator fusion and dead code elimination; compared with traditional pruning of network-layer neurons, the model volume is reduced without excessively sacrificing model accuracy, the model's operating efficiency is greatly improved, and the model can function fully on embedded devices.
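A sketch of the constant-folding step with a hypothetical node interface (is_constant and evaluate are assumptions, not the patent's API): constant-propagation nodes are evaluated once and their results parked in the common container until the final stage of inference.

```python
constant_pool = {}  # the "common container" that temporarily stores constant data

def fold_constant_nodes(nodes: list) -> list:
    """Fold constant nodes out of the graph; return the remaining dynamic nodes."""
    remaining = []
    for node in nodes:
        if node.is_constant():                          # all inputs known at conversion time
            constant_pool[node.name] = node.evaluate()  # compute once and store the result
        else:
            remaining.append(node)                      # keep dynamic nodes in the graph
    return remaining
```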
FIG. 4 is a simplified computational graph of the MultiheadAttention model after dead code elimination based on the computation graph. A large amount of intermediate data transfer and node content has been eliminated, and all operations work in the same block of memory space without consuming excess memory, so the computation time of the neural network can be greatly reduced.
In summary, the method converts the original neural network model into a brand-new model structure by setting the magic number, while data such as network parameters and weight information are migrated into the conversion model to obtain a complete conversion model. The model operators in the conversion model are subsequently identified, target fusion operators capable of operator fusion are determined according to specific fusion rules, and the fusion operation is performed, which can greatly reduce the number of operators in the conversion model and the repeated data movement and data transmission, reduce operation steps from the hardware perspective, and improve model inference efficiency. Furthermore, based on node path analysis of the computation graph, the data transmission and dependency relationships among the model nodes can be determined, the nodes that have no influence on the output and the propagation paths of constant nodes can be identified, and the corresponding nodes can be folded and eliminated, reducing unnecessary code and computation and shrinking the model volume.
In practical application, the scheme was run on an Intel Core i7-7700K with 64 GB of memory under Ubuntu 22.04; see FIG. 5 for the network computation graph and code content of a convolutional CNN operation and a BatchNorm1d operation written with PyTorch. The performance of the CNN+BatchNorm model after conversion to the SNNX expression is improved by 32% compared with the native PyTorch model, mainly owing to the reduced copy transmission of the model's internal weights. In the Attention model, performance after using the SNNX expression improves by 18%; considering that a large amount of time in Attention is spent on matrix computation rather than on frequent weight transmission, using SNNX still yields some improvement.
Overall, the method can greatly reduce the size and computational requirements of the model, thereby saving storage space and lowering hardware requirements. Second, owing to the simplification of the model, its operating efficiency and inference speed can be improved, making it more suitable for embedded devices and edge-computing environments. In addition, the method also reduces the cost of training and deploying models, enabling more developers to benefit from applying neural network models.
Specifically, the framework of this scheme converts the neural network model into a specific Python-language expression and does not define a new operator intermediate representation. After conversion, all internal parameters keep the same names as in the original Python API. The advantage is that, given a converted model, we can also switch back to the original model code from it; since the operator definitions and parameter expression forms are identical, the conversion is reversible, which facilitates debugging and development by developers. These are the benefits of simply storing the model weight information as a binary file.
Fig. 6 is a block diagram of a neural network model conversion device according to an embodiment of the present application, where the device includes:
a model creation module 610 for constructing a transformation model, importing network parameters, model operators, weight information, operators, and operands extracted from the original network model;
an operator determining module 620, configured to read the model operators of the conversion model layer by layer, match the determined operator arrangement sequence with the operator fusion list one by one, and determine a target fusion operator; the operator fusion list contains all combination rules meeting operator fusion;
the operator fusion module 630 is configured to fuse and convert the code of the target fusion operator according to the corresponding operator combination rule and arrangement order, and integrate it into an intermediate operator; and to redefine the input and output of the intermediate operator according to the operator arrangement order and store the intermediate result between the original two adjacent operators in situ;
and the dead code elimination module 640 is configured to generate a computation graph of the conversion model, mark dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values, and fold and eliminate the node paths of the dead code to obtain the target conversion model.
In addition, the application further provides a computer device, which includes a processor and a memory, where at least one instruction, at least one section of program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set, or the instruction set is loaded and executed by the processor to implement the neural network model conversion method described in any of the foregoing embodiments.
In addition, the application further provides a computer readable storage medium, which is characterized in that at least one instruction, at least one section of program, a code set or an instruction set is stored in the readable storage medium, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by a processor to implement the neural network model conversion method described in any embodiment.
The foregoing describes preferred embodiments of the present invention. It is to be understood that the invention is not limited to the specific embodiments described above, and devices and structures not described in detail are to be understood as being implemented in a manner common in the art. Any person skilled in the art may make many possible variations and modifications, or adapt equivalent embodiments, without departing from the technical solution of the present invention, and these do not affect the essential content of the invention. Therefore, any simple modification, equivalent variation, or alteration of the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.
Claims (10)
1. A neural network model conversion method, the method comprising:
constructing a conversion model, and importing network parameters, model operators, weight information, operators and operands extracted from the original network model;
reading model operators of the conversion model layer by layer, and matching the determined operator arrangement order one-to-one against an operator fusion list to determine a target fusion operator, the operator fusion list containing all combination rules that satisfy operator fusion;
fusing and converting the code of the target fusion operator according to the corresponding operator combination rule and arrangement order, and integrating it into an intermediate operator; redefining the input and output of the intermediate operator according to the operator arrangement order, and storing the intermediate result between the original two adjacent operators in situ;
and generating a computation graph of the conversion model, marking dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values, and folding and eliminating the node paths of the dead code to obtain the target conversion model.
2. The neural network model conversion method according to claim 1, wherein the magic number of the conversion model is set to 0X600F0F, and a model format is determined;
the importing network parameters, model operators, weight information, operators and operands extracted from the original network model includes:
determining weight information of each network parameter in the original network model, converting the weight information into a numpy type, and independently storing the weight information through a bin binary file;
network parameters, bin binaries, operators, and operands are imported into the conversion model.
3. The neural network model conversion method according to claim 1, wherein the operator fusion list includes the following combination rules:
Conv1d operator + BatchNorm1d operator;
Conv2d operator + BatchNorm2d operator;
Linear operator + BatchNorm1d operator;
MultiheadAttention operator;
Pad operator + Conv1d operator;
Pad operator + Conv2d operator;
Adjacent operator + Reshape operator;
ConvTranspose1d operator + BatchNorm1d operator;
Select operator + Unbind operator;
and when operators contained in the conversion model and their adjacent arrangement order match an operator combination rule, determining those operators as the target fusion operator.
4. The neural network model conversion method according to claim 3, wherein the integrating the codes of the target fusion operator into an intermediate operator according to the corresponding operator combination rule and the arrangement order comprises:
when a single operator rule is hit, directly carrying out rewriting conversion on the single operator rule; when the double operator rule is hit, sequentially defining a front operator and a rear operator according to the operator sequence of the combination rule, wherein the front operator is fusion input, and the rear operator is fusion output;
and carrying out combined rewriting according to the sequence of the front operator and the rear operator to obtain the intermediate operator, and storing the intermediate result of the combined front operator in situ.
5. The neural network model conversion method of claim 4, wherein the re-writing conversion or combined re-writing of the target fusion operator comprises:
reading an operand and an operation symbol of each network layer in the conversion model, determining the type, the dimension and the input/output size of each operand, and re-writing according to the type, the dimension and the input/output size of the operand to obtain the intermediate operator;
the fused intermediate operators are as follows:
the Conv1d operator + BatchNorm1d operator generate a conv1d_batchnorm1d intermediate operator;
the Conv2d operator + BatchNorm2d operator generate a conv2d_batchnorm2d intermediate operator;
the Linear operator + BatchNorm1d operator generate a linear_batchnorm1d intermediate operator;
the MultiheadAttention operator generates an optimal_attention intermediate operator;
the Pad operator + Conv1d operator generate a pad_conv1d intermediate operator;
the Pad operator + Conv2d operator generate a pad_conv2d intermediate operator;
the Adjacent operator + Reshape operator generate an adjacent_reshape intermediate operator;
the ConvTranspose1d operator + BatchNorm1d operator generate a convtranspose1d_batchnorm1d intermediate operator;
the Select operator + Unbind operator generate a select_unbind intermediate operator.
6. The neural network model conversion method according to claim 1, wherein the marking as dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values comprises:
determining, through the computation graph, each node path of the model and the data-transfer dependency relationships between nodes;
when the output of a preceding node among adjacent nodes has no data-influence relation to the input of the succeeding node, marking the code of the target node and its path as dead code for elimination;
when a constant propagation node exists in a node path, folding and eliminating the constant propagation node.
7. The neural network model conversion method of claim 6, wherein when the constant propagation node exists in the node path, the transferred constant data is moved into a common container for temporary storage; and in the final stage of model inference, the constant data is extracted from the common container and the corresponding calculation operations are performed.
8. A neural network model conversion device, the device comprising:
the model creation module is used for constructing a conversion model and importing network parameters, model operators, weight information, operators and operands extracted from the original network model;
the operator determining module is used for reading the model operators of the conversion model layer by layer, and matching the determined operator arrangement order one-to-one against an operator fusion list to determine a target fusion operator, the operator fusion list containing all combination rules that satisfy operator fusion;
the operator fusion module is used for fusing and converting the code of the target fusion operator according to the corresponding operator combination rule and arrangement order, and integrating it into an intermediate operator; redefining the input and output of the intermediate operator according to the operator arrangement order, and storing the intermediate result between the original two adjacent operators in situ;
and the dead code elimination module is used for generating a computation graph of the conversion model, marking dead code according to the dependency relationships among the nodes in the computation graph and the node parameter values, and folding and eliminating the node paths of the dead code to obtain the target conversion model.
9. A computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set, or instruction set that is loaded and executed by the processor to implement the neural network model conversion method of any of claims 1 to 7.
10. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement the neural network model conversion method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311271323.XA CN117313802A (en) | 2023-09-28 | 2023-09-28 | Neural network model conversion method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311271323.XA CN117313802A (en) | 2023-09-28 | 2023-09-28 | Neural network model conversion method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117313802A true CN117313802A (en) | 2023-12-29 |
Family
ID=89296766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311271323.XA Pending CN117313802A (en) | 2023-09-28 | 2023-09-28 | Neural network model conversion method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117313802A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
- PB01 | Publication ||
- SE01 | Entry into force of request for substantive examination ||