CN113128670A - Neural network model optimization method and device - Google Patents
Neural network model optimization method and device
- Publication number
- CN113128670A (application number CN202110382904.5A)
- Authority
- CN
- China
- Prior art keywords
- weight
- layer
- neural network
- network model
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
- G06N3/06 — Computing arrangements based on biological models; neural networks; physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
Abstract
The method provided by this application obtains the structure of a neural network model to be optimized, the parameters of the model, and the weight parameters corresponding to each operator layer of the model; converts the model into an initial model according to its structure and parameters; performs operator layer fusion processing on the initial model to obtain an intermediate model; determines a weight adjustment coefficient range according to the weight range of a neural network model that a target hardware accelerator can accommodate and a preset coefficient; performs weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and quantizes the weight-optimized intermediate model to obtain an optimized neural network model. The method solves the problem that deployment of a neural network model on a hardware accelerator is limited.
Description
Technical Field
The present application relates to the field of Internet technology, and in particular to a method and apparatus for optimizing a neural network model.
Background
Methods that recognize target objects with neural network models achieve extremely high accuracy and are therefore widely applied in fields such as autonomous driving, intelligent security, and intelligent robotics. Accordingly, neural network models are widely deployed on hardware accelerators suited to those fields. A hardware accelerator works alongside the central processing unit to handle specialized workloads at high speed; a hardware accelerator on which a neural network model is deployed is dedicated to running that model. As the application fields of neural network models keep widening, more and more research and development effort is invested in them. To reduce research and development cost, developers typically build and train a neural network model on general-purpose processors such as CPUs and GPUs, and then adapt it to a hardware accelerator for a specific application field.
The accurate recognition that a neural network model achieves comes at the cost of high computational complexity. This requires the hardware accelerator on which the model is deployed to have a large memory and to sustain heavy computation. In practice, however, the hardware used to deploy the model often does not have a very large memory, so the neural network model cannot be run on a hardware accelerator with a small memory.
Disclosure of Invention
The application provides a method and apparatus for optimizing a neural network model, which can solve the technical problem in the prior art that a neural network model cannot be deployed on hardware with limited memory.
In a first aspect, the present application provides a method for optimizing a neural network model, the method including:
acquiring the structure of a neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the operator layer fusion processing fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers of the initial model into one operator layer;
determining a weight adjustment coefficient range according to a weight range and a preset coefficient of a neural network model which can be accommodated by a target hardware accelerator; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or a preset weight range;
performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameter;
and carrying out quantization processing on the intermediate model after the weight optimization to obtain an optimized neural network model.
With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter includes:
according to the connection order of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameter of the target operator layer includes:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.
With reference to the first aspect, in an implementation manner of the first aspect, the performing weight optimization processing on the target operator layer further includes:
and adding a preset normalization layer after the target operator layer subjected to the weight optimization processing.
With reference to the first aspect, in an implementation manner of the first aspect, performing operator layer fusion processing on the initial model to obtain an intermediate model includes:
fusing the normalization layer connected to the convolution layer into the convolution layer to obtain the intermediate model; the convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and each convolution layer is followed by one normalization layer.
With reference to the first aspect, in an implementation manner of the first aspect, performing operator layer fusion processing on the initial model to obtain an intermediate model includes:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are convolution layers that are structurally parallel to one another.
In a second aspect, the present application provides an apparatus for optimizing a neural network model, the apparatus comprising:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the conversion module is used for converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for performing operator layer fusion processing on the initial model to obtain an intermediate model, where the operator layer fusion processing fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer; determining a weight adjustment coefficient range according to the weight range of the neural network model that the target hardware accelerator can accommodate and a preset coefficient, where the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized and the weight range is determined according to a configuration file of the target hardware accelerator or a preset weight range; performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
according to the connection order of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.
With reference to the second aspect, in an implementation manner of the second aspect, the apparatus further includes an adding module configured to:
and adding a preset normalization layer after the target operator layer subjected to the weight optimization processing.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
fusing the normalization layer connected to the convolution layer into the convolution layer to obtain the intermediate model; the convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and each convolution layer is followed by one normalization layer.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are convolution layers that are structurally parallel to one another.
According to the method, the operator layers are first fused, reducing the computational dimensionality of the neural network model, and weight optimization is then performed on the model according to the weight range that the target hardware accelerator can accommodate, so that the model is adapted to the target hardware accelerator. The method solves the problem that deployment of a neural network model on a hardware accelerator is limited, without reducing the accuracy of the model.
Drawings
Fig. 1 is a schematic flowchart of an optimization method of a neural network model according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an optimization apparatus of a neural network model according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present disclosure. The method provided by the embodiment of the application comprises the following steps:
step S101, obtaining the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized.
The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.
It should be noted that the neural network model to be optimized in the embodiment of the present application may be a classical model shipped with an open-source framework, or a neural network model built and trained by a user. The neural network model comprises a plurality of operator layers, each of which is a convolution layer or a normalization layer. The convolution layer provided in the embodiment of the present application may be one or more of the main convolution types, such as an ordinary convolution layer, a depthwise convolution layer, a dilated (hole) convolution layer, and a grouped convolution layer. The embodiment of the present application does not particularly limit the kernel size, stride, padding, number of channels, and so on of the convolution layer.
In one implementation, the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized, and the weight parameters corresponding to each operator layer of the neural network model to be optimized can be obtained by traversing the whole neural network model to be optimized.
Step S102, converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized.
The format corresponding to the initial model is an Open Neural Network Exchange (ONNX) format.
Step S103, performing operator layer fusion processing on the initial model to obtain an intermediate model.
The initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through operator layer fusion processing.
It should be noted that step S102 and step S103 are performed synchronously, that is, the method provided in the embodiment of the present application converts the initial model into the format of the open neural network exchange model while performing operator-layer fusion processing on the initial model.
According to the different structures of the neural network model to be optimized, the embodiment of the application provides several operator layer fusion processing methods. One such method fuses a normalization layer that is connected to a convolution layer into the convolution layer to obtain the intermediate model. The convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and each convolution layer is followed by one normalization layer. After fusion, the new convolution layer is no longer followed by a normalization layer.
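As an illustration of this fusion, a batch-normalization layer that follows a convolution layer can be folded into the convolution's weights and bias. The sketch below is a minimal NumPy version under common assumptions (per-output-channel BN statistics, weights laid out as output-channels first); the function name and signature are illustrative, not taken from the patent:

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a normalization layer that follows a convolution layer into
    the convolution weights and bias, so the fused convolution alone
    reproduces conv + BN."""
    scale = gamma / np.sqrt(var + eps)           # per-output-channel factor
    w_fused = w * scale.reshape(-1, 1, 1, 1)     # scale each output channel
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused
```

With identity-like statistics (gamma/sqrt(var+eps) = 1), the weights pass through unchanged and only the bias shifts, which is a quick sanity check of the fold.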
In another operator layer fusion processing method, if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, the plurality of branch convolution layers are fused into one convolution layer to obtain the intermediate model. The branch convolution layers are convolution layers that are structurally parallel to one another. Specifically, if the initial model has multiple branches from the input layer to the output layer, and those branches share the same branching point and merging point, all the branch convolution layers are considered structurally parallel. It is then determined whether the convolution kernels of all the branch convolution layers are the same; if they are, for example all 3 × 3 or all 1 × 1, the branch convolution layers are fused into one convolution layer.
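The patent does not spell out how the parallel branches are combined at the merge point. Assuming the branches share one input and their outputs are summed when they merge, the linearity of convolution means the branch kernels (and biases) can simply be added. A hypothetical NumPy sketch under that assumption:

```python
import numpy as np

def fuse_parallel_convs(branch_weights, branch_biases):
    """Fuse structurally parallel convolution branches that share one
    input, one kernel size, and a merge point where outputs are summed:
    conv(x, W1) + conv(x, W2) == conv(x, W1 + W2) by linearity."""
    fused_w = np.sum(branch_weights, axis=0)
    fused_b = np.sum(branch_biases, axis=0)
    return fused_w, fused_b
```

If the merge point concatenates rather than sums, the kernels would instead be stacked along the output-channel axis; the summing form above is only one possible reading.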
The operator layer fusion processing method provided by the embodiment of the application further comprises the step of fusing a plurality of normalization layers into one normalization layer if the neural network model to be optimized has the plurality of continuous normalization layers.
According to the operator layer fusion processing method, the neural network model to be optimized is simplified in structure, and meanwhile, the accuracy of the neural network model to be optimized is maintained.
After step S103 is executed, the method provided in the embodiment of the present application further performs equivalent transformation of the operator layer types in the initial model according to the operator layer types supported by the target hardware accelerator. For example, if the hardware accelerator only supports 3 × 3 convolution layers and the initial model contains both 3 × 3 convolution layers and other kinds of convolution layers, the other kinds must be converted to an equivalent form. As another example, if the hardware accelerator does not support hole (dilated) convolution, zeros are inserted at the corresponding hole positions of the kernel, making the hole convolution equivalent to an ordinary convolution layer.
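The zero-filling equivalence for hole (dilated) convolution can be sketched directly: inserting d-1 zero rows and columns between kernel taps turns a dilated kernel into an ordinary kernel with the same receptive field. A minimal NumPy illustration (names are illustrative, not from the patent):

```python
import numpy as np

def dilate_kernel(w, d):
    """Convert a dilated-convolution kernel into an equivalent ordinary
    kernel by inserting (d - 1) zero rows/columns between taps."""
    co, ci, kh, kw = w.shape
    out = np.zeros((co, ci, d * (kh - 1) + 1, d * (kw - 1) + 1), dtype=w.dtype)
    out[:, :, ::d, ::d] = w   # original taps land on a stride-d grid
    return out
```

For example, a 3 × 3 kernel with dilation 2 becomes a 5 × 5 kernel whose nine nonzero taps sit on the even grid positions.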
In the method provided by the embodiment of the application, so that subsequent steps can directly obtain information about the intermediate model, operator-layer information, such as the input tensor information, output tensor information, and parameter information of each operator layer, is recorded layer by layer during the operator layer fusion processing and equivalence processing of the initial model.
Step S104, determining a weight adjustment coefficient range according to the weight range of the neural network model that the target hardware accelerator can accommodate and a preset coefficient.
The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range.
In the method provided by the embodiment of the present application, for a target hardware accelerator whose supported weights have a bit width of B bits, the weight range of the neural network model that the accelerator can accommodate is determined as follows:

$A = [-2^{B-1},\; +(2^{B-1}-1)]$    formula (1)

In formula (1), A is the weight range, $p = 2^{B-1}-1$ is the maximum value in the weight range, and $n = -2^{B-1}$ is the minimum value in the weight range.
The weight adjustment coefficient range is determined as follows:

$D = [n,\; n+(p-n) \cdot S_{th}]$    formula (2)

In formula (2), D is the weight adjustment coefficient range; $n = -2^{B-1}$ is the minimum value in the weight range; $p = 2^{B-1}-1$ is the maximum value in the weight range; $S_{th} = 0.8$.
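Formulas (1) and (2) can be sketched as two small helper functions (illustrative names; B and S_th as defined above):

```python
def weight_range(bit_width):
    """Representable weight range A = [n, p] of a signed B-bit
    accelerator, per formula (1)."""
    n = -(2 ** (bit_width - 1))
    p = 2 ** (bit_width - 1) - 1
    return n, p

def adjust_range(bit_width, s_th=0.8):
    """Weight adjustment coefficient range D = [n, n + (p - n) * S_th],
    per formula (2)."""
    n, p = weight_range(bit_width)
    return n, n + (p - n) * s_th
```

For B = 8 this gives A = [-128, 127] and, with S_th = 0.8, an adjustment range topping out at -128 + 255 × 0.8 = 76.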
Step S105, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters.
In one implementation, according to the connection order of the operator layers, the weight optimization processing is performed on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer.
The target operator layer is any operator layer with the weight within the weight adjusting coefficient range in the intermediate model.
It should be noted that the method provided in the embodiment of the present application does not perform weight optimization on all the operator layers in the intermediate model, but only on target operator layers whose weights fall within the weight adjustment coefficient range. Therefore, following the connection order of the operator layers from the input layer to the output layer, the method sequentially judges whether the weight corresponding to the current operator layer is within the weight adjustment coefficient range; if it is, the current operator layer is taken as a target operator layer.
Fig. 2 is a schematic flow chart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application. The weight optimization processing method provided by the embodiment of the application comprises the following steps:
step S201 determines a weight adjustment coefficient of the target operator layer according to the maximum weight value of the target operator layer and the minimum weight value of the target operator layer.
In step S101, the weight parameters corresponding to each operator layer were already obtained, and they are not changed by steps S101 to S104. For each operator layer, the weight maximum value and the weight minimum value are compared, and the one with the larger absolute value is taken as the weight adjustment coefficient of the layer. If this coefficient falls within the adjustment range, the operator layer is determined to be a target operator layer. Correspondingly, the weight maximum value of that layer is the weight maximum value of the target operator layer, and the weight minimum value of that layer is the weight minimum value of the target operator layer. The target operator layer satisfies the following condition:

$n \le W_{max} \le n+(p-n) \cdot S_{th}$    formula (3)

In formula (3), n is the minimum value in the weight adjustment coefficient range; $n+(p-n) \cdot S_{th}$ is the maximum value in the weight adjustment coefficient range; $p = 2^{B-1}-1$ is the maximum value in the weight range; $S_{th} = 0.8$; $W_{max}$ is the weight adjustment coefficient of the target operator layer.
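Condition (3) amounts to a simple range test on a layer's weight adjustment coefficient; a hypothetical helper (illustrative name, B and S_th as defined above):

```python
def is_target_layer(w_max, bit_width, s_th=0.8):
    """Check condition (3): the layer's weight adjustment coefficient
    W_max lies within the adjustment range [n, n + (p - n) * S_th]."""
    n = -(2 ** (bit_width - 1))
    p = 2 ** (bit_width - 1) - 1
    return n <= w_max <= n + (p - n) * s_th
```

With B = 8 and S_th = 0.8 the upper bound is 76, so a layer whose largest-magnitude weight is 50 is a target layer while one at 100 is not.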
Step S202, determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.
In the embodiment of the application, the weight scale factor is determined as follows:

$Scale = \dfrac{\alpha \cdot p}{W_{max}}$    formula (4)

In formula (4), Scale is the weight scale factor; p is the maximum value in the weight range; α is a coefficient with value 0.95; $W_{max}$ is the weight adjustment coefficient of the target operator layer.
Step S203, according to the scale factor and the weight of the target operator layer, carrying out weight optimization processing on the target operator layer.
According to the method provided by the embodiment of the application, the scale factor is multiplied by the weight of the target operator layer to obtain the optimized weight of the target operator layer.
The method provided by the embodiment of the application further comprises adding a preset normalization layer after the target operator layer subjected to the weight optimization processing.
It should be noted that the preset normalization layer provided in the embodiment of the present application satisfies the following relation:

$BN_{out} = \gamma \cdot \dfrac{x-\mu}{\sqrt{\sigma^2+\epsilon}} + \beta$    formula (5)

In formula (5), $BN_{out}$ is the output of the preset normalization layer; x is the input of the preset normalization layer; μ is the mean, μ = 0; σ is the standard deviation, σ² = 1; ε is a small quantity that prevents division by zero; γ is the scale of the preset normalization layer, γ = 1/Scale; β is the offset parameter, β = 0.
Step S106, performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
In the embodiment of the application, quantization is performed as follows:

$\hat{x} = \mathrm{clip}\!\left(\mathrm{int}\!\left(\dfrac{x \cdot p}{\max(\mathrm{abs}(x))}\right),\, n,\, p\right)$    formula (6)

In formula (6), $\hat{x}$ is the quantized data; x is the data before quantization; n is the minimum value in the weight range; p is the maximum value in the weight range; int denotes rounding, clip denotes truncation to the range [n, p], max denotes taking the larger value, and abs denotes the absolute value.
After the model is optimized, if quantization currently uses a bit width of B bits, the neural network model is tested with a small number of images and compared with the entirely unquantized model; if the error between the two is large, B bits are insufficient for the precision requirement. In that case B can be increased and the optimization re-run. It should be noted that, compared with quantizing directly to a bit width of B bits, the method provided in the embodiment of the present application noticeably improves accuracy.
In the method provided by the embodiment of the application, the operator layers are first fused, reducing the computational dimensionality of the neural network model, and weight optimization is then performed on the model according to the weight range that the target hardware accelerator can accommodate, so that the model is adapted to the target hardware accelerator. The method solves the problem that deployment of a neural network model on a hardware accelerator is limited, without reducing the accuracy of the model.
The following are apparatus embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments. Fig. 3 is a schematic structural diagram of an optimization apparatus of a neural network model provided by an embodiment of the present application. As shown in fig. 3, the apparatus implements the optimization function of the neural network model; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include an acquisition module 301, a conversion module 302, and a processing module 303.
The acquisition module 301 is configured to acquire the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized, and the weight parameters corresponding to each operator layer of the neural network model to be optimized. The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values, and weight minimum values.
The conversion module 302 is configured to convert the neural network model to be optimized into an initial model according to the structure and the parameters of the neural network model to be optimized. The format of the initial model is the Open Neural Network Exchange (ONNX) model format.
The processing module 303 is configured to perform operator layer fusion processing on the initial model to obtain an intermediate model. Through the operator layer fusion processing, a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers of the initial model are fused into one operator layer. The processing module determines a weight adjustment coefficient range according to a preset coefficient and the weight range of a neural network model that the target hardware accelerator can accommodate. The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range. The processing module performs weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters, and performs quantization processing on the intermediate model after the weight optimization to obtain the optimized neural network model.
Optionally, the processing module 303 is specifically configured to:
Following the connection order of the operator layers, perform weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer. The target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
Optionally, the processing module 303 is specifically configured to:
Determine a weight adjustment coefficient of the target operator layer according to the weight maximum value and the weight minimum value of the target operator layer.
Determine a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of a neural network model that the target hardware accelerator can accommodate.
Perform weight optimization processing on the target operator layer according to the weight scale factor and the weight of the target operator layer.
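A minimal sketch of these steps, assuming (since the patent gives no formulas) that the weight adjustment coefficient is the largest weight magnitude of the layer and that the weight scale factor is the ratio of the accelerator's representable bound to that coefficient; all names here are hypothetical:

```python
import numpy as np

def optimize_layer_weights(weights, hw_min, hw_max):
    """Scale one operator layer's weights into the range [hw_min, hw_max]
    that the target hardware accelerator can accommodate.

    Returns the scaled weights and the weight scale factor; the factor must
    later be compensated for (e.g. by a normalization layer inserted after
    this operator layer)."""
    w_max, w_min = np.max(weights), np.min(weights)
    # Assumed weight adjustment coefficient: largest magnitude in this layer.
    adjust_coeff = max(abs(w_max), abs(w_min))
    # Assumed weight scale factor: accelerator bound over layer bound,
    # applied only when the layer's weights exceed the accelerator's range.
    hw_bound = max(abs(hw_max), abs(hw_min))
    scale = hw_bound / adjust_coeff if adjust_coeff > hw_bound else 1.0
    return weights * scale, scale
```

For example, a layer whose weights span [-3, 2] on an accelerator that accommodates [-1, 1] would be scaled by 1/3, after which every weight fits the hardware range.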
Optionally, the apparatus further includes an adding module, configured to:
Add a preset normalization layer after the target operator layer on which the weight optimization processing has been performed.
Optionally, the processing module 303 is specifically configured to:
Fuse each normalization layer connected to a convolution layer into that convolution layer to obtain the intermediate model. The convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and a normalization layer is connected after each convolution layer.
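This convolution/normalization fusion can be sketched with the standard batch-normalization folding identity, which matches the description above (the function name and parameter layout are assumptions, not the patent's own notation):

```python
import numpy as np

def fuse_conv_bn(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold a normalization layer into the convolution layer it follows.

    conv_w: (out_ch, in_ch, kh, kw) convolution kernels
    conv_b: (out_ch,) convolution bias
    gamma, beta, mean, var: per-channel normalization parameters, (out_ch,)
    """
    std = np.sqrt(var + eps)
    # y = gamma * (conv(x) + b - mean) / std + beta is itself a convolution
    # with rescaled kernels and a shifted bias:
    fused_w = conv_w * (gamma / std)[:, None, None, None]
    fused_b = gamma * (conv_b - mean) / std + beta
    return fused_w, fused_b

# Example: identity normalization parameters leave the convolution unchanged.
rng = np.random.default_rng(0)
w, b = rng.normal(size=(4, 3, 1, 1)), rng.normal(size=4)
fw, fb = fuse_conv_bn(w, b, gamma=np.ones(4), beta=np.zeros(4),
                      mean=np.zeros(4), var=np.ones(4))
```

After folding, the normalization layer is removed, so the intermediate model has one operator layer where the initial model had two adjacent ones.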
Optionally, the processing module 303 is specifically configured to:
If the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fuse the plurality of branch convolution layers into one convolution layer to obtain the intermediate model. The plurality of branch convolution layers are convolution layers that are structurally parallel to one another.
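Assuming the parallel branches feed a common element-wise sum (the patent does not spell this out), the linearity of convolution lets the branch kernels and biases be summed into a single kernel and bias. The sketch below is hypothetical under that assumption:

```python
import numpy as np

def fuse_parallel_branches(branch_weights, branch_biases):
    """Fuse structurally parallel convolution branches with same-shape
    kernels, whose outputs are summed, into a single convolution.

    branch_weights: (n_branches, out_ch, in_ch, kh, kw)
    branch_biases:  (n_branches, out_ch)

    Because convolution is linear, conv(x, w1) + conv(x, w2) equals
    conv(x, w1 + w2), so the kernels and biases can simply be summed."""
    fused_w = np.sum(branch_weights, axis=0)
    fused_b = np.sum(branch_biases, axis=0)
    return fused_w, fused_b
```

This reduces several structurally parallel operator layers to one, shrinking the computational dimensionality of the model as described above.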
According to the apparatus, the operator layers are fused to reduce the computational dimensionality of the neural network model, and weight optimization processing is then performed on the neural network model according to the weight range that the target hardware accelerator can accommodate, so that the neural network model is adapted to the target hardware accelerator. The apparatus provided by the present application thus solves the problem that deployment of the neural network model on the hardware accelerator is limited, without reducing the accuracy of the neural network model.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (10)
1. A method of optimizing a neural network model, the method comprising:
acquiring the structure of a neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through the operator layer fusion processing;
determining a weight adjustment coefficient range according to a preset coefficient and a weight range of a neural network model which can be accommodated by a target hardware accelerator; the target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or according to a preset weight range;
performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameter;
and carrying out quantization processing on the intermediate model after the weight optimization to obtain an optimized neural network model.
2. The method according to claim 1, wherein performing a weight optimization process on the intermediate model according to the weight adjustment coefficient range and the weight parameter comprises:
performing, in the connection order of the operator layers, weight optimization processing on a target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight is within the weight adjustment coefficient range.
3. The method of claim 2, wherein performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer comprises:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the weight scale factor and the weight of the target operator layer.
4. The method of claim 2, wherein after the weight optimization processing is performed on the target operator layer, the method further comprises:
adding a preset normalization layer after the target operator layer on which the weight optimization processing has been performed.
5. The method of claim 1, wherein performing operator layer fusion processing on the initial model to obtain an intermediate model comprises:
fusing the normalization layer connected to each convolution layer into that convolution layer to obtain the intermediate model; the convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and a normalization layer is connected after each convolution layer.
6. The method of claim 1, wherein performing operator layer fusion processing on the initial model to obtain an intermediate model comprises:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are convolution layers that are structurally parallel to one another.
7. An apparatus for optimizing a neural network model, the apparatus comprising:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the conversion module is used for converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for performing operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through the operator layer fusion processing; determining a weight adjustment coefficient range according to a preset coefficient and the weight range of a neural network model which can be accommodated by the target hardware accelerator; the target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or according to a preset weight range; performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and performing quantization processing on the intermediate model after the weight optimization to obtain an optimized neural network model.
8. The apparatus of claim 7, wherein the processing module is specifically configured to:
performing, in the connection order of the operator layers, weight optimization processing on a target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight is within the weight adjustment coefficient range.
9. The apparatus of claim 7, wherein the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the weight scale factor and the weight of the target operator layer.
10. The apparatus of claim 7, further comprising an adding module to:
adding a preset normalization layer after the target operator layer on which the weight optimization processing has been performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110382904.5A CN113128670B (en) | 2021-04-09 | 2021-04-09 | Neural network model optimization method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113128670A true CN113128670A (en) | 2021-07-16 |
CN113128670B CN113128670B (en) | 2024-03-19 |
Family
ID=76775672
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103426027A (en) * | 2013-07-24 | 2013-12-04 | 浙江大学 | Intelligent normal pool level optimal selection method based on genetic neural network models |
US20180075339A1 (en) * | 2016-09-09 | 2018-03-15 | SK Hynix Inc. | Neural network hardware accelerator architectures and operating method thereof |
CN110378470A (en) * | 2019-07-19 | 2019-10-25 | Oppo广东移动通信有限公司 | Optimization method, device and the computer storage medium of neural network model |
US20190340492A1 (en) * | 2018-05-04 | 2019-11-07 | Microsoft Technology Licensing, Llc | Design flow for quantized neural networks |
US20200057921A1 (en) * | 2017-11-09 | 2020-02-20 | Boe Technology Group Co., Ltd. | Image classification and conversion method and device, image processor and training method therefor, and medium |
CN110826692A (en) * | 2019-10-24 | 2020-02-21 | 腾讯科技(深圳)有限公司 | Automatic model compression method, device, equipment and storage medium |
CN111310684A (en) * | 2020-02-24 | 2020-06-19 | 东声(苏州)智能科技有限公司 | Model training method and device, electronic equipment and storage medium |
CN111602145A (en) * | 2018-10-30 | 2020-08-28 | 深圳鲲云信息科技有限公司 | Optimization method of convolutional neural network and related product |
US20200342572A1 (en) * | 2018-04-02 | 2020-10-29 | Tencent Technology (Shenzhen) Company Limited | Image related processing method and apparatus, device and storage medium |
CN112200297A (en) * | 2020-09-04 | 2021-01-08 | 厦门星宸科技有限公司 | Neural network optimization method, device and processor |
CN112257840A (en) * | 2019-07-22 | 2021-01-22 | 华为技术有限公司 | Neural network processing method and related equipment |
CN112465108A (en) * | 2020-11-11 | 2021-03-09 | 上海交通大学 | Neural network compiling method for storage and calculation integrated platform |
CN112541159A (en) * | 2020-09-30 | 2021-03-23 | 华为技术有限公司 | Model training method and related equipment |
Non-Patent Citations (5)
Title |
---|
IREM BOYBAT ET AL.: "Improved Deep Neural Network Hardware-Accelerators Based on Non-Volatile-Memory: The Local Gains Technique", 2017 IEEE International Conference on Rebooting Computing (ICRC) *
KAI CHEN ET AL.: "A DNN Optimization Framework with Unlabeled Data for Efficient and Accurate Reconfigurable Hardware Inference", 2021 IEEE International Symposium on Circuits and Systems (ISCAS) *
XING JING: "Research on Acceleration Techniques for Object Detection Network Algorithms Based on ARM NEON", China Masters' Theses Full-text Database *
CHEN KAI: "Research on Optimization and Acceleration Methods of Deep Neural Network Models for Hardware Implementation", China Masters' Theses Full-text Database *
LI MING, YAN CHAOHUA, LIU GAOHANG: "Optimizing Feedforward Neural Network Structure and Weight Vectors with a Genetic Algorithm", Journal of Image and Graphics, no. 06 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |