CN113128670A

CN113128670A - Neural network model optimization method and device

Info

Publication number: CN113128670A
Application number: CN202110382904.5A
Authority: CN
Inventors: 杜源; 陈凯; 杜力
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-04-09
Filing date: 2021-04-09
Publication date: 2021-07-16
Anticipated expiration: 2041-04-09
Also published as: CN113128670B

Abstract

The method provided by the application obtains the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; performing algorithm layer fusion processing on the initial model to obtain an intermediate model; determining a weight adjustment coefficient range according to a weight range and a preset coefficient of a neural network model which can be accommodated by a target hardware accelerator; performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameters; and carrying out quantization processing on the intermediate model after the weight optimization to obtain an optimized neural network model. The method provided by the application solves the problem that the layout of the neural network model on the hardware accelerator is limited.

Description

Neural network model optimization method and device

Technical Field

The application relates to the technical field of internet, in particular to a method and a device for optimizing a neural network model.

Background

The method for identifying the target object based on the neural network model has extremely high accuracy, so that the method is widely applied to various fields, such as automatic driving, intelligent security and protection, intelligent robots and the like. Accordingly, neural network models are also widely deployed in hardware accelerators compatible with various fields. The hardware accelerator is used for matching with a central processing unit where the hardware accelerator is located to process special work at a high speed. Wherein a hardware accelerator deploying the neural network model is used for exclusively running the neural network model. With the continuous widening of the application field of the neural network model, more and more research and development efforts are put into the neural network model. In order to reduce the research and development cost of the neural network model, a research and development person completes the work of building and training the neural network model on a general CPU, a GPU and other processors, and then the neural network model is adapted to a hardware accelerator in a specific application field.

The neural network model can realize accurate identification of the target object and is based on the calculation complexity of the model height. This requires a hardware accelerator for deploying neural network models that has significant memory and can carry complex computational complexity. However, in practical applications, hardware for deploying the neural network model does not have a very large memory space, which results in that the neural network model cannot be applied to a hardware accelerator with a small memory.

Disclosure of Invention

The application provides an optimization method and device of a neural network model, which can be used for solving the technical problem that the neural network model in the prior art cannot be adapted in hardware with smaller memory.

In a first aspect, the present application provides a method for optimizing a neural network model, the method including:

acquiring the structure of a neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;

converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;

performing algorithm layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through the operator layer fusion processing;

determining a weight adjustment coefficient range according to a weight range and a preset coefficient of a neural network model which can be accommodated by a target hardware accelerator; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or a preset weight range;

performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameter;

and carrying out quantization processing on the intermediate model after the weight optimization to obtain an optimized neural network model.

With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter includes:

according to the connection sequence of the computation layers, carrying out weight optimization processing on the target computation layer according to the weight adjustment coefficient range and the weight parameters of the target computation layer; the target algorithm layer is any algorithm layer with the weight in the weight adjusting coefficient range in the intermediate model.

With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameter of the target operator layer includes:

determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;

determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;

and performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.

With reference to the first aspect, in an implementation manner of the first aspect, the performing weight optimization processing on the target operator layer further includes:

and adding a preset normalization layer after the target algorithm layer subjected to the weight optimization processing.

With reference to the first aspect, in an implementation manner of the first aspect, performing operator layer fusion processing on the initial model to obtain an intermediate model includes:

fusing the normalization layer connected with the convolution layer into the convolution layer to obtain the intermediate model; the convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and a normalization layer is connected behind each convolution layer.

if the initial model comprises a plurality of branch convolutional layers with the same convolutional kernel, fusing the plurality of branch convolutional layers into one convolutional layer to obtain the intermediate model; the plurality of branch convolution layers are convolution layers, and all the branch convolution layers are mutually juxtaposed in structure.

In a second aspect, the present application provides an apparatus for optimizing a neural network model, the apparatus comprising:

the acquisition module is used for acquiring the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;

the conversion module is used for converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;

the processing module is used for carrying out computation layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through the operator layer fusion processing; determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or a preset weight range; according to the weight adjusting coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model; and carrying out quantitative processing on the intermediate model after the weight optimization to obtain an optimized neural network model.

With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:

With reference to the second aspect, in an implementation manner of the second aspect, the apparatus further includes an adding module configured to:

According to the method, the operator layer is fused, the calculation dimensionality of the neural network model is reduced, and then the weight optimization processing is carried out on the neural network model according to the weight range which can be contained by the target hardware accelerator, so that the target hardware accelerator is adaptive to the neural network model. The method provided by the application solves the problem that the layout of the neural network model on the hardware accelerator is limited on the premise of not reducing the accuracy of the neural network model.

Drawings

Fig. 1 is a schematic flowchart of an optimization method of a neural network model according to an embodiment of the present disclosure;

fig. 2 is a schematic flowchart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of an optimization apparatus of a neural network model according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present disclosure. The method provided by the implementation of the application comprises the following steps:

step S101, obtaining the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized.

The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.

It should be noted that the neural network model to be optimized in the embodiment of the present application may be a classical model carried by an open-source framework, or a neural network model built by a user and trained. The neural network model comprises a plurality of operator layers, and each operator layer is a convolution layer or a normalization layer correspondingly. The convolutional layer provided in the embodiment of the present application may be one or more of main convolutional layers such as a normal convolutional layer, a deep convolutional layer, a void convolutional layer, and a packet convolutional layer. The embodiment of the present application does not particularly limit the size, step size, filling, number of channels, and the like of the convolutional layer.

In one implementation, the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized, and the weight parameters corresponding to each operator layer of the neural network model to be optimized can be obtained by traversing the whole neural network model to be optimized.

And S102, converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized.

The format corresponding to the initial model is an Open Neural Network Exchange (ONNX) format.

And step S103, performing algorithm layer fusion processing on the initial model to obtain an intermediate model.

The initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through operator layer fusion processing.

It should be noted that step S102 and step S103 are performed synchronously, that is, the method provided in the embodiment of the present application converts the initial model into the format of the open neural network exchange model while performing operator-layer fusion processing on the initial model.

According to different structures of the neural network model to be optimized, the implementation of the application provides a plurality of algorithm layer fusion processing methods. One such method includes fusing a normalization layer, which is connected to a convolutional layer, into the convolutional layer to obtain an intermediate model. The convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and one normalization layer is connected behind each convolution layer. There is no normalization layer after the new convolutional layer after fusion.

In another algorithm layer fusion processing method, if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, the plurality of branch convolution layers are fused into one convolution layer to obtain an intermediate model. The plurality of branch convolution layers are convolution layers, and all of the branch convolution layers are structurally juxtaposed to each other. Specifically, if the initial model has multiple branches from the input layer to the output layer, and the multiple branches have the same branching point and merging point, all branch convolution layers are considered to be structurally juxtaposed to each other. Further, it is determined whether the convolution kernels of all the branch convolution layers are the same, and if the convolution kernels of all the branch convolution layers are the same, for example, 3 × 3 or 1 × 1, the branch convolution layers are merged into one convolution layer.

The operator layer fusion processing method provided by the embodiment of the application further comprises the step of fusing a plurality of normalization layers into one normalization layer if the neural network model to be optimized has the plurality of continuous normalization layers.

According to the operator layer fusion processing method, the neural network model to be optimized is simplified in structure, and meanwhile, the accuracy of the neural network model to be optimized is maintained.

After step S103 is executed, the method provided in the embodiment of the present application further needs to perform equivalent transformation on the operator layer type in the initial model according to the operator layer type supported by the target hardware accelerator. For example, the hardware accelerator only supports 3 × 3 convolutional layers, and if the initial model includes both 3 × 3 convolutional layers and other types of convolutional layers, equivalent processing needs to be performed on the other types of convolutional layers in the initial model. For another example, if the hardware accelerator does not support hole convolution, zero is filled in the corresponding empty hole in the initial model, so that the hole convolution is equivalent to a normal convolution layer.

According to the method provided by the embodiment of the application, in order to facilitate the subsequent steps to directly obtain the information of the intermediate model, in the process of performing operator layer fusion processing and equivalent processing on the initial model, the information related to the operator layer, such as the input tensor information, the output tensor information, the parameter information and the like of each operator layer is added layer by layer.

And step S104, determining a weight adjusting coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient.

The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range.

In the method provided by the embodiment of the present application, the bit width of B bits is represented for the weight supported by the hardware accelerator, and the weight range of the neural network model that can be accommodated by the target hardware accelerator is determined by the following method:

A＝[-2^B-1,+(2^B-1-1)]formula (1)

In formula (1), A is the weight range，p＝2^B-1-1, being the maximum value in the weight range, n ═ 2^B-1And is the smallest value in the weight range.

The weight adjustment coefficient range is determined by the following method:

D＝[n,n+(p-n)*S_th]formula (2)

In the formula (2), D is the weight adjustment coefficient range; n is-2^B-1The smallest value in the weight range; p is 2^B-1-1, being the maximum value in the weight range; s_th＝0.8。

And step S105, performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameters.

In one implementation, according to the connection order of the computation layers, the weight optimization processing is performed on the target computation layer according to the weight adjustment coefficient range and the weight parameters of the target computation layer.

The target operator layer is any operator layer with the weight within the weight adjusting coefficient range in the intermediate model.

It should be noted that the method provided in the embodiment of the present application does not need to perform weight optimization on all the operator layers in the intermediate model, and only needs to perform weight optimization on a target operator layer whose weight is within the weight adjustment coefficient range. Therefore, according to the method provided by the embodiment of the application, whether the weight corresponding to the current operator layer is within the weight adjustment coefficient range is sequentially judged from the input layer to the output layer according to the connection sequence of the operator layers, and if the weight corresponding to the current operator layer is within the weight adjustment coefficient range, the current operator layer is taken as the target operator layer.

Fig. 2 is a schematic flow chart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application. The weight optimization processing method provided by the embodiment of the application comprises the following steps:

step S201 determines a weight adjustment coefficient of the target operator layer according to the maximum weight value of the target operator layer and the minimum weight value of the target operator layer.

In step S101, the weight parameter corresponding to each operator layer has been obtained, and the weight parameter corresponding to the operator layer is not changed through steps S101 to S104. And comparing the maximum weight value of each operator layer with the minimum weight value of each operator layer, wherein the absolute value between the maximum weight value and the minimum weight value is larger, and taking the larger absolute value as a weight adjustment coefficient of the target operator layer. And if the weight adjusting coefficient range corresponding to the operator layer is within the adjusting range, determining the operator layer as a target operator layer. Correspondingly, the maximum weight value of the operator layer is the maximum weight value of the target operator layer, and the minimum weight value of the operator layer is the minimum weight value of the target operator layer. The target operator layer satisfies the following conditions:

n≤W_max≤n+(p-n)*S_thformula (3)

In the formula (3), n is the minimum value in the weight adjusting coefficient range; n + (p-n) S_thAdjusting the maximum value in the coefficient range for the weight; p is 2^B-1-1, being the maximum value in the weight range; s_th＝0.8；W_maxAnd adjusting the coefficients for the weights of the target operator layer.

Step S202, determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.

In the embodiment of the application, the weight scale factor is determined by the following method:

in the formula (4), Scale is a weight Scale factor; p is the maximum value in the weight range; alpha is coefficient, and the value is 0.95; w_maxAnd adjusting the coefficients for the weights of the target operator layer.

Step S203, according to the scale factor and the weight of the target operator layer, carrying out weight optimization processing on the target operator layer.

According to the method provided by the embodiment of the application, the scale factor is multiplied by the weight of the target operator layer to obtain the optimized weight of the target operator layer.

The method provided by the embodiment of the application further comprises the step of adding a preset normalization layer after the target algorithm layer is subjected to the weight optimization.

It should be noted that the preset normalization layer provided in the embodiment of the present application satisfies the following relation:

in formula (5), BN_outThe output of a preset normalization layer; x is the input of a preset normalization layer; μ is the mean value, μ ═ 0; σ is the standard deviation, σ²1 is ═ 1; e is a minimum amount for preventing introduction of zero removal; gamma is the scale of a preset normalization layer,

β is an offset parameter, β ═ 0.

And S106, carrying out quantization processing on the intermediate model after weight optimization to obtain an optimized neural network model.

In the embodiment of the application, the following method is adopted for quantization processing:

in the formula (6), the first and second groups,

the quantized data;

the data before quantization; the minimum value in the range of n weights; p is the maximum value in the weight range;

int denotes rounding, clip denotes truncation, max denotes a larger value, and abs denotes an absolute value.

After the model is optimized, if B bit widths are adopted for quantization at present, a small amount of pictures are used for testing the neural network model, and the neural network model is compared with the neural network model which is not quantized at all, and if the error between the B bit widths is larger, the B bit widths are not enough to meet the precision requirement. At this point, the size of B may be increased and the optimization may be re-performed. It should be noted that, compared with the method that performs quantization directly according to the bit width of B bits, the method provided in the embodiment of the present application obviously improves accuracy.

According to the method provided by the embodiment of the application, fusion processing is firstly carried out on an operator layer, the calculation dimensionality of the neural network model is reduced, and then weight optimization processing is carried out on the neural network model according to the weight range which can be contained by the target hardware accelerator, so that the target hardware accelerator is adaptive to the neural network model. The method provided by the application solves the problem that the layout of the neural network model on the hardware accelerator is limited on the premise of not reducing the accuracy of the neural network model.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application. Fig. 3 is a schematic structural diagram illustrating an optimization apparatus of a neural network model provided by an embodiment of the present application. As shown in fig. 3, the apparatus has an optimization function for implementing the neural network model, and the function may be implemented by hardware, or may be implemented by hardware executing corresponding software. The apparatus may include an acquisition module 301, a translation module 302, and a processing module 303.

An obtaining module 301, configured to obtain a structure of the neural network model to be optimized, parameters of the neural network model to be optimized, and weight parameters corresponding to each operator layer of the neural network model to be optimized. The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.

A converting module 302, configured to convert the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized. The format corresponding to the initial model is the format of the open neural network exchange model.

And the processing module 303 is configured to perform operator layer fusion processing on the initial model to obtain an intermediate model. The initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through operator layer fusion processing. And determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient. The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range. And carrying out weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameters. And carrying out quantitative processing on the intermediate model after the weight optimization to obtain an optimized neural network model.

Optionally, the processing module 303 is specifically configured to:

and according to the connection sequence of the computation layers, carrying out weight optimization processing on the target computation layer according to the weight adjustment coefficient range and the weight parameters of the target computation layer. The target algorithm layer is any algorithm layer with the weight within the weight adjusting coefficient range in the intermediate model.

Optionally, the processing module 303 is specifically configured to:

and determining a weight adjustment coefficient of the target operator layer according to the maximum weight value of the target operator layer and the minimum weight value of the target operator layer.

And determining a weight scale factor according to the weight adjustment coefficient of the target algorithm layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.

Optionally, the apparatus further includes an adding module, configured to:

Optionally, the processing module 303 is specifically configured to:

and fusing the normalization layer connected with the convolution layer into the convolution layer to obtain the intermediate model. The convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and one normalization layer is connected behind each convolution layer.

Optionally, the processing module 303 is specifically configured to:

and if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain an intermediate model. The plurality of branch convolution layers are convolution layers, and all of the branch convolution layers are structurally juxtaposed to each other.

The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method of optimizing a neural network model, the method comprising:

2. The method according to claim 1, wherein performing a weight optimization process on the intermediate model according to the weight adjustment coefficient range and the weight parameter comprises:

3. The method of claim 2, wherein performing weight optimization on the target algorithm layer according to the weight adjustment coefficient range and the weight parameter of the target algorithm layer comprises:

4. The method of claim 2, wherein the weight optimization process is performed on the target algorithm layer, and thereafter further comprising:

5. The method of claim 1, wherein performing an algorithm layer fusion process on the initial model to obtain an intermediate model comprises:

6. The method of claim 1, wherein performing an algorithm layer fusion process on the initial model to obtain an intermediate model comprises:

7. An apparatus for optimizing a neural network model, the apparatus comprising:

8. The apparatus of claim 7, wherein the processing module is specifically configured to:

9. The apparatus of claim 7, wherein the processing module is specifically configured to:

10. The apparatus of claim 7, further comprising an adding module to: