CN113128670A - Neural network model optimization method and device

Info

Publication number
CN113128670A
Authority
CN
China
Prior art keywords
weight
layer
neural network
network model
model
Prior art date
Legal status
Granted
Application number
CN202110382904.5A
Other languages
Chinese (zh)
Other versions
CN113128670B (en)
Inventor
杜源 (Du Yuan)
陈凯 (Chen Kai)
杜力 (Du Li)
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110382904.5A
Publication of CN113128670A
Application granted
Publication of CN113128670B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/08 - Learning methods


Abstract

The method provided by this application obtains the structure of the neural network model to be optimized, the parameters of the model, and the weight parameters corresponding to each of its operator layers; converts the model to be optimized into an initial model according to its structure and parameters; performs operator layer fusion processing on the initial model to obtain an intermediate model; determines a weight adjustment coefficient range according to the weight range of a neural network model that the target hardware accelerator can accommodate and a preset coefficient; performs weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and quantizes the weight-optimized intermediate model to obtain an optimized neural network model. The method thus resolves the problem that deployment of neural network models on hardware accelerators is limited.

Description

Neural network model optimization method and device
Technical Field
The application relates to the technical field of the Internet, and in particular to a method and device for optimizing a neural network model.
Background
Target recognition based on neural network models achieves extremely high accuracy, so it is widely applied in fields such as autonomous driving, intelligent security, and intelligent robotics. Accordingly, neural network models are widely deployed on hardware accelerators compatible with these fields. A hardware accelerator cooperates with the central processing unit that hosts it to handle specialized work at high speed; a hardware accelerator on which a neural network model is deployed is used exclusively to run that model. As the application fields of neural network models keep widening, more and more research and development effort is invested in them. To reduce development cost, developers build and train a neural network model on general-purpose processors such as CPUs and GPUs, and then adapt the model to a hardware accelerator in the specific application field.
A neural network model achieves accurate target recognition by virtue of its high computational complexity. This requires the hardware accelerator on which the model is deployed to have a large memory and to carry a heavy computational load. In practical applications, however, the hardware used to deploy neural network models often does not have a very large memory, so the model cannot be applied to a hardware accelerator with a small memory.
Disclosure of Invention
The application provides a method and device for optimizing a neural network model, which can be used to solve the prior-art technical problem that a neural network model cannot be adapted to hardware with limited memory.
In a first aspect, the present application provides a method for optimizing a neural network model, the method including:
acquiring the structure of a neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the operator layer fusion processing fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers of the initial model into one operator layer;
determining a weight adjustment coefficient range according to the weight range of a neural network model that a target hardware accelerator can accommodate and a preset coefficient; the target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or according to a preset weight range;
performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameter;
and carrying out quantization processing on the intermediate model after the weight optimization to obtain an optimized neural network model.
With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter includes:
performing, in the order in which the operator layers are connected, weight optimization processing on a target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight is within the weight adjustment coefficient range.
With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameter of the target operator layer includes:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.
With reference to the first aspect, in an implementation manner of the first aspect, the performing weight optimization processing on the target operator layer further includes:
and adding a preset normalization layer after the weight-optimized target operator layer.
With reference to the first aspect, in an implementation manner of the first aspect, performing operator layer fusion processing on the initial model to obtain an intermediate model includes:
fusing the normalization layer connected with the convolution layer into the convolution layer to obtain the intermediate model; the convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and a normalization layer is connected behind each convolution layer.
With reference to the first aspect, in an implementation manner of the first aspect, performing operator layer fusion processing on the initial model to obtain an intermediate model includes:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are convolution layers that are structurally parallel to one another.
In a second aspect, the present application provides an apparatus for optimizing a neural network model, the apparatus comprising:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the conversion module is used for converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for performing operator layer fusion processing on the initial model to obtain an intermediate model; the operator layer fusion processing fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers of the initial model into one operator layer; determining a weight adjustment coefficient range according to the weight range of the neural network model that the target hardware accelerator can accommodate and a preset coefficient; the target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or according to a preset weight range; performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
performing, in the order in which the operator layers are connected, weight optimization processing on a target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight is within the weight adjustment coefficient range.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.
With reference to the second aspect, in an implementation manner of the second aspect, the apparatus further includes an adding module configured to:
and adding a preset normalization layer after the weight-optimized target operator layer.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
fusing the normalization layer connected with the convolution layer into the convolution layer to obtain the intermediate model; the convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and a normalization layer is connected behind each convolution layer.
With reference to the second aspect, in an implementable manner of the second aspect, the processing module is specifically configured to:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are convolution layers that are structurally parallel to one another.
According to the method, operator layers are first fused to reduce the computational dimensionality of the neural network model, and weight optimization processing is then performed on the model according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator and the neural network model are adapted to each other. Without reducing the accuracy of the neural network model, the method provided by this application solves the problem that deployment of the neural network model on a hardware accelerator is limited.
Drawings
Fig. 1 is a schematic flowchart of an optimization method of a neural network model according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an optimization apparatus of a neural network model according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present application. The method provided by the embodiment of the application comprises the following steps:
step S101, obtaining the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized.
The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.
It should be noted that the neural network model to be optimized in the embodiment of the present application may be a classical model shipped with an open-source framework, or a neural network model built and trained by the user. The neural network model comprises a plurality of operator layers, each of which is a convolution layer or a normalization layer. The convolution layer provided in the embodiment of the present application may be one or more of the mainstream convolution types, such as normal convolution, depthwise convolution, dilated (atrous) convolution, and grouped convolution. The embodiment of the present application places no particular limit on the kernel size, stride, padding, number of channels, and so on of the convolution layer.
In one implementation, the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized, and the weight parameters corresponding to each operator layer of the neural network model to be optimized can be obtained by traversing the whole neural network model to be optimized.
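For illustration only, such a traversal can be sketched in PyTorch; the framework choice, the module types checked, and the dictionary layout are assumptions rather than part of the patented method:

```python
import torch.nn as nn

def collect_weight_parameters(model: nn.Module):
    """Traverse a model and record, for each operator layer, its weights
    and their extrema (a sketch; the patent does not fix a framework)."""
    weight_params = {}
    for name, module in model.named_modules():
        # Convolution and normalization layers are the operator layers
        # the method is concerned with.
        if isinstance(module, (nn.Conv2d, nn.BatchNorm2d)):
            w = module.weight.detach()
            weight_params[name] = {
                "weight": w,
                "weight_max": w.max().item(),
                "weight_min": w.min().item(),
            }
    return weight_params
```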
And S102, converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized.
The format corresponding to the initial model is an Open Neural Network Exchange (ONNX) format.
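As an illustration, a PyTorch model can be converted to the ONNX format with the standard exporter; the stand-in model, input shape, and opset version below are assumptions:

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()  # stand-in model
dummy_input = torch.randn(1, 3, 224, 224)     # assumed input shape
torch.onnx.export(model, dummy_input, "initial_model.onnx", opset_version=11)
```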
Step S103, performing operator layer fusion processing on the initial model to obtain an intermediate model.
The initial model fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers into one operator layer through operator layer fusion processing.
It should be noted that step S102 and step S103 are performed together; that is, the method provided in the embodiment of the present application performs operator layer fusion while converting the model into the format of the open neural network exchange model.
According to the different structures of the neural network model to be optimized, the embodiment of the present application provides several operator layer fusion methods. One method fuses a normalization layer that is connected to a convolution layer into that convolution layer to obtain the intermediate model. The convolution layer is one type of operator layer, the normalization layer is another, and a normalization layer is connected behind each convolution layer; no normalization layer follows the new, fused convolution layer. The folding is sketched below.
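Folding a normalization layer into the preceding convolution follows the standard identity W' = W · γ/√(σ² + ε) and b' = (b - μ) · γ/√(σ² + ε) + β. A minimal NumPy sketch under that assumption (the patent does not spell out the folding equations):

```python
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mu, var, eps=1e-5):
    """Fold a batch-normalization layer (scale gamma, offset beta, running
    mean mu, running variance var) into the preceding convolution's weight
    W of shape (out_ch, in_ch, kh, kw) and bias b of shape (out_ch,)."""
    s = gamma / np.sqrt(var + eps)        # per-output-channel scale
    W_fused = W * s[:, None, None, None]  # rescale each output filter
    b_fused = (b - mu) * s + beta
    return W_fused, b_fused
```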
In another operator layer fusion method, if the initial model comprises several branch convolution layers with the same convolution kernel, these branch convolution layers are fused into one convolution layer to obtain the intermediate model. The branch convolution layers are convolution layers that are structurally parallel to one another. Specifically, if the initial model has multiple branches from the input layer to the output layer, and the branches share the same branching point and merging point, all branch convolution layers are considered structurally parallel. It is then determined whether the convolution kernels of all branch convolution layers are the same; if they are, for example all 3 × 3 or all 1 × 1, the branch convolution layers are merged into one convolution layer, as in the sketch below.
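Assuming the parallel branch outputs are element-wise summed at the merging point (the text does not state the merge operation), convolutions with identical kernel shapes can be merged by adding their weights and biases, since convolution is linear:

```python
import numpy as np

def fuse_parallel_convs(weights, biases):
    """Merge structurally parallel convolution branches with identical
    kernel shapes into one convolution. Assumes the branch outputs are
    summed at the merging point (an assumption, not stated in the text)."""
    W_fused = np.sum(np.stack(weights), axis=0)
    b_fused = np.sum(np.stack(biases), axis=0)
    return W_fused, b_fused
```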
The operator layer fusion methods provided by the embodiment of the present application further include fusing several consecutive normalization layers into a single normalization layer when the neural network model to be optimized contains such a sequence.
Through the operator layer fusion methods above, the structure of the neural network model to be optimized is simplified while its accuracy is maintained.
After step S103 is executed, the method provided in the embodiment of the present application further performs equivalent transformation of the operator layer types in the model according to the operator layer types supported by the target hardware accelerator. For example, if the hardware accelerator supports only 3 × 3 convolution layers while the model contains both 3 × 3 convolution layers and other convolution types, the other types need to be transformed into equivalents. As another example, if the hardware accelerator does not support dilated convolution, zeros are filled into the corresponding empty positions of the kernel, making the dilated convolution equivalent to a normal convolution layer; a sketch of this expansion follows.
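The zero-filling can be sketched as follows: a k × k kernel with dilation d becomes an equivalent dense kernel of size d·(k - 1) + 1, with zeros between the original taps (padding and stride bookkeeping are left to the caller):

```python
import numpy as np

def dilated_to_dense_kernel(W, dilation):
    """Expand a dilated convolution kernel W (out_ch, in_ch, k, k) into an
    equivalent dense kernel by inserting zeros between taps, so that an
    accelerator without dilation support can run it as a normal convolution."""
    out_ch, in_ch, k, _ = W.shape
    k_eff = dilation * (k - 1) + 1
    W_dense = np.zeros((out_ch, in_ch, k_eff, k_eff), dtype=W.dtype)
    W_dense[:, :, ::dilation, ::dilation] = W  # original taps, spaced out
    return W_dense
```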
To let subsequent steps obtain the information of the intermediate model directly, the method provided by the embodiment of the present application records, layer by layer during the operator layer fusion and equivalence processing, the information related to each operator layer, such as its input tensor information, output tensor information, and parameter information.
And step S104, determining a weight adjusting coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient.
The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range.
In the method provided by the embodiment of the present application, with the weights supported by the hardware accelerator represented at a bit width of B bits, the weight range of the neural network model that the target hardware accelerator can accommodate is determined as follows:
A = [-2^(B-1), +(2^(B-1) - 1)]    formula (1)
In formula (1), A is the weight range; p = 2^(B-1) - 1 is the maximum value in the weight range; n = -2^(B-1) is the minimum value in the weight range.
The weight adjustment coefficient range is determined by the following method:
D = [n, n + (p - n) * S_th]    formula (2)
In formula (2), D is the weight adjustment coefficient range; n = -2^(B-1) is the minimum value in the weight range; p = 2^(B-1) - 1 is the maximum value in the weight range; S_th = 0.8.
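Formulas (1) and (2) transcribe directly into code; a sketch with S_th fixed at 0.8 as above:

```python
def weight_range(B):
    """Weight range A = [-2**(B-1), 2**(B-1) - 1] of formula (1)."""
    n = -(2 ** (B - 1))
    p = 2 ** (B - 1) - 1
    return n, p

def adjustment_coefficient_range(B, s_th=0.8):
    """Weight adjustment coefficient range D = [n, n + (p - n) * S_th]
    of formula (2)."""
    n, p = weight_range(B)
    return n, n + (p - n) * s_th

# e.g. B = 8 gives A = [-128, 127] and D = [-128, 76.0]
```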
And step S105, performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameters.
In one implementation, according to the connection order of the operator layers, weight optimization processing is performed on each target operator layer according to the weight adjustment coefficient range and the weight parameters of that target operator layer.
The target operator layer is any operator layer in the intermediate model whose weight is within the weight adjustment coefficient range.
It should be noted that the method provided in the embodiment of the present application does not perform weight optimization on every operator layer of the intermediate model; it only optimizes target operator layers whose weights fall within the weight adjustment coefficient range. The method therefore checks, layer by layer from the input layer to the output layer in the order in which the operator layers are connected, whether the weight corresponding to the current operator layer lies within the weight adjustment coefficient range, and if it does, takes the current operator layer as a target operator layer.
Fig. 2 is a schematic flow chart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application. The weight optimization processing method provided by the embodiment of the application comprises the following steps:
step S201 determines a weight adjustment coefficient of the target operator layer according to the maximum weight value of the target operator layer and the minimum weight value of the target operator layer.
In step S101, the weight parameters of each operator layer were obtained, and steps S101 through S104 leave them unchanged. For each operator layer, the absolute value of its weight maximum is compared with the absolute value of its weight minimum, and the larger of the two is taken as the layer's weight adjustment coefficient. If this weight adjustment coefficient falls within the weight adjustment coefficient range, the operator layer is determined to be a target operator layer. Correspondingly, the layer's weight maximum is the target operator layer's weight maximum, and the layer's weight minimum is the target operator layer's weight minimum. The target operator layer satisfies the following condition:
n ≤ W_max ≤ n + (p - n) * S_th    formula (3)
In formula (3), n is the minimum value of the weight adjustment coefficient range; n + (p - n) * S_th is the maximum value of the weight adjustment coefficient range; p = 2^(B-1) - 1 is the maximum value in the weight range; S_th = 0.8; W_max is the weight adjustment coefficient of the target operator layer.
Step S202, determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.
In the embodiment of the present application, the weight scale factor is determined as follows:

Scale = (α · p) / W_max    formula (4)

In formula (4), Scale is the weight scale factor; p is the maximum value in the weight range; α is a coefficient whose value is 0.95; W_max is the weight adjustment coefficient of the target operator layer.
Step S203, according to the scale factor and the weight of the target operator layer, carrying out weight optimization processing on the target operator layer.
According to the method provided by the embodiment of the application, the scale factor is multiplied by the weight of the target operator layer to obtain the optimized weight of the target operator layer.
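Steps S201 to S203 can be sketched together as follows. Note that formula (4) is used here in the reconstructed form Scale = α · p / W_max; since the original formula image is not recoverable, that form should be read as an assumption inferred from the stated definitions:

```python
def optimize_target_layer(W, w_max, w_min, p, alpha=0.95):
    """Derive the weight adjustment coefficient (S201), the weight scale
    factor (S202), and the optimized weights (S203) of a target layer."""
    W_max = max(abs(w_max), abs(w_min))  # S201: adjustment coefficient
    scale = alpha * p / W_max            # S202: reconstructed formula (4)
    return W * scale, scale              # S203: rescaled weights
```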
The method provided by the embodiment of the present application further comprises adding a preset normalization layer after the weight-optimized target operator layer.
It should be noted that the preset normalization layer provided in the embodiment of the present application satisfies the following relation:

BN_out = γ · (x - μ) / √(σ² + ε) + β    formula (5)

In formula (5), BN_out is the output of the preset normalization layer; x is its input; μ is the mean, μ = 0; σ is the standard deviation, σ² = 1; ε is a minimal quantity that prevents division by zero; γ is the scale of the preset normalization layer, γ = 1/Scale; β is the offset parameter, β = 0. With these values the preset normalization layer divides its input by the weight scale factor, so the output of the weight-optimized target operator layer followed by this layer is equivalent to the output before optimization.
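A PyTorch sketch of this preset normalization layer, using the parameter values of formula (5); γ = 1/Scale is a value reconstructed from context (the original figure is missing) and should be treated as an assumption:

```python
import torch.nn as nn

def make_preset_bn(num_channels, scale, eps=1e-5):
    """Build the preset normalization layer of formula (5): mu = 0,
    sigma^2 = 1, beta = 0 and gamma = 1/scale, so the layer divides its
    input by the scale factor applied to the preceding layer's weights."""
    bn = nn.BatchNorm2d(num_channels, eps=eps)
    bn.eval()                                   # freeze running statistics
    nn.init.constant_(bn.weight, 1.0 / scale)   # gamma = 1 / Scale
    nn.init.zeros_(bn.bias)                     # beta = 0
    bn.running_mean.zero_()                     # mu = 0
    bn.running_var.fill_(1.0)                   # sigma^2 = 1
    return bn
```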
And S106, carrying out quantization processing on the intermediate model after weight optimization to obtain an optimized neural network model.
In the embodiment of the present application, quantization is performed as follows:

W_q = clip(int(W / s), n, p),  s = max(abs(W)) / p    formula (6)

In formula (6), W_q is the quantized data; W is the data before quantization; n is the minimum value in the weight range; p is the maximum value in the weight range; int denotes rounding, clip denotes truncation to the weight range [n, p], max denotes taking the larger value, and abs denotes the absolute value.
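A sketch of formula (6) in its reconstructed form; since the original formula images are missing, the scale s = max(abs(W)) / p is inferred from the stated operators and should be read as an assumption:

```python
import numpy as np

def quantize_weights(W, B):
    """Symmetric uniform quantization to B bits per the reconstructed
    formula (6): W_q = clip(int(W / s), n, p) with s = max(abs(W)) / p."""
    n = -(2 ** (B - 1))
    p = 2 ** (B - 1) - 1
    s = np.max(np.abs(W)) / p
    W_q = np.clip(np.rint(W / s), n, p).astype(np.int32)
    return W_q, s
```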
After the model is optimized, if quantization currently uses a bit width of B bits, the neural network model is tested with a small number of images and compared with the completely unquantized model; if the error between the two is too large, B bits are not enough to meet the accuracy requirement. In that case, B can be increased and the optimization performed again. It should be noted that, compared with quantizing directly to a bit width of B bits, the method provided in the embodiment of the present application clearly improves accuracy.
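The accuracy check and bit-width escalation can be sketched as a simple loop; the error metric, tolerance, and upper bound below are illustrative assumptions:

```python
def choose_bit_width(evaluate_error, B=8, B_max=16, tol=0.01):
    """Quantize at bit width B, compare against the unquantized model on a
    small test set (via the caller-supplied evaluate_error callable), and
    increase B until the error is acceptable."""
    while B <= B_max:
        if evaluate_error(B) <= tol:
            return B
        B += 1
    raise RuntimeError("no bit width up to B_max met the tolerance")
```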
According to the method provided by the embodiment of the present application, fusion processing is first performed on the operator layers to reduce the computational dimensionality of the neural network model, and weight optimization processing is then performed on the model according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator and the neural network model are adapted to each other. Without reducing the accuracy of the neural network model, the method solves the problem that deployment of the neural network model on a hardware accelerator is limited.
The following are apparatus embodiments of the present application, which may be used to perform the method embodiments; for details not disclosed in the apparatus embodiments, refer to the method embodiments. Fig. 3 is a schematic structural diagram of an optimization apparatus of a neural network model provided by an embodiment of the present application. As shown in Fig. 3, the apparatus implements the optimization function for the neural network model; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include an acquisition module 301, a conversion module 302, and a processing module 303.
The acquisition module 301 is configured to obtain the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized, and the weight parameters corresponding to each operator layer of the neural network model to be optimized. The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.
The conversion module 302 is configured to convert the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized. The format corresponding to the initial model is the format of the open neural network exchange model.
The processing module 303 is configured to perform operator layer fusion processing on the initial model to obtain an intermediate model, in which a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers are fused into one operator layer; to determine a weight adjustment coefficient range according to the weight range of the neural network model that the target hardware accelerator can accommodate and a preset coefficient, where the target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized and the weight range is determined according to a configuration file of the target hardware accelerator or according to a preset weight range; to perform weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and to perform quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
Optionally, the processing module 303 is specifically configured to:
and according to the connection sequence of the computation layers, carrying out weight optimization processing on the target computation layer according to the weight adjustment coefficient range and the weight parameters of the target computation layer. The target algorithm layer is any algorithm layer with the weight within the weight adjusting coefficient range in the intermediate model.
Optionally, the processing module 303 is specifically configured to:
and determining a weight adjustment coefficient of the target operator layer according to the maximum weight value of the target operator layer and the minimum weight value of the target operator layer.
And determining a weight scale factor according to the weight adjustment coefficient of the target algorithm layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.
And performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.
Optionally, the apparatus further includes an adding module, configured to:
and adding a preset normalization layer after the target algorithm layer subjected to the weight optimization processing.
Optionally, the processing module 303 is specifically configured to:
and fusing the normalization layer connected with the convolution layer into the convolution layer to obtain the intermediate model. The convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and one normalization layer is connected behind each convolution layer.
Optionally, the processing module 303 is specifically configured to:
and if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain an intermediate model. The plurality of branch convolution layers are convolution layers, and all of the branch convolution layers are structurally juxtaposed to each other.
According to the apparatus, operator layers are first fused to reduce the computational dimensionality of the neural network model, and weight optimization processing is then performed on the model according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator and the neural network model are adapted to each other. Without reducing the accuracy of the neural network model, the apparatus provided by this application solves the problem that deployment of the neural network model on a hardware accelerator is limited.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method of optimizing a neural network model, the method comprising:
acquiring the structure of a neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the operator layer fusion processing fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers of the initial model into one operator layer;
determining a weight adjustment coefficient range according to the weight range of a neural network model that a target hardware accelerator can accommodate and a preset coefficient; the target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or according to a preset weight range;
performing weight optimization processing on the intermediate model according to the weight adjusting coefficient range and the weight parameter;
and carrying out quantization processing on the intermediate model after the weight optimization to obtain an optimized neural network model.
2. The method according to claim 1, wherein performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters comprises:
performing, in the order in which the operator layers are connected, weight optimization processing on a target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight is within the weight adjustment coefficient range.
3. The method of claim 2, wherein performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer comprises:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.
4. The method of claim 2, wherein, after the weight optimization processing is performed on the target operator layer, the method further comprises:
adding a preset normalization layer after the weight-optimized target operator layer.
5. The method of claim 1, wherein performing operator layer fusion processing on the initial model to obtain an intermediate model comprises:
fusing the normalization layer connected with the convolution layer into the convolution layer to obtain the intermediate model; the convolution layer is one type of operator layer, the normalization layer is another type of operator layer, and a normalization layer is connected behind each convolution layer.
6. The method of claim 1, wherein performing operator layer fusion processing on the initial model to obtain an intermediate model comprises:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, fusing the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are convolution layers that are structurally parallel to one another.
7. An apparatus for optimizing a neural network model, the apparatus comprising:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, the parameters of the neural network model to be optimized and the weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the conversion module is used for converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and the parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for performing operator layer fusion processing on the initial model to obtain an intermediate model; the operator layer fusion processing fuses a plurality of structurally adjacent operator layers or a plurality of structurally parallel operator layers of the initial model into one operator layer; determining a weight adjustment coefficient range according to the weight range of the neural network model that the target hardware accelerator can accommodate and a preset coefficient; the target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized; the weight range is determined according to a configuration file of the target hardware accelerator or according to a preset weight range; performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
8. The apparatus of claim 7, wherein the processing module is specifically configured to:
performing, in the order in which the operator layers are connected, weight optimization processing on a target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight is within the weight adjustment coefficient range.
9. The apparatus of claim 7, wherein the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the weight maximum value of the target operator layer and the weight minimum value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and performing weight optimization processing on the target operator layer according to the scale factor and the weight of the target operator layer.
10. The apparatus of claim 7, further comprising an adding module to:
and adding a preset normalization layer after the weight-optimized target operator layer.
CN202110382904.5A 2021-04-09 2021-04-09 Neural network model optimization method and device Active CN113128670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110382904.5A CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device


Publications (2)

Publication Number Publication Date
CN113128670A (en) 2021-07-16
CN113128670B CN113128670B (en) 2024-03-19

Family

ID=76775672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110382904.5A Active CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device

Country Status (1)

Country Link
CN (1) CN113128670B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426027A (en) * 2013-07-24 2013-12-04 浙江大学 Intelligent normal pool level optimal selection method based on genetic neural network models
US20180075339A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
CN110378470A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 Optimization method, device and the computer storage medium of neural network model
US20190340492A1 (en) * 2018-05-04 2019-11-07 Microsoft Technology Licensing, Llc Design flow for quantized neural networks
US20200057921A1 (en) * 2017-11-09 2020-02-20 Boe Technology Group Co., Ltd. Image classification and conversion method and device, image processor and training method therefor, and medium
CN110826692A (en) * 2019-10-24 2020-02-21 腾讯科技(深圳)有限公司 Automatic model compression method, device, equipment and storage medium
CN111310684A (en) * 2020-02-24 2020-06-19 东声(苏州)智能科技有限公司 Model training method and device, electronic equipment and storage medium
CN111602145A (en) * 2018-10-30 2020-08-28 深圳鲲云信息科技有限公司 Optimization method of convolutional neural network and related product
US20200342572A1 (en) * 2018-04-02 2020-10-29 Tencent Technology (Shenzhen) Company Limited Image related processing method and apparatus, device and storage medium
CN112200297A (en) * 2020-09-04 2021-01-08 厦门星宸科技有限公司 Neural network optimization method, device and processor
CN112257840A (en) * 2019-07-22 2021-01-22 华为技术有限公司 Neural network processing method and related equipment
CN112465108A (en) * 2020-11-11 2021-03-09 上海交通大学 Neural network compiling method for storage and calculation integrated platform
CN112541159A (en) * 2020-09-30 2021-03-23 华为技术有限公司 Model training method and related equipment


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Irem Boybat et al., "Improved Deep Neural Network Hardware-Accelerators Based on Non-Volatile-Memory: The Local Gains Technique", 2017 IEEE International Conference on Rebooting Computing (ICRC)
Kai Chen et al., "A DNN Optimization Framework with Unlabeled Data for Efficient and Accurate Reconfigurable Hardware Inference", 2021 IEEE International Symposium on Circuits and Systems (ISCAS)
Xing Jing, "Research on Acceleration Techniques for Object Detection Network Algorithms Based on ARM NEON", China Masters' Theses Full-text Database
Chen Kai, "Research on Optimization and Acceleration Methods of Deep Neural Network Models for Hardware Implementation", China Masters' Theses Full-text Database
Li Ming, Yan Chaohua, Liu Gaohang, "Optimizing Feedforward Neural Network Structure and Weight Vectors via Genetic Algorithm", Journal of Image and Graphics, no. 06

Also Published As

Publication number Publication date
CN113128670B (en) 2024-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant