CN111027685A - Method for depth separable convolution and batch normalization fusion - Google Patents

Method for depth separable convolution and batch normalization fusion

Info

Publication number
CN111027685A
Authority
CN
China
Prior art keywords
convolution
batch normalization
pointwise
parameters
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911321112.6A
Other languages
Chinese (zh)
Inventor
范益波 (Fan Yibo)
刘超 (Liu Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201911321112.6A priority Critical patent/CN111027685A/en
Publication of CN111027685A publication Critical patent/CN111027685A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of neural network models, and in particular relates to a method for fusing depthwise separable convolution with batch normalization. First, from a trained neural network model containing depthwise separable convolution and batch normalization layers, a new set of parameters is recomputed by a specially designed procedure and used to replace the weights and bias of the Pointwise convolution. Then the batch normalization layer is removed from the original network structure and its computation is folded into the Pointwise convolution, yielding a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, thereby achieving the effect of fusing batch normalization into the convolution. The invention can effectively reduce the amount of computation.

Description

Method for depth separable convolution and batch normalization fusion
Technical Field
The invention belongs to the technical field of neural network models, and in particular relates to a method for fusing depthwise separable convolution with batch normalization.
Background
Neural network technology, and lightweight neural networks in particular, has been a hot topic in research and application. Depthwise separable convolution splits a standard convolution into two steps. The first step, called Depthwise convolution, uses the idea of grouped convolution: each channel is convolved independently, with no computation across channels, so only a single-channel convolution is computed per channel, which greatly reduces the amount of computation. The second step, called Pointwise convolution, re-fuses the features learned by the Depthwise convolution across channels, remedying the fact that each Depthwise output comes from a single channel only. Together, the two steps achieve an effect similar to a conventional convolution. The Pointwise step is typically implemented as a convolution with a 1x1 kernel.
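As an illustration only (not part of the patent text), a minimal PyTorch sketch of a depthwise separable block of the kind described above, followed by batch normalization; the class name and the 3x3 Depthwise kernel size are assumptions of this write-up:

```python
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise 3x3 convolution, then Pointwise 1x1 convolution, then batch normalization."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Depthwise: groups=in_channels, so each channel is convolved independently.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        # Pointwise: a 1x1 convolution that re-fuses the per-channel features.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=True)
        # Batch normalization directly after the Pointwise convolution, no activation in between.
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn(self.pointwise(self.depthwise(x)))
```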
The batch normalization layer re-normalizes the features learned by the intermediate layers of a neural network, so that gradients propagate effectively through many layers and training deep neural networks becomes practical. It has four parameters: two of them represent the mean and variance of the input and are used to re-normalize the features; the other two are learned by the neural network and are used for feature reconstruction, so that the features learned by the model are not destroyed. Batch normalization and depthwise separable convolution are both commonly used together when building practical neural network models. If the two can be fused, the amount of computation in practical applications can be effectively reduced.
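For reference, and using the same symbols as in the steps below, a batch normalization layer applied to an input x computes (a standard formulation, not quoted from the patent):

$$y_{bn} = \gamma \cdot \frac{x - mean}{\sqrt{var + \epsilon}} + \beta$$

where mean and var are the recorded statistics, $\gamma$ and $\beta$ are the learned parameters, and $\epsilon$ is a small constant that prevents division by zero.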
Disclosure of Invention
The invention aims to provide a method for fusing depthwise separable convolution with batch normalization so as to effectively reduce the amount of computation.
The invention provides a method for fusing depthwise separable convolution with batch normalization. A neural network model containing depthwise separable convolution and batch normalization layers is first trained; the parameters of the Pointwise convolution and of the batch normalization layer are then exported, and a new set of parameters is recomputed by a specially designed procedure and used to replace the weights and bias of the Pointwise convolution. The batch normalization layer is then removed from the original network structure and its computation is folded into the Pointwise convolution, yielding a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, thereby achieving the effect of fusing batch normalization into the convolution. The specific steps are as follows:
(1) For a trained neural network model containing depthwise separable convolution and batch normalization layers, in which there must be no nonlinear activation function between the depthwise separable convolution and the batch normalization layer, first export the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the Pointwise convolution of the depthwise separable convolution, together with the parameters $\gamma$, $\beta$, $mean$ and $var$ of the batch normalization layer, where $\gamma$ and $\beta$ are the learned parameters of the batch normalization layer and $mean$ and $var$ are its recorded statistics; these are used in the subsequent calculation;
(2) Calculate the new Pointwise convolution parameters according to the following formulas:

$$\hat{w}_{pwConv} = \frac{\gamma}{\sqrt{var + \epsilon}} \cdot w_{pwConv} \quad (1)$$

$$\hat{b}_{pwConv} = \frac{\gamma}{\sqrt{var + \epsilon}} \cdot (b_{pwConv} - mean) + \beta \quad (2)$$

where $\epsilon$ is a hyperparameter that prevents division by zero, and $*$ denotes the convolution operation;
(3) Replace the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the original Pointwise convolution with $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$, and delete the batch normalization layer from the original network, obtaining a new neural network structure and its corresponding weights; at this point, the fusion of the depthwise separable convolution and the batch normalization is complete. Let $y_{dwConv}$ denote the output of the Depthwise convolution and $y_{bn}$ the output of the batch normalization; then $y_{dwConv}$ and $y_{bn}$ are directly related by

$$y_{bn} = \hat{w}_{pwConv} * y_{dwConv} + \hat{b}_{pwConv} \quad (3)$$
(4) After the new network structure is obtained, it can be used in place of the original network structure, thereby reducing the amount of computation.
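Formulas (1) and (2) follow from substituting the Pointwise output $w_{pwConv} * y_{dwConv} + b_{pwConv}$ into the batch normalization expression and collecting the terms that scale $y_{dwConv}$. As an illustration only (not part of the patent text), a minimal NumPy sketch of step (2); the function name and the assumption of per-output-channel statistics with a Pointwise weight of shape (out_channels, in_channels, 1, 1) are this write-up's, not the patent's:

```python
import numpy as np

def fuse_pointwise_bn(w_pw, b_pw, gamma, beta, mean, var, eps=1e-20):
    """Fold batch normalization parameters into the Pointwise convolution.

    w_pw : (out_channels, in_channels, 1, 1) Pointwise weights
    b_pw : (out_channels,) Pointwise bias
    gamma, beta, mean, var : (out_channels,) batch normalization parameters
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scaling factor
    w_new = w_pw * scale.reshape(-1, 1, 1, 1)   # formula (1)
    b_new = scale * (b_pw - mean) + beta        # formula (2)
    return w_new, b_new
```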
Through this design, the batch normalization layer can be effectively fused into the depthwise separable convolution, reducing the amount of computation of the neural network model at the inference stage.
In the invention, after model training is finished, all trained model parameters are exported. From the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the Pointwise convolution and the parameters $\gamma$, $\beta$, $mean$ and $var$ of the batch normalization layer, the new parameters $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$ are obtained by the mathematical derivation above and used to replace the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the original Pointwise convolution.
In the present invention, the batch normalization layer in the original network structure is deleted, and the weights and biases of the Pointwise convolution of the depthwise separable convolution in the original structure are then replaced by the new weights and biases.
The method can effectively reduce the amount of computation.
Drawings
FIG. 1 is a schematic view of the process of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawing.
The starting neural network layer structure is shown in the upper half of FIG. 1. It contains a depthwise separable convolution and batch normalization, drawn as three parts in the schematic because the depthwise separable convolution itself consists of the two parts Depthwise and Pointwise. The first part is the Depthwise convolution, a per-channel convolution: kernels of three different colors are drawn, each convolved only with its corresponding layer, to represent the idea of separate per-channel convolution. The separate convolutions produce the Depthwise output, which is fed to the Pointwise convolution as its input. The Pointwise convolution is a conventional convolution with a 1x1 kernel; the interleaved 1x1 kernels in the figure represent this convolution, which fuses the outputs of the different Depthwise channels. After the Pointwise convolution, its output is further processed by a batch normalization layer. This processes the data so that the back-propagated gradient is better preserved.
It is worth noting that the method of the present invention requires that there be no nonlinear activation function between the Pointwise convolution and the batch normalization. In practical designs, the activation function is usually placed after the batch normalization layer, which also lets the batch normalization layer work well. After model training is completed, the parameters of the Depthwise convolution, the Pointwise convolution and the batch normalization are all fixed and saved in the model file.
The parameters are read from the model file, and $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$ are calculated according to formulas (1) and (2), with the hyperparameter $\epsilon$ chosen as $10^{-20}$. A neural network model B is then designed as shown in the lower half of FIG. 1. Its structure is nearly identical to the original model structure A, except that the batch normalization following each depthwise separable convolution is removed from the network structure, while all other network layers are preserved.
For all network layers other than the Pointwise convolutions, the weights of the corresponding layers of the originally trained network structure A are assigned to model B. For the Pointwise convolutions, the calculated $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$ are assigned as their weights and biases, completing the assignment of all parameters of the newly constructed network. This yields a completely new network model that can be used in place of the original model for inference.
It is easy to see that the originally trained network structure A performs the batch normalization computation that the newly designed simplified model B omits, and that the computation is identical everywhere else. In practice, the accuracy of the newly designed model is almost identical to that of the original model, so the invention saves part of the computation of the original model. Finally, model A is replaced by the newly designed model B for inference.
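As an illustration only (not part of the patent text), a minimal PyTorch sketch of how the fusion could be applied in place to the DepthwiseSeparableBlock sketched earlier, so that structure A becomes the simplified structure B; the helper name fuse_block is an assumption of this write-up:

```python
import torch

@torch.no_grad()
def fuse_block(block):
    """Fold block.bn into block.pointwise and remove the separate batch normalization."""
    bn = block.bn
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
    # Formula (2) first, while the original Pointwise bias is still available.
    new_bias = scale * (block.pointwise.bias - bn.running_mean) + bn.bias
    # Formula (1): scale each output channel of the Pointwise weights.
    block.pointwise.weight.mul_(scale.reshape(-1, 1, 1, 1))
    block.pointwise.bias.copy_(new_bias)
    block.bn = torch.nn.Identity()  # the batch normalization computation is now gone
    return block

# Usage sketch: fuse after training, before inference.
# block = DepthwiseSeparableBlock(32, 64).eval()
# fused = fuse_block(block)
```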

Claims (1)

1. A method for fusing depthwise separable convolution with batch normalization, characterized in that the parameters of the Pointwise convolution and the parameters of the batch normalization, exported from a trained neural network model containing depthwise separable convolution and batch normalization layers, are recomputed by a specially designed calculation into a new set of parameters, which are used to replace the weights and biases of the Pointwise convolution; the batch normalization layer is then removed from the original network structure and its computation is folded into the Pointwise convolution, obtaining a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, thereby achieving the effect of fusing batch normalization into the convolution; the method comprises the following specific steps:
(1) For a trained neural network model containing depthwise separable convolution and batch normalization layers, in which there must be no nonlinear activation function between the depthwise separable convolution and the batch normalization layer, first export the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the Pointwise convolution of the depthwise separable convolution, together with the parameters $\gamma$, $\beta$, $mean$ and $var$ of the batch normalization layer, where $\gamma$ and $\beta$ are the learned parameters of the batch normalization layer and $mean$ and $var$ are its recorded statistics;
(2) Calculate the new Pointwise convolution parameters according to the following formulas:

$$\hat{w}_{pwConv} = \frac{\gamma}{\sqrt{var + \epsilon}} \cdot w_{pwConv}$$

$$\hat{b}_{pwConv} = \frac{\gamma}{\sqrt{var + \epsilon}} \cdot (b_{pwConv} - mean) + \beta$$

where $\epsilon$ is a hyperparameter that prevents division by zero, and $*$ denotes the convolution operation;
(3) Replace the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the original Pointwise convolution with $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$, and delete the batch normalization layer from the original network, obtaining a new neural network structure and its corresponding weights; at this point, the fusion of the depthwise separable convolution and the batch normalization is complete. Let $y_{dwConv}$ denote the output of the Depthwise convolution and $y_{bn}$ the output of the batch normalization; then $y_{dwConv}$ and $y_{bn}$ are directly related by

$$y_{bn} = \hat{w}_{pwConv} * y_{dwConv} + \hat{b}_{pwConv}$$
(4) After the new network structure is obtained, it is used in place of the original network structure, thereby reducing the amount of computation.
CN201911321112.6A 2019-12-20 2019-12-20 Method for depth separable convolution and batch normalization fusion Pending CN111027685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911321112.6A CN111027685A (en) 2019-12-20 2019-12-20 Method for depth separable convolution and batch normalization fusion


Publications (1)

Publication Number Publication Date
CN111027685A true CN111027685A (en) 2020-04-17

Family

ID=70211238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911321112.6A Pending CN111027685A (en) 2019-12-20 2019-12-20 Method for depth separable convolution and batch normalization fusion

Country Status (1)

Country Link
CN (1) CN111027685A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344200A (en) * 2021-06-17 2021-09-03 阿波罗智联(北京)科技有限公司 Method for training separable convolutional network, road side equipment and cloud control platform
CN113344200B (en) * 2021-06-17 2024-05-28 阿波罗智联(北京)科技有限公司 Method for training separable convolutional network, road side equipment and cloud control platform


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200417