CN111027685A - Method for depth separable convolution and batch normalization fusion - Google Patents
Method for depth separable convolution and batch normalization fusion
- Publication number: CN111027685A
- Authority: CN (China)
- Prior art keywords: convolution, batch normalization, pointwise, parameters, layer
- Prior art date: 2019-12-20
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention belongs to the technical field of neural network models, and specifically relates to a method for fusing depthwise separable convolution with batch normalization. Starting from a trained neural network model containing depthwise separable convolution and batch normalization layers, a group of new parameters is first recomputed by a specially designed method and used to assign and modify the weight and bias of the Pointwise convolution; the batch normalization layer is then removed from the original network structure and its computation is absorbed into the Pointwise convolution, yielding a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, thereby achieving the effect of fusing batch normalization into the convolution. The invention can effectively reduce the amount of computation.
Description
Technical Field
The invention belongs to the technical field of neural network models, and specifically relates to a method for fusing depthwise separable convolution with batch normalization.
Background
Neural network technology, and lightweight neural networks in particular, has been a hot topic of research and application. Depthwise separable convolution splits a standard convolution into two steps. The first step, called Depthwise convolution, uses the idea of grouped convolution: each channel is convolved independently, with no computation shared between channels, so only single-channel convolution results are computed and the amount of computation required for the convolution is greatly reduced. The second step, called Pointwise convolution, re-fuses the features learned by the Depthwise convolution, remedying the limitation that each Depthwise feature comes from a single channel only; taken together, the two steps achieve an effect similar to a conventional convolution. The Pointwise step is typically implemented as a convolution with a 1x1 kernel.
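For illustration only (not part of the original disclosure), a depthwise separable convolution of this kind can be sketched in PyTorch as follows; the class name, channel sizes and kernel size are assumptions chosen for the example:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Sketch of a depthwise separable convolution: Depthwise step + Pointwise (1x1) step."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise: groups=in_channels, so each channel is convolved independently.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # Pointwise: a conventional 1x1 convolution that re-fuses the per-channel features.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Example: 32 input channels, 64 output channels on a 56x56 feature map.
x = torch.randn(1, 32, 56, 56)
y = DepthwiseSeparableConv(32, 64)(x)   # shape: (1, 64, 56, 56)
```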
The batch normalization layer effectively re-normalizes the features learned by the intermediate layers of a neural network, so that gradients propagate effectively across many layers and training deep neural networks becomes feasible. It has four parameters: two of them describe the mean and variance of the input and are used to re-normalize the features; the other two are learned by the neural network and are used for feature reconstruction, so that the features learned by the model are not destroyed. Both batch normalization and depthwise separable convolution are commonly used when building practical neural network models. If the two can be fused together, the amount of computation in practical applications can therefore be effectively reduced.
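Again for illustration, the inference-time computation of a batch normalization layer with these four parameters can be sketched as below; the function name, tensor shapes and the value of eps are assumptions:

```python
import torch

def batchnorm_inference(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization: re-normalize with the stored statistics
    (mean, var), then reconstruct features with the learned parameters (gamma, beta)."""
    # x: (N, C, H, W); gamma, beta, mean, var: (C,)
    x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]
```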
Disclosure of Invention
The invention aims to provide a method for fusing depthwise separable convolution and batch normalization so as to effectively reduce the amount of computation.
The invention provides a method for fusing depthwise separable convolution and batch normalization. A neural network model containing depthwise separable convolution and batch normalization layers is trained; the parameters of the Pointwise convolution and the batch normalization parameters are exported, and a group of new parameters is recomputed by a specially designed method and used to replace the weight and bias of the Pointwise convolution. The batch normalization layer is then removed from the original network structure and its computation is absorbed into the Pointwise convolution, yielding a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, thereby achieving the effect of fusing batch normalization into the convolution. The method comprises the following specific steps:
(1) For a trained neural network model containing depthwise separable convolution and batch normalization layers, with the requirement that there is no nonlinear activation function between the depthwise separable convolution and the batch normalization layer, first export the weight w_pwConv and bias term b_pwConv of the Pointwise convolution of the depthwise separable convolution, together with the parameters γ, β, mean and var of the batch normalization layer, where γ and β are the learned parameters of the batch normalization layer and mean and var are its computed statistics; these are used in the subsequent calculation;
(2) Compute the new Pointwise convolution parameters according to the following formulas:

ŵ_pwConv = γ · w_pwConv / √(var + ε)   (1)

b̂_pwConv = γ · (b_pwConv − mean) / √(var + ε) + β   (2)

where ε is a hyperparameter that prevents division by zero, and ⊛ denotes the convolution operation;
(3) Replace the weight w_pwConv and bias term b_pwConv of the original Pointwise convolution with ŵ_pwConv and b̂_pwConv, and delete the batch normalization layer from the original network, obtaining a new neural network structure and the corresponding weights; at this point the fusion of the depthwise separable convolution and the batch normalization is complete. Let y_dwConv denote the output of the Depthwise convolution and y_bn the batch-normalized output; y_dwConv and y_bn are then directly related by:

y_bn = ŵ_pwConv ⊛ y_dwConv + b̂_pwConv
(4) After the new network structure is obtained, it can be used to replace the original network structure, thereby reducing the amount of computation.
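A minimal sketch of steps (1)-(4) in PyTorch, assuming the Pointwise convolution is an nn.Conv2d with a 1x1 kernel immediately followed by an nn.BatchNorm2d; the function name and default settings are assumptions for illustration, not the patent's code:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_pointwise_bn(pw: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d into the preceding 1x1 Pointwise convolution (formulas (1)-(2))."""
    gamma, beta = bn.weight, bn.bias
    mean, var, eps = bn.running_mean, bn.running_var, bn.eps
    scale = gamma / torch.sqrt(var + eps)            # per-output-channel factor gamma / sqrt(var + eps)

    fused = nn.Conv2d(pw.in_channels, pw.out_channels, kernel_size=1, bias=True)
    fused.weight.copy_(pw.weight * scale[:, None, None, None])        # w_hat = scale * w
    bias = pw.bias if pw.bias is not None else torch.zeros_like(mean)
    fused.bias.copy_(scale * (bias - mean) + beta)                     # b_hat = scale * (b - mean) + beta
    return fused
```

The returned convolution then takes the place of the original Pointwise convolution, and the batch normalization layer is removed, as described in step (3).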
According to the invention, the batch normalization layer can be effectively fused into the depthwise separable convolution, which reduces the amount of computation of the neural network model in the inference stage.
In the invention, after model training is finished, all trained model parameters are exported, and the weight w_pwConv and bias term b_pwConv of the Pointwise convolution, together with the batch normalization parameters γ, β, mean and var, are combined mathematically so that the new parameters ŵ_pwConv and b̂_pwConv can be calculated and used to replace the weight w_pwConv and bias term b_pwConv of the original Pointwise convolution.
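For reference, this equivalence follows by substituting the Pointwise convolution output into the batch normalization formula; a sketch of the algebra in the notation above, writing x for the Depthwise output y_dwConv and ⊛ for convolution:

```latex
\begin{aligned}
y_{bn} &= \gamma\,\frac{(w_{pwConv} \circledast x + b_{pwConv}) - mean}{\sqrt{var + \epsilon}} + \beta \\
       &= \underbrace{\frac{\gamma}{\sqrt{var + \epsilon}}\, w_{pwConv}}_{\hat{w}_{pwConv}} \circledast\, x
          \;+\; \underbrace{\frac{\gamma\,(b_{pwConv} - mean)}{\sqrt{var + \epsilon}} + \beta}_{\hat{b}_{pwConv}}
\end{aligned}
```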
In the present invention, the batch normalization layer is deleted from the original network structure, and the weights and biases of the Pointwise convolution of the original depthwise separable convolution are then replaced by the new weights and biases.
The method can effectively reduce the amount of computation.
Drawings
FIG. 1 is a schematic view of the process of the present invention.
Detailed Description
The invention will be further described with reference to the following schematic drawings.
The starting neural network layer structure is shown in the upper half of FIG. 1; it contains a depthwise separable convolution and a batch normalization, and appears as three parts in the schematic because the depthwise separable convolution itself contains two parts, Depthwise and Pointwise. The first part is the Depthwise convolution, a separate (grouped) convolution; in the figure, convolution kernels in three different colors are convolved with the corresponding channels to represent the idea of convolving each channel separately. The separate convolutions produce the Depthwise output, which is fed to the Pointwise convolution as its input. The Pointwise convolution is a conventional convolution with a 1x1 kernel; the figure represents it with interleaved 1x1 kernels, and it is this Pointwise convolution that fuses the outputs of the different Depthwise convolutions. After the Pointwise convolution, its output is further processed by a batch normalization layer, whose computation amounts to a per-channel scale and shift (equivalent to a Depthwise convolution with a 1x1 kernel). This processes the data so that the back-propagated gradient is better preserved.
It is worth noting that the method of the present invention requires that there be no nonlinear activation function between the Pointwise convolution and the batch normalization. In practical designs, the activation function is typically placed after the batch normalization layer, which also allows the batch normalization layer to work well. After model training is completed, the parameters of the Depthwise convolution, the Pointwise convolution and the batch normalization are all determined and saved in the model file.
The parameters are read from the model file, and ŵ_pwConv and b̂_pwConv are calculated according to formulas (1) and (2), with the hyperparameter ε chosen as 10⁻²⁰. A new neural network model B is then designed as shown in the lower half of FIG. 1. Its structure is nearly identical to that of the original model A, except that the batch normalization following each depthwise separable convolution is removed from the network structure, while all other network layers are preserved.
For the network layers other than the Pointwise convolution, the weights of the corresponding layers of the originally trained network structure A are assigned to model B. For the Pointwise convolution, the calculated ŵ_pwConv and b̂_pwConv are assigned as its weight and bias, which completes the assignment of all parameters of the newly constructed network. This yields a completely new network structure model that can be used in place of the original model for inference.
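As a sketch of this weight-assignment procedure (the deep copy, the attribute names pointwise and bn, and the reuse of the fuse_pointwise_bn sketch above are all assumptions for illustration, not the patent's implementation):

```python
import copy
import torch.nn as nn

def build_fused_model(model_a: nn.Module) -> nn.Module:
    """Build the simplified model B from the trained model A: copy A, fold each
    Pointwise-convolution + batch-normalization pair, and drop the batch normalization,
    leaving every other layer and its weights untouched."""
    model_b = copy.deepcopy(model_a)  # all non-Pointwise weights are carried over as-is
    for module in model_b.modules():
        # Assumed block layout: a `pointwise` 1x1 Conv2d immediately followed by `bn`
        # (BatchNorm2d), with no activation in between, as the method requires.
        if hasattr(module, "pointwise") and isinstance(getattr(module, "bn", None), nn.BatchNorm2d):
            module.pointwise = fuse_pointwise_bn(module.pointwise, module.bn)  # sketch above
            module.bn = nn.Identity()  # the normalization is now absorbed into the weights
    return model_b
```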
It is easy to see that, compared with the newly designed simplified model B, the originally trained network structure A performs the additional batch normalization computation, while everywhere else the amount of computation is identical. In practice, the performance of the newly designed model is almost identical to that of the original model, so the invention saves part of the computation of the original model. Finally, the newly designed model B replaces A and is used for inference.
Claims (1)
1. A method for fusing depthwise separable convolution and batch normalization, characterized in that the parameters of the Pointwise convolution and the batch normalization parameters, exported from a trained neural network model containing depthwise separable convolution and batch normalization layers, are recomputed by a specially designed calculation method into a group of new parameters, which are used to replace the weight and bias of the Pointwise convolution; the batch normalization layer is then deleted from the original network structure and its computation is absorbed into the Pointwise convolution, obtaining a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization and realizing the effect of fusing batch normalization into the convolution; the method comprises the following specific steps:
(1) For a trained neural network model containing depthwise separable convolution and batch normalization layers, with the requirement that there is no nonlinear activation function between the depthwise separable convolution and the batch normalization layer, first export the weight w_pwConv and bias term b_pwConv of the Pointwise convolution of the depthwise separable convolution, together with the parameters γ, β, mean and var of the batch normalization layer, where γ and β are the learned parameters of the batch normalization layer and mean and var are its computed statistics;
(2) The new Pointwise convolution parameters are calculated as follows:

ŵ_pwConv = γ · w_pwConv / √(var + ε)

b̂_pwConv = γ · (b_pwConv − mean) / √(var + ε) + β

where ε is a hyperparameter that prevents division by zero, and ⊛ denotes the convolution operation;
(3) Replace the weight w_pwConv and bias term b_pwConv of the original Pointwise convolution with ŵ_pwConv and b̂_pwConv, and delete the batch normalization layer from the original network, obtaining a new neural network structure and the corresponding weights; at this point the fusion of the depthwise separable convolution and the batch normalization is complete. Let y_dwConv denote the output of the Depthwise convolution and y_bn the batch-normalized output; y_dwConv and y_bn are then directly related by:

y_bn = ŵ_pwConv ⊛ y_dwConv + b̂_pwConv
(4) After the new network structure is obtained, it is used to replace the original network structure, thereby reducing the amount of computation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911321112.6A CN111027685A (en) | 2019-12-20 | 2019-12-20 | Method for depth separable convolution and batch normalization fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911321112.6A CN111027685A (en) | 2019-12-20 | 2019-12-20 | Method for depth separable convolution and batch normalization fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111027685A (en) | 2020-04-17 |
Family
ID=70211238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911321112.6A Pending CN111027685A (en) | 2019-12-20 | 2019-12-20 | Method for depth separable convolution and batch normalization fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111027685A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113344200A (en) * | 2021-06-17 | 2021-09-03 | 阿波罗智联(北京)科技有限公司 | Method for training separable convolutional network, road side equipment and cloud control platform |
CN113344200B (en) * | 2021-06-17 | 2024-05-28 | 阿波罗智联(北京)科技有限公司 | Method for training separable convolutional network, road side equipment and cloud control platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200417 |