CN111461144A - Method for accelerating convolutional neural network - Google Patents
- Publication number
- CN111461144A (application CN202010244305.2A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- feature maps
- groups
- group
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
Abstract
The invention provides a method for accelerating a convolutional neural network, comprising the following steps. Step 1: divide an input feature map having N channels into G groups of initial feature maps along the channel direction, where the i-th group of initial feature maps contains S_i feature maps; perform a first group convolution on the G groups of initial feature maps to obtain G groups of first feature maps, where N, G, and S_i are integers greater than or equal to 1. Step 2: re-divide the G groups of first feature maps into F groups of second feature maps, where the j-th group of second feature maps contains T_j feature maps drawn from different first feature map groups; perform a second group convolution on the F groups of second feature maps to obtain an output feature map having M channels, where F, T_j, and M are integers greater than or equal to 1.
Description
Technical Field
The invention relates to a deep learning technology, in particular to a method for accelerating a convolutional neural network.
Background
With the development of deep learning technology, applications based on deep learning have spread to many fields of daily life, and deep learning itself has evolved from the earliest cloud computing to today's terminal (on-device) computing. Because most deep learning applications are large in scale, they place high demands on machine performance during training and prediction, while the storage resources and computing power of terminal devices are very limited; how to accelerate deep learning has therefore become a new technical hotspot. The convolutional neural network is a widely used deep learning model, and methods for accelerating it are a hot topic.
Convolutional Neural Networks (CNNs) are similar to the multilayer perceptron of an artificial neural network: they extract features by convolution, integrate the different features, and finally make predictions. A convolutional neural network mainly comprises a data input layer, convolutional layers, activation layers, pooling layers, and fully-connected layers, where the convolutional layers perform feature extraction on the input data. The convolution operations of the convolutional layers occupy most of the network's resources, so reducing the amount of convolution computation is the key to accelerating a convolutional neural network.
Existing lightweight convolutional neural networks adopt depth-wise separable convolution, which decomposes a conventional convolution into two subtasks: (1) depth-wise convolution, which performs the image-convolution task within each feature layer; and (2) point-wise convolution, which realizes information interaction between different feature layers. Compared with direct convolution by full-size convolution kernels, depth-wise separable convolution greatly reduces model parameters and computation. However, because the complexity of the point-wise convolution is far higher than that of the depth-wise convolution, a large share of the resources is spent on information interaction between feature layers; resource allocation is therefore unbalanced, and the overall efficiency of the convolutional layer is reduced.
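The imbalance described above can be illustrated with a rough multiply-accumulate count; the dimensions below are arbitrary examples, not values from the patent:

```python
def standard_conv_macs(W, H, N, M, K):
    # One full convolution: M kernels of size K x K x N slide over a W x H map.
    return W * H * N * M * K * K

def depthwise_separable_macs(W, H, N, M, K):
    # Depth-wise stage: one K x K kernel per input channel.
    depthwise = W * H * N * K * K
    # Point-wise stage: M kernels of size 1 x 1 x N combine the channels.
    pointwise = W * H * N * M
    return depthwise, pointwise

W, H, N, M, K = 32, 32, 64, 128, 3  # illustrative dimensions
full = standard_conv_macs(W, H, N, M, K)
dw, pw = depthwise_separable_macs(W, H, N, M, K)
print(f"full: {full}, separable: {dw + pw} ({(dw + pw) / full:.1%} of full)")
print(f"point-wise share of separable cost: {pw / (dw + pw):.1%}")
```

With these dimensions the separable form costs only about 12% of the full convolution, but over 90% of that cost sits in the point-wise stage — the imbalance the patent targets.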
Disclosure of Invention
The invention provides a method for accelerating a convolutional neural network, comprising the following steps. Step 1: divide an input feature map having N channels into G groups of initial feature maps along the channel direction, where the i-th group of initial feature maps contains S_i feature maps; perform a first group convolution on the G groups of initial feature maps to obtain G groups of first feature maps, where N, G, and S_i are integers greater than or equal to 1. Step 2: re-divide the G groups of first feature maps into F groups of second feature maps, where the j-th group of second feature maps contains T_j feature maps drawn from different first feature map groups; perform a second group convolution on the F groups of second feature maps to obtain an output feature map having M channels, where F, T_j, and M are integers greater than or equal to 1.
Optionally, in step 1 the input feature map with N channels is evenly divided along the channel direction into G groups of initial feature maps, each group containing S feature maps, with S × G = N; and in step 2 the G groups of first feature maps are evenly divided into F groups of second feature maps, each group containing T feature maps, with F = S and T = G, so that each feature map in the j-th group of second feature maps comes from a different group of first feature maps.
Optionally, wherein S = √M / K, and wherein the size of the convolution kernel in the first group convolution is K × K and the size of the convolution kernel in the second group convolution is 1 × 1.
Optionally, wherein S′ = √M / K, and when S′ is a non-integer, the value of S is the integer closest to S′.
Optionally, wherein when √M / K is a non-integer, it is rounded to the nearest integer S and N / S is rounded up to the integer G′, and when G′ × S = N, G = G′.
Optionally, wherein when √M / K is a non-integer, it is rounded to the nearest integer S and N / S is rounded up to the integer G′; when G′ × S > N, the first G′ × S − N feature maps of the input feature map are copied and merged with the input feature map to obtain an input feature map with G′ × S layers, which is then grouped according to G′ and S.
Optionally, wherein the first group convolution comprises: performing S convolutions on each group of initial feature maps.
Optionally, wherein the second group convolution comprises: when M / S is an integer, performing M / S convolutions on each group of second feature maps.
Optionally, wherein the second group convolution comprises: when M / S is a non-integer, rounding it down to the integer W and letting R = M − W × S; performing W + 1 convolutions on each of R of the S groups of second feature maps and W convolutions on each of the other groups.
A further aspect of the invention also provides a storage medium in which a computer program is stored which, when being executed by a processor, is operable to carry out any of the methods described above.
Another aspect of the present invention also provides an electronic device comprising a processor and a memory, the memory having stored therein a computer program which, when executed by the processor, is operable to carry out any of the methods described above.
Compared with the prior art, the invention has the advantages that:
According to the convolution method of the invention, the input feature map of a convolutional layer is grouped in a determined way and two group convolutions are performed, so that the computation is balanced more evenly across the different convolution operations. This effectively reduces the total computation and complexity of the convolutional neural network, significantly improves the efficiency of the convolution operations, and increases network speed. In addition, in some embodiments, optimizing the grouping of the input feature map increases the generality of the acceleration method.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1A shows a schematic diagram of the depth-wise convolution operation in a depth-wise separable convolution;
FIG. 1B shows a schematic diagram of a point-by-point convolution operation in a depth separable convolution;
FIG. 2 illustrates a method for accelerating a convolutional neural network, in accordance with one embodiment of the present invention;
FIG. 3A shows a schematic diagram of a conventional convolution operation;
FIG. 3B shows a schematic diagram of a group convolution operation;
FIG. 4A is a diagram illustrating a first group convolution of G groups of input feature maps to obtain G groups of first feature maps according to an embodiment of the present invention;
FIG. 4B is a diagram illustrating how the G groups of first feature maps are re-divided into F groups of second feature maps and the second group convolution is performed, according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
In prior-art depth-wise separable convolution, a complete convolution operation is decomposed into two steps: depth-wise convolution and point-wise convolution. The depth-wise convolution filters each channel of the input feature map with a single-channel convolution kernel, i.e., one kernel is responsible for exactly one channel. FIG. 1A shows a schematic diagram of the depth-wise convolution operation in a depth-wise separable convolution. As shown in FIG. 1A, assume the input feature map has size W × H, the number of channels (layers) is N, the convolution kernel has size K × K, and the number of convolution kernels equals the number of channels of the input feature map. With a stride of 1 and padding of 0, the depth-wise convolution yields an output feature map of size W × H with N channels, with computational complexity:
θ_task1 = θ(N*W*H*K*K)    (1)
Since the depth-wise convolution performs an independent convolution on each channel of the input feature map, it does not exploit the feature information of different channels at the same spatial position; the channels of the depth-wise output must therefore still be linearly combined by a point-wise convolution, in which each convolution kernel produces one channel of the output feature map from all channels of the depth-wise output. FIG. 1B shows a schematic diagram of the point-wise convolution operation in a depth-wise separable convolution. As shown in FIG. 1B, the feature map output by the depth-wise convolution serves as the input feature map of the point-wise convolution; each convolution kernel has size 1 × 1 × N, where N is the number of channels of the point-wise input (i.e., the depth-wise output), and there are M kernels in total, where M is the number of channels of the output feature map after the point-wise convolution. The point-wise convolution linearly combines each element of the W × H × N depth-wise output M times, completing the interaction and integration of information between the different channels and finally yielding an output feature map with M channels, with computational complexity:
θ_task2 = θ(W*H*N*M)    (2)
According to the inequality of arithmetic and geometric means, the following holds:

θ_task1 + θ_task2 ≥ 2√(θ_task1 * θ_task2)    (3)
Therefore, when θ_task1 = θ_task2, the sum θ_task1 + θ_task2 is minimal; from equations (1) and (2), this occurs when K*K = M, at which point the total complexity of the entire convolution operation is lowest.
However, in practical applications, the number of output channels M of a convolutional layer is much larger than the kernel size K × K, so the computational complexity of the point-wise convolution is much higher than that of the depth-wise convolution. The information interaction between channels then accounts for more than 90% of the total computation, consumes a large amount of computing resources, and slows the neural network down.
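A small sketch of the complexity terms (1) and (2) shows how the point-wise share grows with the number of output channels M; the dimensions are illustrative:

```python
def theta_task1(N, W, H, K):
    # Depth-wise complexity, equation (1): N*W*H*K*K
    return N * W * H * K * K

def theta_task2(N, W, H, M):
    # Point-wise complexity, equation (2): W*H*N*M
    return N * W * H * M

# The two terms are balanced when K*K == M; real layers have M >> K*K,
# so the point-wise stage dominates the total cost.
N, W, H, K = 64, 32, 32, 3
for M in (K * K, 128, 512):
    t1, t2 = theta_task1(N, W, H, K), theta_task2(N, W, H, M)
    print(f"M={M}: point-wise share = {t2 / (t1 + t2):.1%}")
```

At M = K*K = 9 the split is exactly 50/50; at M = 128 or 512 the point-wise stage already exceeds 90% of the total, matching the observation above.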
To solve the above problems, the present invention provides a method for accelerating a convolutional neural network that equalizes the computational workload among the different steps of the convolution operation, thereby effectively reducing the network's total computation and complexity. For the feature counts common in mainstream networks, the technical scheme of the invention keeps the sizes of the input and output feature maps unchanged while reducing the computation to less than 60% of the original; for networks with a higher feature ratio, the method reduces the actual computation even further, markedly improving the efficiency of the convolution operations.
The method of the present invention is an improvement over prior-art depth-wise separable convolution and, in general, comprises: dividing the input feature map into several groups of initial feature maps and performing a first group convolution on them to obtain several groups of first feature maps; then re-dividing the first feature maps into several groups of second feature maps, each containing feature maps from different first feature map groups, and performing a second group convolution on them to obtain the output feature map. In this way, information interaction both within each feature map group and between different groups is completed, the computation and complexity of the different convolution steps are balanced, and the convolutional neural network is effectively accelerated.
FIG. 2 illustrates a method for accelerating a convolutional neural network, in accordance with one embodiment of the present invention. As shown in fig. 2, the method includes:
and S210, averagely dividing the input feature map with N channels into G groups of initial feature maps along the channel direction, wherein each group of initial feature maps comprises S feature maps, S G2N, and performing primary group convolution on the G groups of initial feature maps to obtain G groups of first feature maps.
Group convolution divides the input feature map into several groups along the channel direction, convolves each group separately, and then concatenates the results. FIG. 3A shows a schematic diagram of a conventional convolution operation. As shown in FIG. 3A, in conventional convolution each kernel convolves the feature maps of all channels of the input; the number of channels of each kernel equals that of the input feature map, and the number of kernels equals the number of channels of the output feature map. FIG. 3B shows a schematic diagram of a group convolution operation. As shown in FIG. 3B, unlike conventional convolution, group convolution first divides the input feature map evenly into several groups along the channel direction; the convolution kernels are divided correspondingly along the channel direction, each group of feature maps is convolved with its own kernels, and the convolved groups are concatenated to form the output feature map. For example, for an input feature map of size W × H × N divided into 2 groups along the channel direction, each group has size W × H × N/2; the kernels are divided the same way (their spatial size unchanged), and the two groups of convolved feature maps are concatenated to form an output feature map of size W × H × M.
Because the grouped input feature maps are convolved in parallel, group convolution not only reduces the number of parameters and the amount of computation and increases speed compared with conventional convolution, but also weakens the dependence between the convolution kernels and the preceding layer, reducing overfitting and improving the generalization ability of the neural network.
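The parameter reduction from grouping can be checked with a simple weight count; `conv_weight_count` is an illustrative helper and the dimensions are arbitrary, not taken from the patent:

```python
def conv_weight_count(N, M, K, groups=1):
    # Each group maps N/groups input channels to M/groups output channels
    # with K x K kernels, so the weight count scales as 1/groups.
    assert N % groups == 0 and M % groups == 0
    return groups * (N // groups) * (M // groups) * K * K

N, M, K = 64, 128, 3
for g in (1, 2, 4, 8):
    print(f"groups={g}: {conv_weight_count(N, M, K, g)} weights")
```

Doubling the number of groups halves the weights, which is the source of both the parameter and computation savings described above.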
FIG. 4A is a diagram illustrating a first group convolution performed on G groups of input feature maps to obtain G groups of first feature maps, according to an embodiment of the present invention. As shown in FIG. 4A, the input feature map of the convolutional layer has N channels (i.e., N layers; e.g., N = 20) and is divided evenly along the channel direction into G groups of initial feature maps (e.g., G = 4), so that each group contains S feature maps (e.g., S = N/G = 5). Performing S convolutions on each of the G groups of initial feature maps yields G groups of first feature maps, each group again containing S feature maps. If the input feature map size is W × H, the kernel size is K × K, the stride is 1, and the padding is 0, then the computational complexity of the first group convolution is:
θ′_task1 = θ′(G*S*W*H*S*K*K) = θ′(N*W*H*S*K*K)    (4)
wherein N, M, W, H, G, S, K are each integers greater than 1.
In another embodiment, the input feature map may also be divided into non-uniform groups of initial feature maps, in which the numbers of feature maps need not be equal. For example, an input feature map with N channels (e.g., N = 20) is divided along the channel direction into G groups of initial feature maps (e.g., G = 4) whose sizes S_i differ: the 1st group contains 4 feature maps, the 2nd group 3, the 3rd group 7, and the 4th group 6, but the total number of feature maps over the G groups is N, i.e., Σ_i S_i = N. Performing Q_i convolutions on the i-th group of initial feature maps then yields the G groups of first feature maps. Just as the group sizes S_i need not be equal, the numbers of convolutions Q_i need not be equal either; for example, 4 convolutions on the 1st group, 3 on the 2nd, 7 on the 3rd, and 6 on the 4th, with the total number of convolutions over the G groups equal to N, i.e., Σ_i Q_i = N. As before, N, G, S_i, and Q_i are integers greater than or equal to 1.
The first group convolution performed on the input feature map not only quickly completes the extraction and integration of information within each feature map, but also realizes information interaction among the feature maps of the same group, so that each feature map in a group of first feature maps can express the overall feature information of that group. However, each feature map in the first feature map groups is associated with only one group of the input feature map, and information of the global channels may be lost. To realize interaction and integration of all the information in the input feature map, the boundaries between the groups must be broken: the first feature maps are re-divided so that each newly formed group of second feature maps contains feature maps from different groups of first feature maps, and a group convolution is performed again on the second feature map groups.
S220: evenly divide the G groups of first feature maps into F groups of second feature maps, each group containing T feature maps, so that each feature map in the j-th group of second feature maps comes from a different group of first feature maps; perform a second group convolution on the F groups of second feature maps to obtain an output feature map with M channels.
FIG. 4B is a diagram illustrating how the G groups of first feature maps are re-divided into F groups of second feature maps and the second group convolution is performed, according to an embodiment of the present invention. As shown in FIG. 4B, the first group convolution produces G groups of first feature maps, each containing S feature maps; the G groups are evenly re-divided into F groups of second feature maps, each containing T feature maps; and the second group convolution on the F groups yields the output feature map. When forming the second feature maps, one feature map can be taken from each group of first feature maps in turn to form a group of second feature maps, repeating until every feature map in every group of first feature maps has been taken and placed in some group of second feature maps; this finally yields F groups of second feature maps. In this case the number T of feature maps in each second group equals the number of first groups G (T = G), and the number of second groups F equals the number of feature maps per first group S (F = S). For example, suppose the first group convolution yields 4 groups of first feature maps G_1, G_2, G_3, G_4, each containing 5 feature maps S_1, …, S_5. Taking the first feature map (S_1) from each of G_1, …, G_4 forms the first group of second feature maps F_1; taking the second feature map (S_2) from each forms F_2; and so on, until the last feature map (S_5) is taken from each group to form the final group F_5. There are then 5 groups of second feature maps F_1, …, F_5, each containing 4 feature maps. Performing M/S convolutions on each group of second feature maps finally yields an output feature map with M channels. If the input feature map size is W × H, the kernel size of the second group convolution is 1 × 1, the stride is 1, and the padding is 0, then the computational complexity of the second group convolution is:

θ′_task2 = θ′(S*(M/S)*W*H*G) = θ′(N*W*H*M/S)    (5)
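The uniform re-division (taking one map from each first group in turn) amounts to a transpose of the (G, S) channel layout; a minimal sketch, using illustrative string labels rather than patent terminology:

```python
def regroup(first_maps, G, S):
    # first_maps lists the G*S first feature maps group-major:
    # [g0s0, g0s1, ..., g0s(S-1), g1s0, ...].  Taking one map from each
    # first group in turn yields F = S groups of T = G maps each.
    assert len(first_maps) == G * S
    return [[first_maps[g * S + f] for g in range(G)] for f in range(S)]

maps = [f"G{g}S{s}" for g in range(4) for s in range(5)]  # G = 4, S = 5
second = regroup(maps, G=4, S=5)
print(second[0])  # the first second-group: one map from each first group
```

Each of the 5 resulting groups contains exactly one map from each of the 4 first groups, which is the property the second group convolution relies on.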
because the feature map of each newly generated second feature map group is from different first feature map groups, the feature information between different first feature map groups can be integrated by performing second group convolution operation on the second feature map group, so that each finally obtained feature map in the output feature map contains the information of all channels in the input feature map, and the complete fusion and extraction of the input features are realized.
In another embodiment, the numbers of feature maps drawn from the different first feature map groups may differ within each second group, and the sizes of the second groups may also differ, so long as the total number of feature maps over all second groups equals that over all first groups. For example, suppose the first group convolution yields 4 groups of first feature maps of unequal sizes: G_1 with 4 feature maps, G_2 with 3, G_3 with 7, and G_4 with 6. One feature map may be drawn at random from G_1, two from G_2, and three from G_3 to form the first group of second feature maps F_1; three more from G_1 and two from G_3 to form F_2; and so on, until all the first feature maps have been drawn and the last group F_j is formed. This yields F groups of second feature maps, the j-th containing T_j feature maps. Although the group sizes T_j may differ, their total is still N, i.e., Σ_j T_j = N, where j = 0, …, F−1. Similarly, the numbers of convolutions P_j applied to the groups need not be equal; for example, 6 convolutions on F_1 and 5 on F_2, with the total number of convolutions over the F groups equal to M, i.e., Σ_j P_j = M. As before, N, F, T_j, and P_j are integers greater than or equal to 1.
Through the two group convolutions, information interaction and integration among the different channels of the input feature map is completed, finally yielding an output feature map with M channels.
Similarly to formula (3) above, to minimize θ′_task1 + θ′_task2, i.e., to make the overall complexity and the amount of computation of the convolution operation minimal, in one embodiment each group of initial feature maps contains S feature maps, where S satisfies:

S = √M / K,
where M is the number of channels (i.e., the number of layers) of the output feature map of the convolutional layer, and K is the size of the convolution kernels in the first group convolution. In this case θ′_task1 = θ′_task2, i.e., the computational complexity of the first group convolution equals that of the second group convolution.
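Under one plausible reading of the cost model (an assumption here, since formula (3) is not reproduced in this excerpt: K × K kernels in the first group convolution and 1 × 1 kernels in the second), the balancing group size works out to √M / K and can be checked numerically:

```python
import math

def balanced_group_size(M, K):
    """S' that equalises the per-pixel cost of the two group convolutions
    under the assumed cost model (K x K first stage, 1 x 1 second stage)."""
    return math.sqrt(M) / K

def stage_costs(N, M, S, K):
    # Stage 1: G = N/S groups, each with S maps, convolved S times with
    # K x K kernels -> (N/S) * S * S * K^2 = N * S * K^2 multiplies per pixel.
    theta1 = (N / S) * S * S * K * K
    # Stage 2: S groups of N/S maps, M/S 1x1 convolutions per group
    # -> S * (N/S) * (M/S) = N * M / S multiplies per pixel.
    theta2 = S * (N / S) * (M / S)
    return theta1, theta2
```

For example, with M = 144 output channels and K = 3, the balance point is S′ = 4, and both stages then cost the same number of multiplies per pixel.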
In practical applications, however, the value S′ obtained from the above calculation may be non-integer, in which case the convolutional network cannot be accelerated directly by the above method; the present invention therefore provides further embodiments to optimize the value of S.
In one embodiment, when S′ = √M / K is non-integer, S may be set to the integer closest to S′ that is also a factor of the number N of channels of the input feature map.
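A minimal sketch of this embodiment (the helper name is illustrative): among the divisors of N, pick the one closest to S′.

```python
def nearest_factor_of_n(N, s_prime):
    """Return the factor of N closest to s_prime (smaller factor on ties)."""
    factors = [d for d in range(1, N + 1) if N % d == 0]
    return min(factors, key=lambda d: (abs(d - s_prime), d))
```

For instance, with N = 64 and S′ ≈ 5.3 the nearest divisor is 4 (since 8 is farther away).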
In another embodiment, when S′ = √M / K is non-integer, S′ is rounded to the nearest integer S, and N/S is rounded up to an integer G′. When G′ · S = N, the input feature maps are grouped according to G′ and S and the first group convolution is performed. When G′ · S > N, the input feature maps must be padded: specifically, the first G′ · S − N feature maps of the input feature maps are copied and merged with the input feature maps to obtain G′ · S input feature maps, which are then grouped according to G′ and S before the first group convolution is performed.
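A sketch of this padding embodiment, assuming the copied feature maps are appended after the original ones (channel indices stand in for feature maps here):

```python
import math

def pad_and_group(N, S):
    """Round the channel count up to a multiple of S by copying the first
    G'*S - N feature maps, then split into G' groups of S channels each."""
    g_prime = math.ceil(N / S)
    deficit = g_prime * S - N
    channels = list(range(N)) + list(range(deficit))  # copies of first maps
    return [channels[i * S:(i + 1) * S] for i in range(g_prime)]
```

With N = 10 channels and S = 4, G′ = 3 and the first two channels are duplicated to fill the last group.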
In some cases, when each of the S groups of second feature maps is to be convolved M/S times, the value M/S may be non-integer, so that the convolutional network cannot be accelerated using the above method directly.
In one embodiment, when M/S is non-integer, M/S is rounded down to an integer W, and R = M − W · S; R of the S groups of second feature maps are then convolved W + 1 times each, and the other groups are convolved W times each.
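The uneven distribution of the M convolutions over the S groups can be sketched as:

```python
def convolutions_per_group(M, S):
    """W = floor(M/S); R = M - W*S groups receive W+1 convolutions each,
    and the remaining S - R groups receive W, so the counts sum to M."""
    W, R = divmod(M, S)
    return [W + 1] * R + [W] * (S - R)
```

For example, M = 10 convolutions over S = 4 groups gives counts [3, 3, 2, 2], which sum back to M.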
Experiments show that with the acceleration method, the convolutional network ResNet18 finally reaches 61.8% Top-1 and 83% Top-5 accuracy on the ImageNet dataset, a drop of 7.5%, with the amount of computation reduced to 15% of the original; the convolutional network MobileNet finally reaches 62.9% Top-1 and 84.6% Top-5 accuracy on the ImageNet dataset, a drop of 4.5%, with the amount of computation reduced to 38% of the original.
Based on the above embodiments, the present invention balances the computational complexity among the steps of the convolution operation, reduces the network parameters and the overall amount of computation, and greatly improves the speed and efficiency of the network.
Although the present invention has been described in detail, those skilled in the art will understand that modifications and equivalents may be made without departing from the spirit and scope of the present invention, and such modifications and equivalents shall be considered to fall within the scope of the claims of the present invention.
Claims (11)
1. A method for accelerating a convolutional neural network, comprising:
step 1: dividing an input feature map with N channels into G groups of initial feature maps along the channel direction, wherein the i-th group G_i of initial feature maps includes S_i feature maps, i = 0, …, G−1, and performing a first group convolution on the G groups of initial feature maps to obtain G groups of first feature maps, wherein N, G and S_i are integers of 1 or more;
step 2: regrouping the G groups of first feature maps into F groups of second feature maps, wherein the j-th group F_j of second feature maps contains T_j feature maps taken from different groups of first feature maps, j = 0, …, F−1, with ∑ T_j = N, and performing a second group convolution on the F groups of second feature maps to obtain an output feature map with M channels, wherein F, T_j and M are integers of 1 or more.
2. The method of claim 1, wherein,
in step 1, the input feature map with N channels is evenly divided along the channel direction into G groups of initial feature maps, each group of initial feature maps including S feature maps, with S × G = N; and
in step 2, the G groups of first feature maps are evenly divided into F groups of second feature maps, each group of second feature maps including T feature maps, where F = S and T = G, so that the feature maps in the j-th group F_j of second feature maps each come from a different group G_i of first feature maps.
6. The method of claim 3, wherein, when √M / K is non-integer, it is rounded to the nearest integer S, and N/S is rounded up to an integer G′; when G′ · S is greater than N, the first G′ · S − N feature maps of the input feature maps are copied and merged with the input feature maps to obtain G′ · S input feature maps, which are grouped according to G′ and S.
7. The method of claim 3, wherein the first group convolution comprises: performing S convolutions on each group of initial feature maps respectively.
9. The method of claim 3, wherein the second group convolution comprises: when M/S is non-integer, rounding M/S down to an integer W, with R = M − W · S, convolving R of the S groups of second feature maps W + 1 times each, and convolving the other groups of second feature maps W times each.
10. A storage medium in which a computer program is stored which, when being executed by a processor, is operative to carry out the method of any one of claims 1-9.
11. An electronic device comprising a processor and a memory, the memory having stored therein a computer program which, when executed by the processor, is operable to carry out the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010244305.2A CN111461144A (en) | 2020-03-31 | 2020-03-31 | Method for accelerating convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111461144A true CN111461144A (en) | 2020-07-28 |
Family
ID=71680931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010244305.2A Pending CN111461144A (en) | 2020-03-31 | 2020-03-31 | Method for accelerating convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111461144A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112016639B (en) * | 2020-11-02 | 2021-01-26 | 四川大学 | Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200728 ||