CN111461144A - Method for accelerating convolutional neural network

Method for accelerating convolutional neural network

Info

Publication number
CN111461144A
Authority
CN
China
Prior art keywords
convolution
feature maps
groups
group
feature
Prior art date
Legal status
Pending
Application number
CN202010244305.2A
Other languages
Chinese (zh)
Inventor
陈尧麟
郝昀超
张佩珩
霍志刚
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202010244305.2A priority Critical patent/CN111461144A/en
Publication of CN111461144A publication Critical patent/CN111461144A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a method for accelerating a convolutional neural network, which comprises the following steps. Step 1: divide an input feature map with N channels into G groups of initial feature maps along the channel direction, where the i-th initial feature map group contains S_i feature maps and ∑_{i=0}^{G−1} S_i = N; perform a first group convolution on the G groups of initial feature maps to obtain G groups of first feature maps, where N, G and S_i are integers greater than or equal to 1. Step 2: subdivide the G groups of first feature maps into F groups of second feature maps, where the j-th second feature map group contains T_j feature maps drawn from different first feature map groups and ∑_{j=0}^{F−1} T_j = N; perform a second group convolution on the F groups of second feature maps to obtain an output feature map with M channels, where F, T_j and M are integers greater than or equal to 1.

Description

Method for accelerating convolutional neural network
Technical Field
The invention relates to deep learning technology, and in particular to a method for accelerating a convolutional neural network.
Background
With the development of deep learning, applications based on it have spread to many areas of everyday life, and deep learning itself has evolved from early cloud computing to today's terminal computing. Because most deep learning applications are large in scale, training and prediction place high demands on machine performance, while the storage and computing resources of terminal devices are very limited; how to accelerate deep learning has therefore become a new technical hotspot. The convolutional neural network is a widely used deep learning model, and methods for accelerating it are a popular research topic.
Convolutional Neural Networks (CNNs) resemble the multilayer perceptron of artificial neural networks: they extract features by convolution, integrate the different features, and finally make predictions. A convolutional neural network mainly comprises a data input layer, convolutional layers, activation layers (activation functions), pooling layers and fully-connected layers, where the convolutional layers perform feature extraction on the input data. The convolution operations of the convolutional layers consume the bulk of the network's resources, so reducing the amount of convolution computation is the key to accelerating a convolutional neural network.
Existing lightweight convolutional neural networks adopt depthwise separable convolution (Depthwise Separable Convolution), which decomposes a conventional convolution into two subtasks: (1) depthwise convolution (depth-wise convolution), which performs the image-convolution task within each feature layer; and (2) pointwise convolution (point-wise convolution), which realizes information interaction between different feature layers. Compared with the traditional approach of direct convolution with multiple convolution kernels, depthwise separable convolution greatly reduces model parameters and computation. However, because the algorithmic complexity of the pointwise convolution is far higher than that of the depthwise convolution, a large share of resources is spent on information interaction between feature layers, leading to unbalanced resource distribution and reduced overall efficiency of the convolutional layer.
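For reference, depthwise separable convolution as just described can be sketched as follows (a minimal illustration assuming PyTorch; the class name, channel counts and spatial size are illustrative, not taken from the patent):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel K x K convolution
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # (1) depthwise convolution: one K x K kernel per input channel
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # (2) pointwise convolution: out_channels kernels of size 1 x 1 x in_channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# e.g. a 32 x 32 feature map with N = 20 channels mapped to M = 40 channels
y = DepthwiseSeparableConv(20, 40)(torch.randn(1, 20, 32, 32))
```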
Disclosure of Invention
The invention provides a method for accelerating a convolutional neural network, which comprises the following steps. Step 1: divide an input feature map with N channels into G groups of initial feature maps along the channel direction, wherein the i-th initial feature map group contains S_i feature maps and ∑_{i=0}^{G−1} S_i = N; perform a first group convolution on the G groups of initial feature maps to obtain G groups of first feature maps, wherein N, G and S_i are integers greater than or equal to 1. Step 2: subdivide the G groups of first feature maps into F groups of second feature maps, wherein the j-th second feature map group contains T_j feature maps from different first feature map groups and ∑_{j=0}^{F−1} T_j = N; perform a second group convolution on the F groups of second feature maps to obtain an output feature map with M channels, wherein F, T_j and M are integers greater than or equal to 1.
Optionally, in step 1 the input feature map with N channels is evenly divided along the channel direction into G groups of initial feature maps, each group containing S feature maps, with S × G = N; and in step 2 the G groups of first feature maps are evenly divided into F groups of second feature maps, each group containing T feature maps, where F = S and T = G, so that each feature map in the j-th second feature map group comes from a different first feature map group.
Optionally, S = √M / K, and the convolution kernels in the first group convolution have size K × K while the convolution kernels in the second group convolution have size 1 × 1.
Optionally, when S′ = √M / K is not an integer, the value of S is the integer closest to S′.
Optionally, when S′ = √M / K is not an integer, S is S′ rounded to the nearest integer, G′ = ⌈N/S⌉, and when G′ × S = N, G = G′.
Optionally, when S′ = √M / K is not an integer, S is S′ rounded to the nearest integer and G′ = ⌈N/S⌉; when G′ × S > N, the first G′ × S − N feature maps of the input feature map are copied and merged with the input feature map to obtain an input feature map with G′ × S layers, which is then grouped according to G′ and S.
Optionally, the first group convolution comprises performing S convolutions on each group of initial feature maps.
Optionally, the second group convolution comprises: when M/S is an integer, performing M/S convolutions on each group of second feature maps.
Optionally, the second group convolution comprises: when M/S is not an integer, rounding it down to an integer W and letting R = M − W × S; R of the S groups of second feature maps are each convolved W + 1 times, and the other groups are each convolved W times.
A further aspect of the invention provides a storage medium storing a computer program which, when executed by a processor, can carry out any of the methods described above.
Another aspect of the present invention provides an electronic device comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, can carry out any of the methods described above.
Compared with the prior art, the invention has the following advantages:
According to the convolution method of the invention, a determined grouping scheme is applied to the input feature map of the convolutional layer and two group convolutions are performed, so that the computation is balanced more evenly across the different convolution operations, the total computation and complexity of the convolutional neural network are effectively reduced, the efficiency of the convolution operation is significantly improved, and the network is accelerated. In addition, in some embodiments, optimizing the grouping of the input feature map increases the generality of the network acceleration method.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1A shows a schematic diagram of the depthwise convolution operation in a depthwise separable convolution;
FIG. 1B shows a schematic diagram of the pointwise convolution operation in a depthwise separable convolution;
FIG. 2 illustrates a method for accelerating a convolutional neural network, in accordance with one embodiment of the present invention;
FIG. 3A shows a schematic diagram of a conventional convolution operation;
FIG. 3B shows a schematic diagram of a group convolution operation;
FIG. 4A is a diagram illustrating a first group convolution of G groups of input feature maps to obtain G groups of first feature maps according to an embodiment of the present invention;
FIG. 4B is a diagram illustrating how the G groups of first feature maps are subdivided into F groups of second feature maps and the second group convolution is performed, according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The prior-art depthwise separable convolution decomposes a complete convolution operation into two steps, depthwise convolution and pointwise convolution. The depthwise convolution filters each channel of the input feature map with a single-channel convolution kernel, i.e., one kernel is responsible for exactly one channel. FIG. 1A shows a schematic diagram of the depthwise convolution operation in a depthwise separable convolution. As shown in FIG. 1A, assume the input feature map has size W × H, the number of channels (layers) is N, the convolution kernels have size K × K, and the number of kernels equals the number of channels of the input feature map. With a stride of 1 and padding of 0, the depthwise convolution yields an output feature map of size W × H with N channels, and its computational complexity is:
θ_task1 = θ(N*W*H*K*K) (1)
since the deep convolution only performs independent convolution operation on each channel of the input feature map, and does not effectively utilize feature information of different channels at the same spatial position, it is also necessary to perform linear combination on each channel of the deep convolution output feature map through point-by-point convolution. In point-by-point convolution, each convolution kernel is responsible for deep convolution of each channel of the output feature map. FIG. 1B shows a schematic diagram of a point-by-point convolution operation in a depth separable convolution. As shown in fig. 1B, the feature map output by the deep convolution is used as the input feature map of the point-by-point convolution, the size of the convolution kernel is 1 × N, N is the number of channels of the input feature map (that is, the deep convolution output feature map) of the point-by-point convolution, M convolution kernels are shared, and M is the number of channels of the output feature map after the point-by-point convolution. After point-by-point convolution, performing M-time linear combination on each element in the output feature graph of the deep convolution with the size of W × H and the number of channels N to complete interaction and integration of information between different channels, and finally obtaining the output feature graph of M channels, wherein the operation complexity is as follows:
θ_task2 = θ(W*H*N*M) (2)
according to the mean inequality, the following can be found:
Figure BDA0002433564000000041
therefore, when thetatask1=θtask2When theta is greater than thetatask1task2When K × K is M, the total complexity of the entire convolution operation is the lowest, as can be seen from equations (1) and (2).
In practical applications, however, the number of channels M of the convolutional layer's output feature map is much larger than the kernel size K × K, so the computational complexity of the pointwise convolution is far higher than that of the depthwise convolution. For example, by equations (1) and (2), with K = 3 and M = 256 the pointwise convolution accounts for 256/(9 + 256) ≈ 97% of the computation. The information interaction between different features thus accounts for more than 90% of the total computation, consumes a large amount of computing resources, and slows the neural network down.
To solve the above problems, the present invention provides a method for accelerating a convolutional neural network that balances the computational workload among the different steps of the convolution operation, thereby effectively reducing the network's total computation and complexity. For the feature counts typical of mainstream networks, the technical scheme of the invention keeps the sizes of the input and output feature maps unchanged while reducing the computation to less than 60% of the original; for networks with more features, the method reduces the actual computation even further and significantly improves the efficiency of the convolution operation.
The method of the present invention improves on the prior-art depthwise separable convolution and, in general, comprises: dividing the input feature map into several groups of initial feature maps and performing a first group convolution on the initial feature map groups to obtain several groups of first feature maps; then subdividing the first feature map groups into several groups of second feature maps, each containing feature maps from different first feature map groups, and performing a second group convolution on the second feature map groups to obtain the output feature map. In this way, information interaction both within each feature map group and between different feature map groups is completed, the computation and complexity are balanced across the different convolution steps, and the convolutional neural network is effectively accelerated.
FIG. 2 illustrates a method for accelerating a convolutional neural network according to one embodiment of the present invention. As shown in FIG. 2, the method includes:
and S210, averagely dividing the input feature map with N channels into G groups of initial feature maps along the channel direction, wherein each group of initial feature maps comprises S feature maps, S G2N, and performing primary group convolution on the G groups of initial feature maps to obtain G groups of first feature maps.
Group convolution divides the input feature map into several groups along the channel direction, convolves each group of features separately, and then concatenates the results. FIG. 3A shows a schematic diagram of a conventional convolution operation. As shown in FIG. 3A, in a conventional convolution each kernel convolves the feature maps of all channels of the input feature map; the number of channels of each kernel equals the number of channels of the input feature map, and the number of kernels equals the number of channels of the output feature map. FIG. 3B shows a schematic diagram of a group convolution operation. As shown in FIG. 3B, unlike conventional convolution, group convolution first divides the input feature map evenly into several groups along the channel direction; the convolution kernels are divided correspondingly along the channel direction, each group of feature maps is convolved with its own kernels, and the convolved groups are concatenated to form the output feature map. For example, for a W × H × N input feature map divided into 2 groups along the channel direction, each group is W × H × (N/2); the kernels are divided correspondingly (their spatial size unchanged), and the two groups of convolved feature maps are concatenated to form the W × H × M output feature map. Because the grouped inputs are convolved in parallel, group convolution not only reduces the number of parameters and the amount of computation and increases operation speed compared with conventional convolution, but also reduces the dependence between a convolution kernel and the preceding layer, thereby mitigating overfitting and improving the generalization ability of the neural network.
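As an aside, the group convolution just described corresponds to the `groups` parameter of the standard 2-D convolution in common frameworks; a minimal sketch (assuming PyTorch, with illustrative channel counts):

```python
import torch
import torch.nn as nn

N, M, G = 20, 40, 4  # input channels, output channels, number of groups

# Conventional convolution: each of the M kernels spans all N input channels.
conv = nn.Conv2d(N, M, kernel_size=3, padding=1)

# Group convolution: the channels are split into G groups of N/G each, and
# every kernel only spans the N/G channels of its own group, so the layer
# has G times fewer parameters than the conventional convolution above.
gconv = nn.Conv2d(N, M, kernel_size=3, padding=1, groups=G)

x = torch.randn(1, N, 32, 32)
assert conv(x).shape == gconv(x).shape == (1, M, 32, 32)
print(conv.weight.numel(), gconv.weight.numel())  # 7200 vs. 1800
```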
FIG. 4A is a diagram illustrating the first group convolution of G groups of input feature maps to obtain G groups of first feature maps according to an embodiment of the present invention. As shown in FIG. 4A, the input feature map of the convolutional layer has N channels (i.e., N layers; e.g., N = 20) and is evenly divided along the channel direction into G groups of initial feature maps (e.g., G = 4), so that each initial feature map group contains S feature maps (e.g., S = N/G = 5). Each of the G groups of initial feature maps is convolved S times to obtain G groups of first feature maps, each of which also contains S feature maps. If the input feature map has size W × H, the kernels have size K × K, the stride is 1 and the padding is 0, the computational complexity of the first group convolution is:
θ′_task1 = θ′(G*S*W*H*S*K*K) = θ′(N*W*H*S*K*K) (4)
wherein N, M, W, H, G, S, K are each integers greater than 1.
In another embodiment, the input feature map may also be divided into several non-uniform groups of initial feature maps, with the groups containing unequal numbers of feature maps. For example, an input feature map with N (e.g., N = 20) channels is divided along the channel direction into G groups (e.g., G = 4) of initial feature maps whose sizes S_i need not all be equal: for instance, the 1st group contains 4 feature maps, the 2nd group 3, the 3rd group 7 and the 4th group 6, but the total number of feature maps over the G groups is still N, i.e.,

∑_{i=0}^{G−1} S_i = N.

Performing Q_i convolutions on the i-th group of initial feature maps then yields the G groups of first feature maps. Just as the group sizes S_i need not all be equal, the numbers of convolutions Q_i need not all be equal: e.g., 4 convolutions on the 1st group, 3 on the 2nd, 7 on the 3rd and 6 on the 4th, with the convolution counts over the G groups summing to N, i.e.,

∑_{i=0}^{G−1} Q_i = N.

Likewise, N, G, S_i and Q_i are integers greater than or equal to 1.
The first feature map groups generated by the first group convolution not only quickly extract and integrate the information within each feature map but also realize information interaction and analysis among the different feature maps within a group, so that every feature map in a group of first feature maps can express the overall feature information of that group. However, each feature map in the first feature map groups is associated with only one group of the input feature map, so information from the global channels may be lost. To achieve interaction and integration of all the information in the input feature map, the boundaries between the groups must be broken: the first feature map groups are divided anew so that each newly generated group of second feature maps contains feature maps from different groups of first feature maps, and the grouping-and-convolution operation is performed again on the second feature map groups.
S220: evenly divide the G groups of first feature maps into F groups of second feature maps, each group containing T feature maps, such that the feature maps in the j-th second feature map group come from different first feature map groups, and perform a second group convolution on the F groups of second feature maps to obtain an output feature map with M channels.
FIG. 4B is a diagram illustrating how the G groups of first feature maps are subdivided into F groups of second feature maps and the second group convolution is performed according to an embodiment of the present invention. As shown in FIG. 4B, the first group convolution produces G groups of first feature maps, each containing S feature maps; the G groups of first feature maps are evenly divided into F groups of second feature maps, each containing T feature maps; and the second group convolution on the F groups of second feature maps yields the output feature map. When dividing the second feature maps, one feature map can be taken in turn from each group of first feature maps to form one group of second feature maps, repeating until every feature map of every first feature map group has been taken into some group of second feature maps, finally yielding F groups of second feature maps. In this case, the number of feature maps T in each second feature map group equals the number of first feature map groups G, i.e., T = G, and the number of second feature map groups F equals the number of feature maps S in each first feature map group, i.e., F = S. For example, suppose the first group convolution yields 4 groups of first feature maps G1, G2, G3, G4, each containing 5 feature maps S1, ..., S5. Taking the first feature map (S1) from each of G1, G2, G3, G4 forms the first group of second feature maps (F1); taking the second feature map (S2) from each group forms the second group (F2); and so on, until the last feature map (S5) is taken from each group to form the final group (F5). There are then 5 groups of second feature maps F1, ..., F5, each containing 4 feature maps. Each group of second feature maps is convolved M/S times, finally producing an output feature map with M channels. If the input feature map has size W × H, the kernels of the second group convolution have size 1 × 1, the stride is 1 and the padding is 0, the computational complexity of the second group convolution is:

θ′_task2 = θ′(S*(M/S)*W*H*G) = θ′(N*W*H*M/S)
because the feature map of each newly generated second feature map group is from different first feature map groups, the feature information between different first feature map groups can be integrated by performing second group convolution operation on the second feature map group, so that each finally obtained feature map in the output feature map contains the information of all channels in the input feature map, and the complete fusion and extraction of the input features are realized.
In another embodiment, the number of feature maps drawn from each first feature map group may differ within a group of second feature maps, and the groups of second feature maps may also differ in size, provided that the total number of feature maps over all second feature map groups equals the total over all first feature map groups. For example, suppose the first group convolution yields 4 groups of first feature maps G1, G2, G3, G4 whose sizes may differ, say 4, 3, 7 and 6 feature maps respectively. One feature map can be picked at random from G1, two from G2 and three from G3 to form the first group of second feature maps (F1); three more from G1 and two from G3 to form the second group (F2); and so on, until every feature map of the first feature map groups has been picked and the last group of second feature maps is formed. This yields F groups of second feature maps, the j-th containing T_j feature maps. Although the group sizes may differ, the total number of feature maps over all second feature map groups is still N, i.e.,

∑_{j=0}^{F−1} T_j = N,

where j = 0, ..., F−1. Similarly, just as the group sizes T_j need not all be equal, the numbers of convolutions P_j applied to the groups need not all be equal: e.g., 6 convolutions on F1 and 5 on F2, with the convolution counts over the F groups summing to M, i.e.,

∑_{j=0}^{F−1} P_j = M.

Likewise, N, G, T_j and P_j are integers greater than or equal to 1.
Through the two group convolutions, the information interaction and integration among the different channels of the input feature map is completed, finally producing an output feature map with M channels.
In the same way as formula (3) above,

θ′_task1 + θ′_task2 ≥ 2√(θ′_task1 * θ′_task2).

To minimize θ′_task1 + θ′_task2, i.e., to minimize the overall complexity and computation of the convolution operation, in one embodiment each group of initial feature maps contains S feature maps, where S satisfies

S = √M / K,

where M is the number of channels (i.e., the number of layers) of the convolutional layer's output feature map and K is the size of the convolution kernels in the first group convolution. Then θ′_task1 = θ′_task2, i.e., the computational complexities of the first group convolution and the second group convolution are equal.
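As a worked example with illustrative numbers (not taken from the patent): for M = 64 output channels and K = 3,

S′ = √M / K = √64 / 3 = 8/3 ≈ 2.67,

which is not an integer, so one of the rounding embodiments described below (e.g., S = 3) must be applied.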
In practical applications, the value S′ = √M / K obtained by calculation may be a non-integer, which would prevent the above method from being used to accelerate the convolutional network; for this reason, the present invention also provides other embodiments to optimize the value of S.
In one embodiment, when S′ = √M / K is not an integer, S can be set to the integer closest to S′ that is also a factor of the number of input feature map channels N.
In another embodiment, when S′ = √M / K is not an integer, S′ can be rounded to the nearest integer S, and G′ = ⌈N/S⌉. When G′ × S = N, G = G′, and the input feature map is grouped according to G and S for the first group convolution. When G′ × S > N, the input feature map must be padded: specifically, the first G′ × S − N feature maps of the input feature map are copied and merged with the input feature map to obtain an input feature map with G′ × S layers, which is then grouped according to G′ and S for the first group convolution.
In some cases, when each of the S groups of second feature maps is to be convolved M/S times, the value of M/S may be a non-integer, which would likewise prevent the above method from being used to accelerate the convolutional network.
In one embodiment, when M/S is not an integer, M/S is rounded down to an integer W and R = M − W × S; then R of the S groups of second feature maps are each convolved W + 1 times, and the other groups are each convolved W times.
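The parameter choices of these embodiments can be summarized in a small helper (a sketch under the stated assumptions: nearest-integer rounding for S, ceiling for G′, and the W/R split above; the function name is illustrative):

```python
import math

def grouping_parameters(N: int, M: int, K: int):
    """Choose grouping parameters for the two-stage group convolution.

    Returns (S, G, pad, W, R): the group size S, the group count G, the
    number of input maps to duplicate (pad = G*S - N), and the second-stage
    convolution counts (R groups get W + 1 convolutions, the rest get W)."""
    S_prime = math.sqrt(M) / K   # optimal S, from theta'_task1 = theta'_task2
    S = max(1, round(S_prime))   # nearest integer to S'
    G = math.ceil(N / S)         # G' = ceil(N / S)
    pad = G * S - N              # if > 0, copy the first `pad` input maps
    W = M // S                   # floor(M / S)
    R = M - W * S                # R groups are convolved W + 1 times
    return S, G, pad, W, R

# e.g. N = 20, M = 64, K = 3: S' = 8/3, so S = 3, G = 7, pad = 1, W = 21, R = 1
print(grouping_parameters(20, 64, 3))
```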
Experiments show that, with the acceleration method, the accuracy of the ResNet18 convolutional network on the ImageNet dataset reaches 61.8% Top-1 and 83% Top-5, a 7.5% drop in accuracy, while the computation falls to 15% of the original; the accuracy of the MobileNet convolutional network on ImageNet reaches 62.9% Top-1 and 84.6% Top-5, a 4.5% drop in accuracy, while the computation falls to 38% of the original.
Based on the above embodiments, the invention balances the computational complexity among the steps of the convolution operation, reduces the network parameters and the overall computation, and greatly improves network speed and efficiency.
Although the present invention has been described in detail, those skilled in the art will understand that modifications and equivalent substitutions can be made without departing from the spirit and scope of the invention, and such modifications are intended to fall within the scope of the claims of the present invention.

Claims (11)

1. A method for accelerating a convolutional neural network, comprising:
step 1: dividing an input feature map with N channels into G groups of initial feature maps along the channel direction, wherein the i-th initial feature map group contains S_i feature maps and ∑_{i=0}^{G−1} S_i = N, and performing a first group convolution on the G groups of initial feature maps to obtain G groups of first feature maps, wherein N, G and S_i are integers greater than or equal to 1;
step 2: subdividing the G groups of first feature maps into F groups of second feature maps, wherein the j-th second feature map group contains T_j feature maps from different first feature map groups and ∑_{j=0}^{F−1} T_j = N, and performing a second group convolution on the F groups of second feature maps to obtain an output feature map with M channels, wherein F, T_j and M are integers greater than or equal to 1.
2. The method of claim 1, wherein,
in step 1, the input feature map with N channels is evenly divided along the channel direction into G groups of initial feature maps, each group containing S feature maps, with S × G = N; and
in step 2, the G groups of first feature maps are evenly divided into F groups of second feature maps, each group containing T feature maps, where F = S and T = G, so that each feature map in the j-th second feature map group comes from a different first feature map group.
3. The method of claim 2, wherein S = √M / K, the convolution kernels in the first group convolution have size K × K, and the convolution kernels in the second group convolution have size 1 × 1.
4. The method of claim 3, wherein when S′ = √M / K is not an integer, the value of S is the integer closest to S′.
5. The method of claim 3, wherein when S′ = √M / K is not an integer, S is S′ rounded to the nearest integer, G′ = ⌈N/S⌉, and when G′ × S = N, G = G′.
6. The method of claim 3, wherein when S′ = √M / K is not an integer, S is S′ rounded to the nearest integer and G′ = ⌈N/S⌉; when G′ × S > N, the first G′ × S − N feature maps of the input feature map are copied and merged with the input feature map to obtain an input feature map with G′ × S layers, which is grouped according to G′ and S.
7. The method of claim 3, wherein the first group convolution comprises performing S convolutions on each group of initial feature maps.
8. The method of claim 3, wherein the second group convolution comprises: when M/S is an integer, performing M/S convolutions on each group of second feature maps.
9. The method of claim 3, wherein the second group convolution comprises: when M/S is not an integer, rounding it down to an integer W and letting R = M − W × S, performing W + 1 convolutions on each of R of the S groups of second feature maps, and performing W convolutions on each of the other groups of second feature maps.
10. A storage medium storing a computer program which, when executed by a processor, can carry out the method of any one of claims 1-9.
11. An electronic device comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, can carry out the method of any one of claims 1-9.
CN202010244305.2A 2020-03-31 2020-03-31 Method for accelerating convolutional neural network Pending CN111461144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010244305.2A CN111461144A (en) 2020-03-31 2020-03-31 Method for accelerating convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010244305.2A CN111461144A (en) 2020-03-31 2020-03-31 Method for accelerating convolutional neural network

Publications (1)

Publication Number Publication Date
CN111461144A true CN111461144A (en) 2020-07-28

Family

ID=71680931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010244305.2A Pending CN111461144A (en) 2020-03-31 2020-03-31 Method for accelerating convolutional neural network

Country Status (1)

Country Link
CN (1) CN111461144A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016639B (en) * 2020-11-02 2021-01-26 四川大学 Flexible separable convolution framework and feature extraction method and application thereof in VGG and ResNet


Similar Documents

Publication Publication Date Title
CN108765247B (en) Image processing method, device, storage medium and equipment
CN109063825B (en) Convolutional neural network accelerator
CN108764317B (en) Residual convolutional neural network image classification method based on multipath feature weighting
CN109919315B (en) Forward reasoning method, device, equipment and storage medium of neural network
EP3179415A1 (en) Systems and methods for a multi-core optimized recurrent neural network
CN112200300B (en) Convolutional neural network operation method and device
US20200184366A1 (en) Scheduling task graph operations
CN112214319B (en) Task scheduling method for sensing computing resources
CN114610474B (en) Multi-strategy job scheduling method and system under heterogeneous supercomputing environment
CN116416561A (en) Video image processing method and device
Motamedi et al. Fast and energy-efficient CNN inference on IoT devices
CN112884086A (en) Model training method, device, equipment, storage medium and program product
US20220261623A1 (en) System and method for channel-separable operations in deep neural networks
WO2023065983A1 (en) Computing apparatus, neural network processing device, chip, and data processing method
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN116450312A (en) Scheduling strategy determination method and system for pipeline parallel training
WO2022040575A1 (en) Tabular convolution and acceleration
EP3926546A2 (en) Neural network model splitting method, apparatus, computer device and storage medium
CN111882053A (en) Neural network model compression method based on splicing convolution
CN113655986B9 (en) FFT convolution algorithm parallel implementation method and system based on NUMA affinity
CN111461144A (en) Method for accelerating convolutional neural network
CN116915869A (en) Cloud edge cooperation-based time delay sensitive intelligent service quick response method
CN110110849B (en) Line fixed data stream mapping method based on graph segmentation
CN111984414A (en) Data processing method, system, equipment and readable storage medium
CN115130672B (en) Software and hardware collaborative optimization convolutional neural network calculation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200728