CN112836757A - Deep learning network convolution kernel internal parameter sharing method - Google Patents
Deep learning network convolution kernel internal parameter sharing method
- Publication number
- CN112836757A (application CN202110177124.7A)
- Authority
- CN
- China
- Prior art keywords
- sharing
- neural network
- parameters
- parameter
- convolution kernel
- Prior art date: 2021-02-09
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for sharing the internal parameters of the convolution kernels of a deep learning network, comprising the following steps: (1) pre-train the network; (2) cluster the convolution input channels of each layer of the network from step (1) using a K-means clustering function; (3) let the input channels belonging to the same class in step (2) share one two-dimensional matrix parameter; (4) retrain the network to improve the prediction result. With essentially no impact on accuracy, the invention can reduce the parameter count and computation of a convolutional neural network by more than 40% and accelerate inference. At a low sharing rate it can even improve test-set accuracy, so that, on the same training set and with the same number of training epochs, the optimized network exceeds the accuracy of the original, unoptimized network.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a method for sharing internal parameters of a convolution kernel of a deep learning network.
Background
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to stimuli within a local receptive field, and it performs well on large-scale image processing. Because it is invariant to image translation, scaling and rotation, the convolutional neural network is widely used in image recognition: Microsoft uses convolutional neural networks in handwriting recognition systems for Arabic and Chinese, Google uses them to recognize faces and license plates in Street View images, and so on.
Convolutional neural networks have developed rapidly and their recognition accuracy has improved steadily, but at the cost of enormous parameter counts and computation. For example, the winners of the ImageNet challenge raised the classification accuracy from 84.7% in 2012 (AlexNet) to 96.5% in 2015 (ResNet-152), while the computational load grew from 1.4 × 10^10 FLOPs to 2.26 × 10^11 FLOPs. Training such a large network on a traditional CPU is impractical; only GPUs with high computing power allow the network to be trained reasonably quickly. But high-performance GPUs necessarily consume a great deal of power, and heat dissipation is a challenge for embedded devices. Reducing the number of parameters and the amount of computation of the model has therefore become an urgent problem for the application of neural networks.
In order to reduce the parameters and computation of a network, researchers have proposed many methods for compressing neural network models, which fall mainly into four categories: (1) parameter pruning and sharing; (2) low-rank factorization; (3) transferred/compact convolution filters; (4) knowledge distillation. Parameter pruning and sharing was originally aimed at the overfitting problem, but is now used more to reduce network complexity. Traditional parameter sharing suffers relatively high accuracy loss, and pruned networks are often difficult to train and also lose accuracy. The method proposed here further improves the efficiency of parameter sharing, mainly through parameter sharing inside the convolution kernel.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for sharing the internal parameters of a convolution kernel, which changes the structure of the neural network, reduces the computation of the network, compresses the network parameters, and thereby improves inference speed.
In order to solve the above problem, the invention provides a method for sharing the internal parameters of a deep learning network convolution kernel, comprising the following steps:
Step 1: calling a clustering method to cluster the two-dimensional matrix parameters corresponding to the input channels of each convolution kernel of the neural network model to be optimized, and grouping the two-dimensional matrix parameters with the same characteristics into one class;
Step 2: making the two-dimensional matrix parameters of the same class share one new two-dimensional matrix parameter, which reduces both the parameter count and the computation; the total number of input channels minus the number of cluster classes equals the number of parameters removed;
Step 3: designing a new neural network model according to the new two-dimensional matrix parameters;
Step 4: in the new neural network model, applying the distributive law of the matrix dot product to change the original calculation mode; the input channels sharing one parameter are added first and then matrix-multiplied, so if one cluster class contains n two-dimensional parameters, only (n-1) additions and one multiplication are required, which is (n-1) multiplications fewer than the original convolution (a worked example follows this list);
Step 5: training the new neural network model, computing the gradient of each neuron, and updating the shared weight parameters; the shared parameters are adjusted repeatedly until a preset accuracy is reached.
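As an illustration of the saving in step 4, the following back-of-the-envelope calculation (a hedged sketch with made-up layer sizes, not figures from the invention) counts the multiplications of one 3 × 3 convolution layer before and after intra-kernel sharing:

```python
# Illustrative arithmetic only: multiplication count of one 3x3 convolution
# layer before and after intra-kernel sharing (all sizes are hypothetical).
h = w = 56                     # output feature-map height and width
c_out, c_in, k = 256, 256, 3   # output channels, input channels, kernel size
clusters = 64                  # cluster classes per kernel (sharing rate 0.25)

muls_standard = c_out * c_in * k * k * h * w     # one multiply pass per input channel
muls_shared = c_out * clusters * k * k * h * w   # one multiply pass per cluster class
adds_extra = c_out * (c_in - clusters) * h * w   # the extra "add first" operations
print(f"multiplications: {muls_standard:,} -> {muls_shared:,} "
      f"({1 - muls_shared / muls_standard:.0%} fewer)")
```

With these hypothetical sizes the multiplications drop by 75%, while only comparatively cheap additions are introduced.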
Further, before step 1 the neural network model to be optimized is pre-trained so that its accuracy on the test set reaches a high level. Training the network to a high accuracy before clustering, and performing clustering and sharing on that basis, keeps the impact on accuracy small.
Furthermore, the parameters corresponding to the input channels of each convolution kernel are clustered by calling K-means or another clustering method. K-means is an unsupervised clustering algorithm with a good clustering effect and fast convergence: for a given sample set, the samples are divided into K clusters according to the distance between them, so that points within a cluster are as close together as possible while the distance between clusters is as large as possible. Parameters clustered by K-means into the same class have the same characteristics, and the input channels of that class share one parameter.
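A minimal sketch of steps 1-3, assuming a PyTorch convolution weight of shape (C_out, C_in, kh, kw) and scikit-learn's KMeans; the function name and choice of libraries are illustrative assumptions, not part of the invention:

```python
# Sketch only: for every convolution kernel, cluster its per-input-channel 2-D
# slices with K-means and replace each slice by its cluster mean (the shared value).
import torch
from sklearn.cluster import KMeans

def share_kernel_parameters(weight: torch.Tensor, n_clusters: int):
    """weight: conv weight of shape (C_out, C_in, kh, kw).
    Returns the weight with shared slices and the cluster labels of each kernel."""
    c_out, c_in, kh, kw = weight.shape
    shared = weight.detach().clone()
    labels_per_kernel = []
    for o in range(c_out):
        # one flattened two-dimensional matrix per input channel of this kernel
        slices = weight[o].detach().reshape(c_in, kh * kw).cpu().numpy()
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit(slices).labels_
        for j in range(n_clusters):
            mask = torch.from_numpy(labels == j)
            # the shared parameter is the arithmetic mean of the cluster members
            mean_slice = slices[labels == j].mean(axis=0).reshape(kh, kw)
            shared[o][mask] = torch.from_numpy(mean_slice).to(shared)
        labels_per_kernel.append(labels)
    return shared, labels_per_kernel
```

With k cluster classes per kernel, each kernel stores k two-dimensional matrices instead of C_in, which matches the "total input channels minus number of cluster classes" reduction described in step 2.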
Further, in step 2, the two-dimensional matrix parameters before sharing are expressed as

W = {w_1, w_2, …, w_L}    (1)

where w_i represents the weights of the i-th layer, L represents the number of layers of the convolutional neural network, and N_i represents the number of input channels of the i-th layer. According to the clustering, for any convolution kernel w_n of the weight parameter w_i, k clusters are obtained: C_n = {C_1, C_2, …, C_k}, where each cluster C_j (1 ≤ j ≤ k) contains the parameters that need to be shared after optimization. The sharing method is:

W_i = f(C_n, w_i)  for 1 ≤ n ≤ N_{i+1}    (2)

where W_n is an optimized convolution kernel, W_i is the optimized weight parameter of the i-th layer, and w is a parameter in the j-th cluster.
Further, in step 4, when the new neural network model is computed, the distributive law of the matrix dot product is used: all input feature maps that share the same parameter are added first and then convolved. The formula is

Y_a = Σ_{b=1}^{N_i} W_{a,b} * X_b = Σ_{j=1}^{k} W_{a,C_j} * ( Σ_{b ∈ C_j} X_b )    (3)

where W_{a,b} (a < N_{i+1}, b < N_i) is the two-dimensional matrix parameter of the a-th convolution kernel of the i-th layer corresponding to the b-th input feature map, W_{a,C_j} is the slice shared by the channels in cluster C_j, * denotes the two-dimensional convolution (dot-product) operation, N_i is the number of input channels of the i-th layer, X_b is the input, and Y_a is the output.
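A minimal sketch of this computation for a single convolution kernel, assuming PyTorch and cluster labels such as those produced by the routine sketched earlier (all names are illustrative):

```python
# Sketch only: "add first, then multiply" for one convolution kernel.
import torch
import torch.nn.functional as F

def shared_kernel_forward(x, shared_slices, labels, stride=1, padding=1):
    """x: (N, C_in, H, W); shared_slices: (k, kh, kw), one 2-D matrix per cluster;
    labels: length-C_in vector mapping each input channel to its cluster."""
    k = shared_slices.shape[0]
    labels = torch.as_tensor(labels)
    # add first: collapse the input channels of each cluster into a single map
    pooled = torch.stack([x[:, labels == j].sum(dim=1) for j in range(k)], dim=1)
    # then multiply once per cluster: an ordinary convolution with only k channels
    return F.conv2d(pooled, shared_slices.unsqueeze(0), stride=stride, padding=padding)
```

When the channels inside a cluster really do share the same slice, this output equals the full convolution exactly, while performing one multiplication pass per cluster instead of one per input channel.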
Further, a back-propagation algorithm is adopted both for pre-training the neural network model to be optimized and for training (fine-tuning) the new neural network model.
The invention has the following beneficial effects: by clustering input channels with the same characteristics, two or more such input channels share one parameter, and the operation of the convolution kernel is improved so that input feature maps sharing one parameter can be added first and then dot-multiplied with the weight parameter. The parameter count and computation are therefore greatly reduced while calculation accuracy is preserved. The method was tested experimentally with VGG16 and Resnet50 on the CIFAR-10, CIFAR-100 and ImageNet datasets.
The invention can be used in various mobile embedded devices and is particularly suitable for autonomous driving. With the 1920 × 1024 images typical of autonomous-driving input, a GPU can normally infer only 11-20 images per second, which is unacceptable where fast inference is required. The method for sharing the internal parameters of the deep learning network convolution kernel provided by the invention can reduce the computation by more than 50%, so that after optimization more than 90 images per second can be inferred, essentially meeting the requirements of autonomous driving. Moreover, because the distributive law of step 4 greatly reduces the number of multiplications, and because the area and power consumption of a multiplier on an ASIC chip are far greater than those of an addition tree, the invention also has great application value for the design of AI chips.
Table 1 lists the number of images inferred per second on different networks and GPU configurations, before and after applying the method for sharing the internal parameters of the deep learning network convolution kernel.
Drawings
FIG. 1 is a schematic diagram of convolution kernel internal sharing.
Fig. 2 is a diagram illustrating the use of convolution kernel internal sharing for VGG 13.
Fig. 3 is a diagram of a network architecture employing internal sharing of convolution kernels.
FIG. 4 is a diagram illustrating the operation of sharing the interior of a convolution kernel.
Fig. 5 is a graph of the results shared by VGG 16.
Fig. 6 is a graph of the results shared by Resnet 50.
Fig. 7 is a diagram of the intermediate features when Resnet50 is not shared.
Fig. 8 is a diagram of the intermediate features when the Resnet50 sharing rate is 0.7.
Detailed Description
A deep learning network convolution kernel internal parameter sharing method comprises the following steps:
(1) Select a neural network model to be optimized (VGG13, VGG16 and Resnet50 are taken as examples in the invention) and pre-train it, so that its accuracy on the test set reaches a high level.
(2) Cluster the parameters corresponding to the input channels of the convolution kernels by calling K-means or another clustering method. K-means is an unsupervised clustering algorithm with a good clustering effect and fast convergence: for a given sample set, the samples are divided into K clusters according to the distance between them, so that points within a cluster are as close together as possible and the distance between clusters is as large as possible. Parameters clustered by K-means into the same class have the same characteristics, and the input channels of that class share one parameter.
(3) Share the parameters within each class: after sharing and optimization, two or more input channels belonging to one class share a single two-dimensional matrix parameter, whose value is the arithmetic mean of the corresponding parameters of the model to be optimized. A new neural network model is designed according to the sharing result; in the new network, the distributive law of the matrix dot product is applied to all input feature maps that share the same parameter.
(4) Train the new neural network model with a back-propagation algorithm, compute the gradient of each neuron, and update the shared weight parameters. The shared parameters are adjusted repeatedly until a high accuracy is reached; the goal is to keep the accuracy degradation within 1% of the original network.
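As a small hedged illustration (not the invention's code) of why a single optimizer step suffices for a shared weight: when one slice is reused for several input channels, back-propagation automatically accumulates the gradient contributions of all of those channels into that single parameter.

```python
# Sketch: one shared 3x3 slice used by two input channels of the same cluster.
import torch
import torch.nn.functional as F

shared = torch.randn(1, 1, 3, 3, requires_grad=True)   # the shared 2-D matrix parameter
x = torch.randn(1, 2, 8, 8)                             # two channels of one cluster

y = F.conv2d(x[:, 0:1], shared, padding=1) + F.conv2d(x[:, 1:2], shared, padding=1)
y.sum().backward()
# shared.grad now holds the summed gradient from both channels, so one optimizer
# step updates the shared parameter consistently for the whole cluster.
```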
The invention provides a method for sharing the internal parameters of a convolution kernel of a deep learning network. As can be seen from fig. 1, sharing parameters inside the convolution kernels of the convolutional neural network greatly reduces the connections between neurons and thus greatly reduces the amount of computation.
As shown in fig. 2, the network structures of VGG and Resnet are not changed; only the operation of the convolutional layer changes. For the output of the previous convolutional layer, the input-channel parameters of the convolution kernels are clustered with a clustering function, the input channels corresponding to the parameters of one class are added first and then dot-multiplied, the outputs of the convolution kernels are finally concatenated (concat), and the result is batch-normalized and passed through a ReLU function as the input of the next layer.
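Putting the pieces together, a whole internal-shared convolution layer might look like the following PyTorch sketch (an assumption-laden illustration, not the invention's implementation; the batch normalization and ReLU described above would follow it):

```python
import torch
import torch.nn.functional as F

class InternalSharedConv2d(torch.nn.Module):
    """Sketch of an internal shared convolution layer: every output channel keeps
    k shared 2-D slices plus a label vector mapping its C_in input channels to them."""
    def __init__(self, shared_slices, labels, stride=1, padding=1):
        super().__init__()
        self.weight = torch.nn.Parameter(shared_slices)   # (C_out, k, kh, kw)
        self.register_buffer("labels", labels)            # (C_out, C_in), integer labels
        self.stride, self.padding = stride, padding

    def forward(self, x):                                  # x: (N, C_in, H, W)
        c_out, k = self.weight.shape[:2]
        outs = []
        for o in range(c_out):
            # add first: one pooled map per cluster class of this kernel
            pooled = torch.stack(
                [x[:, self.labels[o] == j].sum(dim=1) for j in range(k)], dim=1)
            # then one convolution with only k input channels
            outs.append(F.conv2d(pooled, self.weight[o:o + 1],
                                 stride=self.stride, padding=self.padding))
        return torch.cat(outs, dim=1)                      # concat the kernel outputs
```

A production version would vectorize the per-kernel loop (for example with a scatter-add over channels), but the loop form mirrors the description above most directly.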
A VGG13 using convolution-kernel internal parameter sharing is shown in FIG. 3. The invention uses PyTorch to build the network and trains on the CIFAR-10, CIFAR-100 and ImageNet datasets. Taking CIFAR-10 as an example, its images are three-channel color pictures of size 32 × 32. Because internally sharing the first convolution layer and the 1 × 1 convolution kernels greatly affects accuracy, the first convolution layer and the layers whose kernel size is 1 × 1 use classical convolution, while the remaining layers use internal shared convolution. The specific steps are as follows. First, the 32 × 32 × 3 input image passes through a standard 3 × 3 convolution with stride 1, padding 1 and 64 output channels, followed by an internal shared 3 × 3 convolution with stride 1 and padding 1; after pooling and batch normalization, the second stage is entered. The second stage comprises 2 internal shared convolutions, each with 128 output channels, stride 1 and padding 1, and is followed by pooling and batch normalization. The third stage comprises 3 convolutions with 128 output channels each; the first two are shared 3 × 3 convolutions and the last is a standard 1 × 1 convolution; pooling and batch normalization lead to the fourth stage. The fourth stage comprises 2 internal shared 3 × 3 convolutions and one standard 1 × 1 convolution, with 256 output channels, stride 1 and padding 1, followed by pooling and batch normalization. The fifth stage consists of 2 internal shared 3 × 3 convolutions and one standard 1 × 1 convolution, with 512 output channels, stride 1 and padding 1. The result then passes through 3 fully connected layers, completing the image classification.
The operation method for internal sharing of convolution kernel is shown in fig. 4, and the steps are as follows:
(1) pre-train the network for 300 epochs;
(2) perform the convolution-kernel internal sharing operation on the network parameters;
(3) fine-tune for 3 epochs;
(4) test the sharing result (a sketch of this schedule follows this list).
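A hedged sketch of this schedule; train_fn, share_fn and test_fn are illustrative placeholders, not functions defined by the invention:

```python
def optimize(model, train_fn, share_fn, test_fn, rounds=10):
    train_fn(model, epochs=300)                   # (1) pre-train the network
    for _ in range(rounds):                       # steps (2)-(4) are repeated
        share_fn(model)                           # (2) convolution-kernel internal sharing
        train_fn(model, epochs=3)                 # (3) fine-tune for 3 epochs
        print("test accuracy:", test_fn(model))   # (4) test the sharing result
```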
Steps (2), (3) and (4) are repeated continuously. The internal parameter sharing of the convolution kernel greatly reduces the amount of computation and the number of parameters: compared with the classical neural network, fewer parameters are used and less computation is required, which shows that the deep learning network convolution kernel internal parameter sharing method provided by the invention is effective. The results obtained for VGG16 are shown in Table 1, and the results obtained for Resnet50 are shown in Table 2.
TABLE 1 VGG16 calculation results
TABLE 2 Resnet50 calculation results
In addition, the invention provides intermediate feature maps of Resnet50 processing CIFAR-10 (see the drawings): FIG. 7 shows the intermediate features of Resnet50 without sharing optimization, and FIG. 8 shows the intermediate features at a sharing rate of 0.7. Comparing the two figures shows that the sharing optimization does little harm to feature extraction.
The above data show that, compared with classical convolution, the advantage of convolution-kernel internal sharing is obvious: the parameter count of the VGG convolution operations can be reduced by more than 80% and the computation by nearly 80%, while both the parameter count and the computation of Resnet can be reduced by more than 40%, with little loss of accuracy; at a low sharing rate the test-set performance of the model can even exceed that of the unoptimized model.
The above is only a preferred embodiment of the present invention. It should be noted that the embodiment does not limit the invention, and various changes and modifications made by those skilled in the art within the scope of the technical idea of the invention fall within the protection scope of the invention.
Claims (5)
1. A method for sharing internal parameters of a deep learning network convolution kernel is characterized by comprising the following steps:
Step 1: calling a clustering method to cluster the two-dimensional matrix parameters corresponding to the input channels of each convolution kernel of a neural network model to be optimized, and grouping the two-dimensional matrix parameters with the same characteristics into one class;
Step 2: making the two-dimensional matrix parameters of the same class share one new two-dimensional matrix parameter;
Step 3: designing a new neural network model according to the new two-dimensional matrix parameters;
Step 4: in the new neural network model, applying the distributive law of the matrix dot product to change the original calculation mode;
Step 5: training the new neural network model, computing the gradient of each neuron, and updating the shared weight parameters; repeatedly adjusting the shared parameters until a preset accuracy is reached.
2. The method for sharing the internal parameters of the deep learning network convolution kernel according to claim 1, wherein a neural network model to be optimized is pre-trained before step 1.
3. The method for sharing the internal parameters of the deep learning network convolution kernel according to claim 1, wherein in step 2 the two-dimensional matrix parameters before sharing are expressed as W = {w_1, w_2, …, w_L} (1), where w_i represents the weights of the i-th layer, L represents the number of layers of the convolutional neural network, and N_i represents the number of input channels of the i-th layer; according to the clustering, for any convolution kernel w_n of the weight parameter w_i, k clusters are obtained: C_n = {C_1, C_2, …, C_k}, where each cluster C_j (1 ≤ j ≤ k) contains the parameters that need to be shared after optimization; the sharing method is:

W_i = f(C_n, w_i)  for 1 ≤ n ≤ N_{i+1}    (2)

where W_n is an optimized convolution kernel, W_i is the optimized weight parameter of the i-th layer, and w is a parameter in the j-th cluster.
4. The method as claimed in claim 1, wherein in step 4, when training the new neural network model, all input feature maps sharing the same parameter are added first and then convolved by using the distributive law of the matrix dot product, according to the formula

Y_a = Σ_{b=1}^{N_i} W_{a,b} * X_b = Σ_{j=1}^{k} W_{a,C_j} * ( Σ_{b ∈ C_j} X_b )    (3)

where W_{a,b} (a < N_{i+1}, b < N_i) is the two-dimensional matrix parameter of the a-th convolution kernel of the i-th layer corresponding to the b-th input feature map, W_{a,C_j} is the slice shared by the channels in cluster C_j, * denotes the two-dimensional convolution (dot-product) operation, X_b is the input, and Y_a is the output.
5. The method for sharing the internal parameters of the deep learning network convolution kernel according to claim 2, wherein a back propagation algorithm is adopted for both pre-training the neural network model to be optimized and training a new neural network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110177124.7A CN112836757A (en) | 2021-02-09 | 2021-02-09 | Deep learning network convolution kernel internal parameter sharing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110177124.7A CN112836757A (en) | 2021-02-09 | 2021-02-09 | Deep learning network convolution kernel internal parameter sharing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112836757A true CN112836757A (en) | 2021-05-25 |
Family
ID=75933028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110177124.7A Pending CN112836757A (en) | 2021-02-09 | 2021-02-09 | Deep learning network convolution kernel internal parameter sharing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836757A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705784A (en) * | 2021-08-20 | 2021-11-26 | 江南大学 | Neural network weight coding method based on matrix sharing and hardware system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110852439B (en) | Data processing method and device and storage medium | |
CN106250939B (en) | Handwritten character recognition method based on FPGA + ARM multilayer convolutional neural network | |
EP3407266A1 (en) | Artificial neural network calculating device and method for sparse connection | |
CN111095302A (en) | Compression of sparse deep convolutional network weights | |
WO2021051987A1 (en) | Method and apparatus for training neural network model | |
CN109063719B (en) | Image classification method combining structure similarity and class information | |
CN114488140B (en) | Small sample radar one-dimensional image target recognition method based on deep migration learning | |
CN113326930B (en) | Data processing method, neural network training method, related device and equipment | |
CN105488563A (en) | Deep learning oriented sparse self-adaptive neural network, algorithm and implementation device | |
WO2023231794A1 (en) | Neural network parameter quantification method and apparatus | |
CN112183742A (en) | Neural network hybrid quantization method based on progressive quantization and Hessian information | |
CN110321997A (en) | High degree of parallelism computing platform, system and calculating implementation method | |
CN110119805B (en) | Convolutional neural network algorithm based on echo state network classification | |
CN113239949A (en) | Data reconstruction method based on 1D packet convolutional neural network | |
CN112949610A (en) | Improved Elman neural network prediction method based on noise reduction algorithm | |
CN112836729A (en) | Construction method of image classification model and image classification method | |
Fan et al. | HFPQ: deep neural network compression by hardware-friendly pruning-quantization | |
CN115587628A (en) | Deep convolutional neural network lightweight method | |
CN112836757A (en) | Deep learning network convolution kernel internal parameter sharing method | |
CN110728352A (en) | Large-scale image classification method based on deep convolutional neural network | |
CN117034030A (en) | Electroencephalo-gram data alignment algorithm based on positive and negative two-way information fusion | |
CN116757255A (en) | Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model | |
CN111639751A (en) | Non-zero padding training method for binary convolutional neural network | |
CN116109868A (en) | Image classification model construction and small sample image classification method based on lightweight neural network | |
CN116301914A (en) | Convolutional neural network deployment method based on GAP8 microprocessor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||