CN110097177B - Network pruning method based on pseudo-twin network - Google Patents
- Publication number
- CN110097177B CN110097177B CN201910400920.5A CN201910400920A CN110097177B CN 110097177 B CN110097177 B CN 110097177B CN 201910400920 A CN201910400920 A CN 201910400920A CN 110097177 B CN110097177 B CN 110097177B
- Authority
- CN
- China
- Prior art keywords
- network
- pseudo
- twin
- target
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of neural network model compression, and particularly relates to a network pruning method based on a pseudo-twin network. Building on the conventional ternary neural network, the invention provides a universal method for clipping redundant connections in a large convolutional neural network, which specifically comprises the following steps: constructing a ternary network Ternary-N with the same structure as the network N to be clipped; training the two networks step by step on the same pictures, using a knowledge distillation algorithm during training; after training, clipping the weights at the corresponding positions in N according to the weights in Ternary-N; and finally fine-tuning N. The number of connections in the fine-tuned network is greatly reduced compared with the original network.
Description
Technical Field
The invention belongs to the technical field of neural network model compression, and relates to a network pruning method based on a pseudo-twin network.
Background
Against the background of the rapid development of artificial intelligence, neural networks have become a key technology for realizing it. Among the many kinds of neural networks, convolutional neural networks have been a research focus because of their excellent performance in image classification and object detection. However, modern convolutional neural networks occupy too much memory and demand too much computation, which hinders their deployment in resource-limited real-time hardware systems. Therefore, how to compress the model size of a convolutional neural network and reduce its computational load has become the key to deploying artificial-intelligence technology on such systems.
Compressing convolutional neural networks requires joint solutions from many disciplines, including but not limited to machine learning, optimization, computer architecture, data compression, indexing, and hardware design. Focusing on model compression at the level of network parameters, and reviewing the practical methods proposed by researchers in recent years, these methods can be classified into four categories: parameter pruning and sharing, low-rank decomposition, transferred/compact convolution filters, and knowledge distillation.
Unstructured pruning is an important method for compressing the model size and reducing the computational load of convolutional neural networks, and achieves the purpose of compressing and accelerating convolutional neural networks by clipping the weights and connections in the convolutional neural networks.
Parameter sharing achieves model compression by having the weights of the network model share the same parameter set. One important method is network quantization, which compresses the original network by reducing the number of bits required to represent each weight. Some work applies k-means scalar quantization to the parameter values. It has also been shown that 8-bit quantization of parameters can yield significant speedup with minimal loss of precision. Other work uses a 16-bit fixed-point representation in CNN training based on stochastic rounding, which significantly reduces memory usage and floating-point operations with little loss in classification accuracy. In the extreme case of a 1-bit representation of each weight, i.e., a binary-weight neural network, many works directly train CNNs with binary weights, e.g., BinaryConnect, BinaryNet, and XNOR-Net.
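The quantization idea described above can be sketched in a few lines. This is a generic uniform 8-bit scheme for illustration only, not the exact method of any of the cited works:

```python
import numpy as np

def quantize_8bit(w):
    # Uniform 8-bit quantization: map the weight range [lo, hi] onto
    # 256 integer levels. A generic sketch of the background technique.
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return q, lo, scale

def dequantize_8bit(q, lo, scale):
    # Map the 8-bit codes back to approximate floating-point weights.
    return q.astype(np.float32) * scale + lo

w = np.linspace(-1.0, 1.0, 64).astype(np.float32)
q, lo, scale = quantize_8bit(w)
w_hat = dequantize_8bit(q, lo, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
```

Storing `q` (one byte per weight) instead of 32-bit floats gives the 4x memory reduction that motivates this line of work.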
Network pruning achieves compression by removing non-critical connections or parameters from the network. Early network pruning methods were based on the magnitude of the weight values. Researchers later proposed the Optimal Brain Damage and Optimal Brain Surgeon methods, which reduce the number of connections using a clipping criterion based on the Hessian of the loss function; experiments showed that such clipping gives higher accuracy than magnitude-based clipping (e.g., weight-decay methods). A recent trend in network pruning is to prune redundant, uninformative weights in a pre-trained CNN model and to remove redundant neurons with data-free pruning methods. However, current pruning techniques still have significant problems in terms of pruning rate and accuracy preservation.
Disclosure of Invention
The invention aims to solve the problems and provides a network pruning method based on a pseudo-twin network.
The technical scheme of the invention is as follows:
1) Read the network structure of the pruned network, and construct a network with the same structure as the pruned 32-bit network, in which all weights are ternary while the activation function remains 32-bit; that is, every weight parameter W of this network belongs to {-1, 0, 1}. This ternary network is called the pseudo-twin network.
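Step 1's ternary weights can be obtained, for example, by threshold-based quantization of full-precision weights. The threshold rule below (and its `delta_ratio` hyperparameter) is an assumption for illustration; the patent does not prescribe a particular ternarization rule:

```python
import numpy as np

def ternarize(w, delta_ratio=0.05):
    # Threshold ternarization: weights close to zero become 0, the rest
    # become +1 or -1 according to their sign. delta_ratio is an assumed
    # hyperparameter controlling the dead zone around zero.
    delta = delta_ratio * float(np.max(np.abs(w)))
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    return t

kernel = np.array([0.9, -0.8, 0.01, 0.3])
print(ternarize(kernel))  # -> [ 1. -1.  0.  1.]
```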
2) With the target data set as the input of the network and the cross entropy between the network output vector and the target's true vector as the loss function, update the parameters of the pruned network by the conventional stochastic gradient descent method. Then train the twin network with a knowledge distillation method. In knowledge distillation, let the output of the pruned network be T(i), the output of the twin network be S(i), and the true distribution of the targets be L(i). The cost function L_t of the pruned network is the cross entropy between T(i) and L(i):
L_t = -Σ_{i=1}^{n} L(i) · log T(i)
The cost function L_s of the twin network is the cross entropy between S(i) and the soft target T(i):
L_s = -Σ_{i=1}^{n} T(i) · log S(i)
where n is the number of classes. This is illustrated in FIG. 1;
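A minimal numeric sketch of the two cost functions, assuming the standard cross-entropy forms implied by the claims; the logits below are hypothetical stand-ins for the two networks' outputs:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p(i) * log q(i)
    return float(-np.sum(p * np.log(q + eps)))

L = np.array([0.0, 1.0, 0.0])            # true distribution L(i)
T = softmax(np.array([0.1, 2.0, -1.0]))  # pruned-network output T(i)
S = softmax(np.array([0.0, 1.5, -0.5]))  # twin-network output S(i)

L_t = cross_entropy(L, T)  # teacher loss: T(i) against the true labels
L_s = cross_entropy(T, S)  # distillation loss: S(i) against soft target T(i)
```

As the twin network's output S(i) approaches the teacher's output T(i), L_s approaches the entropy of T(i), which is how the distillation step transfers the teacher's knowledge.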
3) The training algorithm mainly comprises the following three steps, as shown in FIG. 2:
1. Perform simple preprocessing on the image, including normalization and resizing;
Loop:
2. Input the training image into the pruned network, and use its output loss L_t to update the weights of the teacher network;
3. Feed the same training image from step 2 into the twin network, and use L_s to update the ternary weights in the twin network;
When both L_t and L_s are less than 0.01, the loop ends, and the weights of the pruned network and the twin network are stored;
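The training loop of step 3 can be sketched as follows; `teacher_step` and `student_step` are hypothetical helpers standing in for one SGD update of the pruned network and of the twin network, respectively:

```python
def train(teacher_step, student_step, batches, tol=0.01, max_epochs=1000):
    # Alternating teacher/student updates on the same batches, stopping
    # when both losses fall below tol (0.01 in the patent's description).
    L_t = L_s = float("inf")
    for _ in range(max_epochs):
        for x, y in batches:
            L_t = teacher_step(x, y)  # update pruned-network weights via L_t
            L_s = student_step(x)     # update ternary twin weights via L_s
        if L_t < tol and L_s < tol:
            break
    return L_t, L_s
```

Note that both networks see the same image in each iteration, which is what makes the twin a "pseudo-twin" of the pruned network.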
4) Using the twin network as a template, clip the pruned network. Specifically, multiply the absolute value of each convolution kernel of the pruned network element-wise with the corresponding kernel in the twin network to obtain a new kernel; clip the positions whose weight value is 0 in the new kernel; and replace the kernel in the pruned network with the clipped new kernel. This is illustrated in FIG. 3;
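Step 4's clipping rule can be written directly as an element-wise product: positions where the twin kernel is 0 are clipped, and surviving weights keep their magnitude with the twin's sign. A minimal sketch:

```python
import numpy as np

def prune_with_twin(kernel, twin_kernel):
    # Element-wise product of |kernel| with the ternary twin kernel.
    # Where the twin is 0 the result is 0 (the connection is clipped);
    # elsewhere the weight keeps its magnitude with the twin's +/-1 sign.
    return np.abs(kernel) * twin_kernel

k = np.array([[0.5, -0.3], [0.2, 0.8]])
t = np.array([[1.0, 0.0], [-1.0, 1.0]])
print(prune_with_twin(k, t))  # one of the four positions is clipped to 0
```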
5) The clipped network is further trained to fine-tune its parameters. There are two fine-tuning methods:
1. Keep the remaining parameters of the original network fixed and fine-tune the parameters of the ternary network, as shown in FIG. 4.
2. Keep the remaining parameters of the ternary network fixed and fine-tune the parameters of the original network, as shown in FIG. 5.
The fine-tuned network is the final pruned network.
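One way to realize the fine-tuning in step 5 is a masked SGD update, assuming that connections clipped in step 4 should stay at zero while the surviving parameters are adjusted. The helper below is a sketch under that assumption, not the patent's exact procedure:

```python
import numpy as np

def finetune_step(w, grad, mask, lr=0.01):
    # One SGD step that updates only surviving weights (mask == 1),
    # so connections clipped in step 4 remain exactly zero.
    return (w - lr * grad) * mask

w = np.array([0.5, 0.0, -0.2])   # clipped weights (middle one pruned)
g = np.array([0.1, 0.3, -0.1])   # hypothetical gradient
m = np.array([1.0, 0.0, 1.0])    # mask derived from the twin's zeros
w_new = finetune_step(w, g, m)
```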
The advantages of the invention are that the accuracy of the pruned network is not noticeably reduced and the pruning speed is high.
Drawings
FIG. 1 is a calculation of a teacher and student network cost function;
FIG. 2 is a training process for a teacher and student network;
FIG. 3 is an example of pruning a teacher network using a student network as a template;
FIG. 4 is an example of the first fine-tuning method;
FIG. 5 is an example of the second fine-tuning method.
Detailed Description
Taking the LeNet network and the MNIST data set as an example:
The MNIST data set consists of handwritten digit images: a training set of 55000 samples, a test set of 10000 samples, and a validation set of 5000 samples, each sample carrying its corresponding label. All digit images are size-normalized and centered in a fixed-size image of 28×28 pixels. In the original data set, each pixel of an image is represented by a value between 0 and 255, where 0 is black, 255 is white, and values in between are different shades of gray.
LeNet is a convolutional neural network proposed in 1998 for recognizing handwritten characters; it has 3 convolutional layers, two pooling layers, a fully-connected layer, and an output layer. The result of clipping it with our method is as follows:
Tables 1 and 2 show the results of the first fine-tuning method, and Tables 3 and 4 show the results of the second fine-tuning method:
TABLE 1
Network layer | Parameters | Pruning rate |
Conv1 | 150 | 0.413 |
Conv2 | 2400 | 0.467 |
Conv3 | 48000 | 0.528 |
FC1 | 10080 | 0.401 |
FC2 | 840 | 0.407 |
Total | 61470 | 0.503 |
TABLE 2
Name of model | Error rate |
LeNet | 1.39% |
Ternary LeNet | 1.97% |
Pruned LeNet | 1.49% |
The following results keep the parameters of the ternary network unchanged:
TABLE 3
Network layer | Parameters | Pruning rate |
Conv1 | 150 | 0.405 |
Conv2 | 2400 | 0.201 |
Conv3 | 48000 | 0.413 |
FC1 | 10080 | 0.436 |
FC2 | 840 | 0.423 |
Total | 61470 | 0.408 |
TABLE 4
Name of model | Error rate |
LeNet | 1.39% |
Ternary LeNet | 1.97% |
Pruned LeNet | 1.37% |
As can be seen from the tables, about 40% of the redundant connections in LeNet are pruned with essentially no loss of accuracy, and under the second fine-tuning method the error rate even drops slightly. This demonstrates the feasibility of the method of the invention.
Taking an AlexNet network and a Cifar-10 dataset as examples:
the Cifar-10 dataset had 60000 color images, 32 × 32, divided into 10 classes of 6000 images each. The inner part is 50000 for training, and 5 training batches are formed, wherein 10000 graphs in each batch are formed; another 10000 was used for testing, constituting a batch individually. From the test lot data, 1000 sheets were randomly taken from each of 10 categories. The remainder are randomly arranged to form a training batch. Note that the number of images in each class is not necessarily the same in a training batch, and there are 5000 images in each class for the training batch as a whole.
AlexNet, a network proposed in 2012 for image recognition, has 5 convolutional layers and three fully-connected layers. The results of clipping it with our method are as follows:
Tables 5 and 6 show the results of the first fine-tuning method, and Tables 7 and 8 show the results of the second fine-tuning method:
TABLE 5
TABLE 6
Model | Error Rate |
AlexNet | 0.232 |
Ternary AlexNet | 0.253 |
Pruned AlexNet | 0.181 |
TABLE 7
Layer | Params | Compression Rate |
Conv1 | 4800 | 40.4% |
Conv2 | 153600 | 39.4% |
Conv3 | 110592 | 40.5% |
Conv4 | 147456 | 41.2% |
Conv5 | 147456 | 40.0% |
Fc1 | 4718592 | 58.0% |
Fc2 | 16777216 | 62.6% |
Fc3 | 40960 | 51.2% |
Total | 22100672 | 62.0% |
TABLE 8
As can be seen from the tables, about 62% of the redundant connections in AlexNet are pruned and the error rate drops by 3.4%, which also demonstrates the feasibility of our approach.
Claims (3)
1. A network pruning method based on a pseudo-twin network is characterized by comprising the following steps:
s1, constructing a pseudo-twin network according to a pruned target network, wherein the network weight of the pseudo-twin network is three values;
s2, reading a training sample, training a target network by adopting a random gradient descent method, setting the output of the target network as T (i), and setting the cost function of the target network as follows:
where n is the number of classes, L (i) is the true distribution of the target;
updating the weight of the target network by using the output T (i) of the target network;
the same training sample is adopted to train the pseudo-twin network, the output of the pseudo-twin network is set as S (i), and the cost function of the pseudo-twin network is as follows:
by means of L s Updating the ternary weights in the pseudo-twin network;
training process at L t And L s Ending when convergence occurs, and storing the weights of the target network and the pseudo-twin network;
s3, taking the pseudo twin network as a template, and cutting the target network:
multiplying the absolute value of the convolution kernel of the target network element-wise with the corresponding convolution kernel in the pseudo-twin network to obtain a new convolution kernel, clipping the positions whose weight value is 0 in the new convolution kernel, and replacing the convolution kernel in the target network with the clipped new convolution kernel.
2. The pseudo-twin network-based network pruning method according to claim 1, further comprising:
and S4, reserving parameters left by the target network, and adjusting the parameters of the pseudo-twin network by using a random gradient descent method.
3. The pseudo-twin network-based network pruning method according to claim 1, further comprising:
and S4, reserving parameters left by the pseudo twin network, and adjusting the parameters of the target network by using a random gradient descent method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910400920.5A CN110097177B (en) | 2019-05-15 | 2019-05-15 | Network pruning method based on pseudo-twin network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097177A CN110097177A (en) | 2019-08-06 |
CN110097177B true CN110097177B (en) | 2022-11-29 |
Family
ID=67448052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910400920.5A Active CN110097177B (en) | 2019-05-15 | 2019-05-15 | Network pruning method based on pseudo-twin network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097177B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210150313A1 (en) * | 2019-11-15 | 2021-05-20 | Samsung Electronics Co., Ltd. | Electronic device and method for inference binary and ternary neural networks |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909667B (en) * | 2019-11-20 | 2022-05-10 | 北京化工大学 | Lightweight design method for multi-angle SAR target recognition network |
CN111091144B (en) * | 2019-11-27 | 2023-06-27 | 云南电网有限责任公司电力科学研究院 | Image feature point matching method and device based on depth pseudo-twin network |
CN111008693B (en) * | 2019-11-29 | 2024-01-26 | 小米汽车科技有限公司 | Network model construction method, system and medium based on data compression |
CN111695699B (en) * | 2020-06-12 | 2023-09-08 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and readable storage medium for model distillation |
CN112348167B (en) * | 2020-10-20 | 2022-10-11 | 华东交通大学 | Knowledge distillation-based ore sorting method and computer-readable storage medium |
CN113724261A (en) * | 2021-08-11 | 2021-11-30 | 电子科技大学 | Fast image composition method based on convolutional neural network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334934A (en) * | 2017-06-07 | 2018-07-27 | 北京深鉴智能科技有限公司 | Convolutional neural networks compression method based on beta pruning and distillation |
CN109543559A (en) * | 2018-10-31 | 2019-03-29 | 东南大学 | Method for tracking target and system based on twin network and movement selection mechanism |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10740676B2 (en) * | 2016-05-19 | 2020-08-11 | Nec Corporation | Passive pruning of filters in a convolutional neural network |
- 2019-05-15: CN application CN201910400920.5A, patent CN110097177B (Active)
Also Published As
Publication number | Publication date |
---|---|
CN110097177A (en) | 2019-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097177B (en) | Network pruning method based on pseudo-twin network | |
WO2021042828A1 (en) | Neural network model compression method and apparatus, and storage medium and chip | |
CN111696101A (en) | Light-weight solanaceae disease identification method based on SE-Inception | |
CN111695513B (en) | Facial expression recognition method based on depth residual error network | |
CN113159173A (en) | Convolutional neural network model compression method combining pruning and knowledge distillation | |
Yue et al. | Face recognition based on histogram equalization and convolution neural network | |
CN112418397B (en) | Image classification method based on lightweight convolutional neural network | |
Singh et al. | Acceleration of deep convolutional neural networks using adaptive filter pruning | |
CN115829027A (en) | Comparative learning-based federated learning sparse training method and system | |
CN112597919A (en) | Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board | |
Fan et al. | HFPQ: deep neural network compression by hardware-friendly pruning-quantization | |
CN114742997A (en) | Full convolution neural network density peak pruning method for image segmentation | |
CN114882278A (en) | Tire pattern classification method and device based on attention mechanism and transfer learning | |
Doan | Large-scale insect pest image classification | |
Qi et al. | Learning low resource consumption cnn through pruning and quantization | |
CN110807497A (en) | Handwritten data classification method and system based on deep dynamic network | |
CN117671271A (en) | Model training method, image segmentation method, device, equipment and medium | |
CN112561054A (en) | Neural network filter pruning method based on batch characteristic heat map | |
CN116957010A (en) | Model reasoning method and device for convolutional neural network | |
CN115100509B (en) | Image identification method and system based on multi-branch block-level attention enhancement network | |
CN116310335A (en) | Method for segmenting pterygium focus area based on Vision Transformer | |
CN115620064A (en) | Point cloud down-sampling classification method and system based on convolutional neural network | |
CN115063374A (en) | Model training method, face image quality scoring method, electronic device and storage medium | |
CN115100694A (en) | Fingerprint quick retrieval method based on self-supervision neural network | |
CN114494284A (en) | Scene analysis model and method based on explicit supervision area relation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||