CN110097177B - Network pruning method based on pseudo-twin network - Google Patents


Info

Publication number
CN110097177B
Authority
CN
China
Prior art keywords
network
pseudo
twin
target
training
Prior art date
Legal status
Active
Application number
CN201910400920.5A
Other languages
Chinese (zh)
Other versions
CN110097177A (en)
Inventor
闵锐
蓝海
Current Assignee
Electric Coreda Chengdu Technology Co ltd
Original Assignee
Electric Coreda Chengdu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Electric Coreda Chengdu Technology Co ltd filed Critical Electric Coreda Chengdu Technology Co ltd
Priority to CN201910400920.5A
Publication of CN110097177A
Application granted
Publication of CN110097177B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of neural network model compression, and particularly relates to a network pruning method based on a pseudo-twin network. The invention provides, on the basis of the traditional ternary neural network, a universal method for clipping redundant connections in a large convolutional neural network. The method comprises the following steps: construct a ternary network Ternary-N with the same structure as the network N to be pruned; train both networks step by step on the same images, using a knowledge distillation algorithm during training; after training, clip the weights at the corresponding positions in N according to the weights in Ternary-N; finally, fine-tune N. The number of connections in the fine-tuned network is greatly reduced compared with the original network.

Description

Network pruning method based on pseudo-twin network
Technical Field
The invention belongs to the technical field of neural network model compression, and relates to a network pruning method based on a pseudo-twin network.
Background
Against the background of the rapid development of artificial intelligence, neural networks have become a key technology for realizing it. Among the many kinds of neural networks, convolutional neural networks have been a research focus because of their excellent performance in image classification and object detection. However, modern convolutional neural networks occupy too much memory and too many computational resources, which hinders their deployment in resource-constrained real-time hardware systems. How to compress the model size of a convolutional neural network and reduce its computational cost has therefore become key to deploying artificial-intelligence technology on such systems.
Compressing convolutional neural networks requires joint solutions from many disciplines, including but not limited to machine learning, optimization, computer architecture, data compression, indexing, and hardware design. Focusing on model compression at the level of network nodes, the practical methods proposed by researchers in recent years can be classified into four categories: parameter pruning and sharing, low-rank decomposition, transferred/compact convolution filters, and knowledge distillation.
Unstructured pruning is an important method for compressing the model size and reducing the computational load of convolutional neural networks; it compresses and accelerates them by clipping individual weights and connections.
Parameter sharing achieves model compression by having the weights of the network model share the same parameter set. One important method is network quantization, which compresses the original network by reducing the number of bits required to represent each weight. Some work applies k-means scalar quantization to the parameter values. It has also been shown that 8-bit quantization of parameters can yield significant speedup with minimal loss of precision. Other work uses 16-bit fixed-point representation with stochastic rounding in CNN training, which significantly reduces memory usage and floating-point operations with little loss in classification accuracy. In the extreme case of a 1-bit representation of each weight, i.e., a binary-weight neural network, there are also many works that directly train CNNs with binary weights, e.g., BinaryConnect, BinaryNet, and XNOR-Net.
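As a concrete illustration of such extreme quantization, a float weight matrix can be mapped to ternary values with a simple threshold rule. The sketch below is a minimal numpy version; the 0.7·mean|W| threshold is a common heuristic from the ternary-weight-network literature, not something this document specifies.

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """Quantize a float weight tensor to {-1, 0, +1}.

    Weights with |w| below the threshold delta are zeroed; the rest
    keep only their sign. delta_factor = 0.7 is a heuristic from the
    ternary-weight-network literature (an assumption here).
    """
    delta = delta_factor * np.mean(np.abs(w))
    t = np.sign(w)
    t[np.abs(w) < delta] = 0.0
    return t

# Hypothetical weight values for illustration
w = np.array([[0.9, -0.05, -0.6],
              [0.02, 0.4, -0.8]])
tw = ternarize(w)
```

Small-magnitude weights (here -0.05 and 0.02) become 0, while the rest collapse to their sign.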
Network pruning achieves compression by removing non-critical connections or parameters from the network. Early network pruning methods were based on the magnitude of the weight values. Researchers later proposed the Optimal Brain Damage and Optimal Brain Surgeon methods, which reduce the number of connections using a clipping criterion based on the Hessian of the loss function; experiments showed that such clipping gives higher accuracy than magnitude-based clipping (e.g., weight-decay methods). More recent trends in network pruning are to prune redundant, uninformative weights in a pre-trained CNN model and to remove redundant neurons with data-free pruning methods. However, current pruning techniques still have significant problems with pruning rate and accuracy preservation.
Disclosure of Invention
The invention aims to solve the problems and provides a network pruning method based on a pseudo-twin network.
The technical scheme of the invention is as follows:
1) Read the network structure of the network to be pruned, and construct a network with the same structure as the pruned 32-bit network, except that all its weights are ternary while its activations remain 32-bit; that is, every weight parameter W of this network belongs to {-1, 0, 1}. This ternary network is called the pseudo-twin network.
2) Using the target dataset as input and the cross entropy between the network's output vector and the target's true distribution as the loss function, update the parameters of the pruned network by conventional stochastic gradient descent. The twin network is then trained by knowledge distillation. In knowledge distillation, let the output of the pruned network be T(i), the output of the twin network be S(i), and the true distribution of the targets be L(i). The cost function L_t of the pruned network is
L_t = -Σ_{i=1}^{n} L(i) log T(i)
The cost function L_s of the twin network is
L_s = -Σ_{i=1}^{n} T(i) log S(i)
This process is illustrated in fig. 1;
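In numpy terms, the two cost functions might be computed as follows. This is a sketch based on the definitions above; the original formulas are reproduced only as images, so the exact distillation form of L_s (here a cross entropy against the teacher's soft output) is an assumption, as are all the numeric values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    """-sum_i p(i) * log q(i), with eps guarding against log(0)."""
    return -np.sum(p * np.log(q + eps))

L = np.array([0.0, 1.0, 0.0])            # true distribution L(i), n = 3 classes
T = softmax(np.array([1.0, 3.0, 0.5]))   # output T(i) of the pruned (teacher) network
S = softmax(np.array([0.8, 2.5, 0.7]))   # output S(i) of the twin (student) network

L_t = cross_entropy(L, T)  # teacher loss against the true labels
L_s = cross_entropy(T, S)  # student loss distilled from the teacher's soft output
```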
3) The training algorithm consists of the following three steps (see fig. 2):
1. Perform simple preprocessing on the image, including normalization and resizing;
Loop:
2. Input the training image into the pruned network and use its loss L_t to update the weights of the teacher network.
3. Input the same training image as in step 2 into the twin network and use L_s to update the ternary weights in the twin network.
When both L_t and L_s are below 0.01, the loop ends and the weights of the pruned network and the twin network are saved;
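The alternating loop can be sketched end to end with toy stand-ins: one linear softmax layer per network instead of a full CNN, and a continuous student in place of a re-quantized ternary one. A minimal numpy illustration of the alternating updates and the 0.01 exit criterion; the learning rate, data, sizes, and the distillation form of L_s are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Toy stand-ins for the pruned (teacher) network and the twin (student):
# hypothetical sizes and a single synthetic "preprocessed image".
n_in, n_cls = 4, 3
Wt = rng.normal(0.0, 0.1, (n_cls, n_in))  # teacher weights
Ws = rng.normal(0.0, 0.1, (n_cls, n_in))  # student weights
x = rng.normal(size=n_in)                 # one training input
y = np.array([0.0, 1.0, 0.0])             # true distribution L(i)

lr, eps = 0.5, 1e-12
L_t = L_s = np.inf
for _ in range(50000):
    # Step 2: teacher update with L_t = -sum_i L(i) log T(i)
    T = softmax(Wt @ x)
    L_t = -np.sum(y * np.log(T + eps))
    Wt -= lr * np.outer(T - y, x)         # gradient of softmax + cross entropy
    # Step 3: same input into the twin; update with L_s = -sum_i T(i) log S(i)
    S = softmax(Ws @ x)
    L_s = -np.sum(T * np.log(S + eps))
    Ws -= lr * np.outer(S - T, x)
    if L_t < 0.01 and L_s < 0.01:         # exit criterion from the text
        break
```

On this toy problem both losses fall below the 0.01 threshold and the loop exits early, mirroring the stopping rule described above.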
4) Using the twin network as a template, clip the pruned network. Specifically: multiply the absolute value of each convolution kernel of the pruned network elementwise with the corresponding kernel in the twin network to obtain a new kernel; clip the positions whose weight is 0 in the new kernel; and replace the kernel in the pruned network with the clipped new kernel. This process is illustrated in fig. 3;
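This clipping step can be sketched directly; the 3×3 kernel values below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x3 kernels: one from the pruned (target) network,
# one from its ternary pseudo-twin.
k_target = np.array([[ 0.42, -0.07,  0.31],
                     [-0.55,  0.03, -0.28],
                     [ 0.12, -0.61,  0.09]])
k_twin = np.array([[ 1.0,  0.0,  1.0],
                   [-1.0,  0.0, -1.0],
                   [ 0.0, -1.0,  0.0]])

# Elementwise product of |target kernel| and the twin kernel: wherever the
# twin holds a 0, the new kernel is 0 and that connection is clipped. The
# surviving weights take their magnitude from the target network and their
# sign from the twin.
k_new = np.abs(k_target) * k_twin
```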
5) Further train the clipped network to fine-tune its parameters. There are two fine-tuning methods:
1. Keep the surviving parameters of the original network and fine-tune the parameters of the ternary network (see fig. 4).
2. Keep the surviving parameters of the ternary network and fine-tune the parameters of the original network (see fig. 5).
The fine-tuned clipped network is the final pruned network.
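Either fine-tuning mode must keep the clipped positions at zero. One simple way to do this, shown below with a placeholder gradient rather than the document's actual procedure, is to mask the gradient update so that only surviving connections change.

```python
import numpy as np

# A clipped kernel: zeros mark pruned connections (hypothetical values).
k = np.array([[ 0.42, 0.0,  0.31],
              [-0.55, 0.0, -0.28]])
mask = (k != 0).astype(float)   # 1 where a connection survived, 0 where clipped

grad = np.full_like(k, 0.1)     # placeholder gradient from some loss
k -= 0.01 * grad * mask         # masked SGD step: clipped positions stay 0
```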
The advantages of the invention are that the accuracy of the pruned network does not drop significantly and that pruning is fast.
Drawings
FIG. 1 is a calculation of a teacher and student network cost function;
FIG. 2 is a training process for a teacher and student network;
FIG. 3 is an example of pruning a teacher network using a student network as a template;
FIG. 4 is an example of the first fine-tuning approach;
FIG. 5 is an example of the second fine-tuning approach.
Detailed Description
Taking the LeNet network and the MNIST dataset as an example:
The MNIST dataset consists of handwritten digit images: a training set of 55000 samples, a test set of 10000 samples, and a validation set of 5000 samples, each with a corresponding label. All digit images are size-normalized and centered in a fixed-size image of 28×28 pixels. In the original dataset each pixel is represented by a value between 0 and 255, where 0 is black, 255 is white, and anything in between is a shade of gray.
LeNet is a convolutional neural network proposed by LeCun et al. in the late 1980s for recognizing handwritten characters; it has 3 convolutional layers, 2 pooling layers, a fully-connected layer, and an output layer. The results of clipping it with our method are as follows:
tables 1 and 2 show the results of the first trimming mode, and tables 3 and 4 show the results of the second trimming mode:
TABLE 1
Network layer    Parameters    Pruning rate
Conv1 150 0.413
Conv2 2400 0.467
Conv3 48000 0.528
FC1 10080 0.401
FC2 840 0.407
Total 61470 0.503
TABLE 2
Model    Error rate
LeNet 1.39%
Ternary LeNet 1.97%
Pruned LeNet 1.49%
The following results keep the parameters of the ternary network unchanged:
TABLE 3
Network layer    Parameters    Pruning rate
Conv1 150 0.405
Conv2 2400 0.201
Conv3 48000 0.413
FC1 10080 0.436
FC2 840 0.423
Total 61470 0.408
TABLE 4
Model    Error rate
LeNet 1.39%
Ternary LeNet 1.97%
Pruned LeNet 1.37%
As can be seen from the table, le-Net is pruned about 40% of redundant connections and the error rate is also reduced. This demonstrates the feasibility of the process of the invention.
Taking the AlexNet network and the Cifar-10 dataset as an example:
The Cifar-10 dataset contains 60000 32×32 color images divided into 10 classes of 6000 images each. 50000 images are used for training and form 5 training batches of 10000 images; the remaining 10000 form a single test batch. The test batch contains exactly 1000 randomly selected images from each of the 10 classes; the remaining images are randomly arranged into the training batches. Note that within a single training batch the number of images per class is not necessarily equal, but over all training batches each class has exactly 5000 images.
AlexNet, a network proposed in 2012 for image recognition, has 5 convolutional layers and 3 fully-connected layers. The results of clipping it with our method are as follows:
Tables 5 and 6 show the results of the first fine-tuning method, and Tables 7 and 8 show the results of the second:
TABLE 5
[Table 5 is reproduced only as an image in the original publication.]
TABLE 6
Model Error Rate
AlexNet 0.232
Ternary AlexNet 0.253
Pruned AlexNet 0.181
TABLE 7
Layer Params Compression Rate
Conv1 4800 40.4%
Conv2 153600 39.4%
Conv3 110592 40.5%
Conv4 147456 41.2%
Conv5 147456 40.0%
Fc1 4718592 58.0%
Fc2 16777216 62.6%
Fc3 40960 51.2%
Total 22100672 62.0%
TABLE 8
[Table 8 is reproduced only as an image in the original publication.]
As can be seen from the tables, about 62% of AlexNet's redundant connections are pruned and the error rate drops by 3.4%, which also demonstrates the feasibility of our method.

Claims (3)

1. A network pruning method based on a pseudo-twin network is characterized by comprising the following steps:
s1, constructing a pseudo-twin network according to a pruned target network, wherein the network weight of the pseudo-twin network is three values;
s2, reading a training sample, training a target network by adopting a random gradient descent method, setting the output of the target network as T (i), and setting the cost function of the target network as follows:
L_t = -Σ_{i=1}^{n} L(i) log T(i)
where n is the number of classes, L (i) is the true distribution of the target;
updating the weights of the target network using the output T(i) of the target network;
training the pseudo-twin network on the same training sample; let the output of the pseudo-twin network be S(i), with the cost function:
L_s = -Σ_{i=1}^{n} T(i) log S(i)
updating the ternary weights in the pseudo-twin network by means of L_s;
the training process ends when L_t and L_s converge, and the weights of the target network and the pseudo-twin network are saved;
s3, taking the pseudo twin network as a template, and cutting the target network:
multiplying the absolute value of the convolution kernel of the target network with the convolution kernel in the pseudo-twin network to obtain a new convolution kernel, clipping the part with the weight value of 0 in the convolution kernel, and replacing the convolution kernel in the target network with the clipped new convolution kernel.
2. The pseudo-twin network-based network pruning method according to claim 1, further comprising:
and S4, reserving parameters left by the target network, and adjusting the parameters of the pseudo-twin network by using a random gradient descent method.
3. The pseudo-twin network-based network pruning method according to claim 1, further comprising:
and S4, reserving parameters left by the pseudo twin network, and adjusting the parameters of the target network by using a random gradient descent method.
CN201910400920.5A 2019-05-15 2019-05-15 Network pruning method based on pseudo-twin network Active CN110097177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910400920.5A CN110097177B (en) 2019-05-15 2019-05-15 Network pruning method based on pseudo-twin network


Publications (2)

Publication Number Publication Date
CN110097177A CN110097177A (en) 2019-08-06
CN110097177B true CN110097177B (en) 2022-11-29

Family

ID=67448052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910400920.5A Active CN110097177B (en) 2019-05-15 2019-05-15 Network pruning method based on pseudo-twin network

Country Status (1)

Country Link
CN (1) CN110097177B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150313A1 (en) * 2019-11-15 2021-05-20 Samsung Electronics Co., Ltd. Electronic device and method for inference binary and ternary neural networks

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN110909667B (en) * 2019-11-20 2022-05-10 北京化工大学 Lightweight design method for multi-angle SAR target recognition network
CN111091144B (en) * 2019-11-27 2023-06-27 云南电网有限责任公司电力科学研究院 Image feature point matching method and device based on depth pseudo-twin network
CN111008693B (en) * 2019-11-29 2024-01-26 小米汽车科技有限公司 Network model construction method, system and medium based on data compression
CN111695699B (en) * 2020-06-12 2023-09-08 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for model distillation
CN112348167B (en) * 2020-10-20 2022-10-11 华东交通大学 Knowledge distillation-based ore sorting method and computer-readable storage medium
CN113724261A (en) * 2021-08-11 2021-11-30 电子科技大学 Fast image composition method based on convolutional neural network

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108334934A (en) * 2017-06-07 2018-07-27 北京深鉴智能科技有限公司 Convolutional neural networks compression method based on beta pruning and distillation
CN109543559A (en) * 2018-10-31 2019-03-29 东南大学 Method for tracking target and system based on twin network and movement selection mechanism

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10740676B2 (en) * 2016-05-19 2020-08-11 Nec Corporation Passive pruning of filters in a convolutional neural network




Similar Documents

Publication Publication Date Title
CN110097177B (en) Network pruning method based on pseudo-twin network
WO2021042828A1 (en) Neural network model compression method and apparatus, and storage medium and chip
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN111695513B (en) Facial expression recognition method based on depth residual error network
CN113159173A (en) Convolutional neural network model compression method combining pruning and knowledge distillation
Yue et al. Face recognition based on histogram equalization and convolution neural network
CN112418397B (en) Image classification method based on lightweight convolutional neural network
Singh et al. Acceleration of deep convolutional neural networks using adaptive filter pruning
CN115829027A (en) Comparative learning-based federated learning sparse training method and system
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
Fan et al. HFPQ: deep neural network compression by hardware-friendly pruning-quantization
CN114742997A (en) Full convolution neural network density peak pruning method for image segmentation
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
Doan Large-scale insect pest image classification
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN110807497A (en) Handwritten data classification method and system based on deep dynamic network
CN117671271A (en) Model training method, image segmentation method, device, equipment and medium
CN112561054A (en) Neural network filter pruning method based on batch characteristic heat map
CN116957010A (en) Model reasoning method and device for convolutional neural network
CN115100509B (en) Image identification method and system based on multi-branch block-level attention enhancement network
CN116310335A (en) Method for segmenting pterygium focus area based on Vision Transformer
CN115620064A (en) Point cloud down-sampling classification method and system based on convolutional neural network
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN115100694A (en) Fingerprint quick retrieval method based on self-supervision neural network
CN114494284A (en) Scene analysis model and method based on explicit supervision area relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant