CN113033628A - Self-adaptive neural network compression method - Google Patents

Self-adaptive neural network compression method

Info

Publication number
CN113033628A
Authority
CN
China
Prior art keywords
neural network
threshold
adaptive
value
weight
Legal status (assumed, not a legal conclusion)
Pending
Application number
CN202110255097.0A
Other languages
Chinese (zh)
Inventor
侯向辉
袁智龙
沈宁
袁晨
李泽昊
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (assumed): 2021-03-09
Filing date: 2021-03-09
Publication date: 2021-06-25
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110255097.0A
Publication of CN113033628A
Current legal status: Pending

Classifications

    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/23 Clustering techniques
    • G06N 3/045 Combinations of networks
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an adaptive neural network compression method. The first part presets an expected accuracy; the second part uses an adaptive pruning algorithm to remove most of the unnecessary connections in the neural network; and the third part uses an adaptive weight-sharing algorithm to share weights and further reduce the number of parameters. The invention can reduce the parameter count of a neural network while maintaining the accuracy of the original model, compress the network automatically and effectively, and minimize human intervention.

Description

Self-adaptive neural network compression method
Technical Field
The invention relates to the technical field of computer neural networks, in particular to a self-adaptive neural network compression method.
Background
Research in the field of deep learning has continued to heat up in recent years and has achieved tremendous success in computer vision, speech recognition, and natural language processing. Neural networks obtain their strong learning ability at the cost of an explosion in the number of parameters. In 1998, LeNet classified handwritten digits with about 1M parameters; by 2012, AlexNet, the champion on the ImageNet dataset, had reached 60M parameters. In the following years the parameter counts of neural networks kept growing, and the parameters of VGG-16 occupy more than 500 MB of storage. Although these parameters give the networks strong capability, they consume large amounts of storage, memory, and computational resources. This over-parameterization means that some neural networks can be deployed only on specialized servers. Existing algorithms for compressing neural networks can effectively reduce the number of parameters, but they require manual intervention to pre-configure empirical parameters. Experiments show that neural networks are quite sensitive to these empirical parameters, which strongly influence the final compression result. How to compress a neural network automatically and effectively while minimizing human intervention is therefore a practical problem of great research value: in the prior art, the compression result depends heavily on manually tuned empirical parameters.
Disclosure of Invention
The invention aims to overcome the above defects and provides an adaptive neural network compression method that reduces the number of parameters of a neural network while maintaining the accuracy of the original model, compresses the network automatically and effectively, and minimizes human intervention.
The invention achieves the above aim through the following technical scheme: an adaptive neural network compression method comprising the following steps:
(1) presetting a desired accuracy;
(2) training a neural network, and pruning the neural network by using a self-adaptive pruning algorithm;
(3) based on the pruned neural network, using an adaptive weight-sharing algorithm to share weights among the remaining parameters, thereby further reducing the number of parameters and completing the compression of the neural network.
Preferably, the step (2) uses a bisection method to calculate the pruning threshold, specifically as follows:
(2.1) training a common neural network by using a conventional method;
(2.2) calculating a suitable threshold for each layer of the neural network by bisection, such that the accuracy of the neural network is not significantly degraded after the connections whose absolute values are smaller than the threshold are removed;
(2.3) removing the connections in the neural network whose absolute values are smaller than the threshold calculated in step (2.2);
(2.4) continuing to train the connections that have not been removed, and repeating steps (2.2) and (2.3) until no further connections can be removed from the neural network.
Preferably, the step (2.2) is specifically as follows:
(2.2.1) setting upper and lower bounds of upper two halves: the lower bound of the dichotomy is 0, and the upper bound of the dichotomy is the maximum value of the absolute value in the current layer;
(2.2.2) dividing a proper threshold value for a certain layer of the neural network according to a preset accuracy rate; the expected accuracy rate needs to be preset in the process of calculating the threshold, and then a threshold is calculated in the upper and lower boundaries of the dichotomy, so that the accuracy rate on the test set is not lower than the expected accuracy rate after all the connections smaller than the threshold in the original network are removed.
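A minimal sketch of this per-layer bisection search is given below (Python/NumPy). It is illustrative only: the names find_prune_threshold, evaluate_accuracy and expected_acc are assumptions and do not appear in the patent; evaluate_accuracy stands for a test-set evaluation of the network with the given layer's weights replaced by the masked weights.

import numpy as np

def find_prune_threshold(weights, evaluate_accuracy, expected_acc,
                         tol=1e-4, max_iter=50):
    """Bisection search (steps 2.2.1-2.2.2) for the largest pruning threshold
    that keeps test-set accuracy at or above expected_acc. Illustrative sketch."""
    lo, hi = 0.0, float(np.abs(weights).max())   # step (2.2.1): bounds of the bisection
    best = 0.0
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = (lo + hi) / 2.0
        mask = np.abs(weights) >= mid            # connections kept at candidate threshold
        acc = evaluate_accuracy(weights * mask)  # accuracy with small weights removed
        if acc >= expected_acc:                  # still accurate enough: try a larger threshold
            best, lo = mid, mid
        else:                                    # accuracy dropped too far: lower the threshold
            hi = mid
    return best

The search relies on the approximately monotonic relationship between the threshold and the post-pruning accuracy illustrated in FIG. 2.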
Preferably, the step (3) is specifically as follows:
(3.1) searching for a suitable threshold for each layer of the neural network model according to the expected accuracy, such that after weight sharing is performed with that threshold the accuracy on the test set is not lower than the expected accuracy;
(3.2) realizing weight sharing with a clustering algorithm in which, for each cluster, the mean of the elements in the cluster is taken as the cluster center; for a given threshold, the distance from every element in a cluster to its cluster center must not exceed this threshold:
$$\lvert w - c_i \rvert \le t \quad \text{for all } w \in C_i, \qquad c_i = \frac{1}{\lvert C_i \rvert}\sum_{w \in C_i} w$$
where $C_i$ denotes the i-th cluster, $c_i$ its center (the mean of its elements), and $t$ the selected threshold;
(3.3) fine-tuning the cluster centroid values of each layer; in each training round, forward propagation is performed first, and during backpropagation the gradient of a centroid is the sum of the gradients of all connections assigned to that centroid.
Preferably, in step (3), weight sharing further compresses the neural network by reducing the number of bits consumed for storage: by letting multiple connections with similar weights share the same weight value, the network can be stored by recording, for each connection, only the index of the weight class it belongs to (plus the k shared values themselves) rather than its individual value. If k classes are used for weight sharing, then $\lceil \log_2 k \rceil$ bits are needed to store the class index of each weight; if the network originally needs to store n weights of b bits each (n·b bits in total), the compression ratio r can be calculated as:
$$r = \frac{n b}{n \lceil \log_2 k \rceil + k b}$$
where the k·b term accounts for storing the k shared weight values themselves.
Preferably, the step (3.2) is specifically as follows:
(3.2.1) sorting the values in the original set;
(3.2.2) maintaining head and tail pointers over the sorted values, and advancing the tail pointer one element at a time until the difference between the elements pointed to by the head and tail pointers exceeds the selected threshold; at that point, the elements between the head pointer and the tail pointer are grouped into one cluster and the head pointer is moved to the tail pointer; these steps are executed in a loop until all elements have been clustered.
The invention has the following beneficial effects: it reduces the number of parameters of a neural network while maintaining the accuracy of the original model, compresses the network automatically and effectively, and minimizes human intervention.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the approximately monotonic relationship between the pruning threshold and the accuracy in an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
example (b): as shown in fig. 1, an adaptive neural network compression method consists of three parts: the first part is to preset a desired accuracy; the second part is to use an adaptive pruning algorithm to compress and remove most unnecessary connections in the neural network; and the third part is to use a self-adaptive weight sharing algorithm to realize weight sharing and further realize the reduction of parameters. Through these three parts, the algorithm compresses LeNet on the minst dataset, eliminating 94% of unnecessary connections at the cost of 0.3% accuracy. The method comprises the following steps:
step 1: presetting expected accuracy;
step 2: the neural network only learns important connections by using a self-adaptive pruning algorithm, and most unnecessary connections are removed; the method comprises the following specific steps:
step 2.1: a common neural network is trained using conventional methods. This step is not to learn the final weight of each connection in the network, but rather to learn the important connections in the network.
Step 2.2: calculate a suitable threshold for each layer of the network by bisection, so that the accuracy of the neural network does not significantly degrade after the connections whose absolute values are smaller than this threshold are removed. FIG. 2 illustrates the approximately monotonic relationship between the pruning threshold and the network accuracy after pruning.
Step 2.2.1: setting upper and lower limits of the upper half: the lower bound of the dichotomy is 0, and the upper bound of the dichotomy is the most significant of the absolute values in the current layer.
Step 2.2.2: and according to the preset accuracy, dividing a proper threshold value for a certain layer of the neural network.
Step 2.3: the connections in the neural network whose absolute value is smaller than the threshold value calculated in the second step are removed. After this step, the original dense network will become a sparse network.
Step 2.4: the connections in the network that are not removed are trained and the second step is then repeated.
Step 3: apply the adaptive weight-sharing algorithm to the remaining parameters to further reduce the number of parameters.
Step 3.1: search for a suitable threshold for each layer of the model according to the expected accuracy, so that after weight sharing with that threshold the accuracy on the test set is not lower than the expected accuracy.
Step 3.2: in the clustering algorithm constructed by the invention, a threshold is specified, and the distance from every element in a cluster to its cluster center must not exceed this threshold:
$$\lvert w - c_i \rvert \le t \quad \text{for all } w \in C_i, \qquad c_i = \frac{1}{\lvert C_i \rvert}\sum_{w \in C_i} w$$
Step 3.3: and finely adjusting the value of the clustering center of mass of each layer. Because the accuracy of the compressed neural network is guaranteed to be higher than the expected accuracy in the binary search process, the centroid does not need to be repeatedly adjusted, and even the centroid can be selected not to be adjusted.
Weight sharing can further compress the neural network by reducing the number of bits consumed in storage. By letting multiple connections with similar weights share the same weight value, the network can be stored by recording, for each connection, only the index of the weight class it belongs to (plus the k shared values themselves) rather than its individual value. If k classes are used for weight sharing, then $\lceil \log_2 k \rceil$ bits are needed to store the class index of each weight; if the network originally needs to store n weights of b bits each, the compression ratio r can be calculated as:
$$r = \frac{n b}{n \lceil \log_2 k \rceil + k b}$$
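As an illustrative numerical example (the numbers are assumptions, not taken from the patent): with n = 10000 surviving connections stored at b = 32 bits each and k = 16 shared weight classes, each connection needs only $\lceil \log_2 16 \rceil = 4$ bits for its class index, giving r = (10000 × 32) / (10000 × 4 + 16 × 32) ≈ 7.9, i.e. roughly an eight-fold reduction in storage for that layer.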
The step 3.2 is specifically as follows:
(3.2.1) sorting the values in the original set.
(3.2.2) maintaining head and tail pointers over the sorted values, and advancing the tail pointer one element at a time until the difference between the elements pointed to by the head and tail pointers exceeds the selected threshold. At that point, the elements between the head and tail pointers are grouped into one cluster, the head pointer is moved to the tail pointer, and the process is repeated until all elements have been clustered.
Obviously, the clustering algorithm constructed by the invention can ensure that the number of clusters is minimum. For each cluster, the mean of the elements within this cluster is chosen as the center of the cluster. Because the range within a cluster does not exceed the threshold, the distance of any element to the cluster center does not exceed the threshold.
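A minimal sketch of this sorted two-pointer clustering is given below (Python/NumPy, illustrative only; the function name adaptive_weight_clusters is hypothetical, and a non-empty 1-D array of surviving weights is assumed).

import numpy as np

def adaptive_weight_clusters(values, threshold):
    """Greedily group sorted values so that the range inside each cluster does
    not exceed `threshold`; return the cluster centers (means) and per-value labels.
    Because the within-cluster range is at most `threshold`, no element is farther
    than `threshold` from its cluster mean. Illustrative sketch."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)               # step 3.2.1: sort the original set
    sorted_vals = values[order]
    labels = np.empty(len(values), dtype=int)
    centers = []
    head = 0
    for tail in range(len(sorted_vals)):      # step 3.2.2: advance the tail pointer
        if sorted_vals[tail] - sorted_vals[head] > threshold:
            centers.append(sorted_vals[head:tail].mean())   # close the current cluster
            labels[order[head:tail]] = len(centers) - 1
            head = tail                        # a new cluster starts at the tail pointer
    centers.append(sorted_vals[head:].mean())                # close the final cluster
    labels[order[head:]] = len(centers) - 1
    return np.array(centers), labels

Because the elements are processed in sorted order and a cluster is closed only when its range would exceed the threshold, this greedy grouping uses the minimum possible number of clusters, as stated above.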
In summary, the pruning algorithm is optimized so that the neural network can be pruned automatically: an appropriate threshold is found by bisection, and unnecessary connections in the neural network are deleted without a noticeable drop in accuracy. The invention compresses LeNet on the MNIST dataset, eliminating 94% of the unnecessary connections at the cost of a 0.3% accuracy loss.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An adaptive neural network compression method, comprising the steps of:
(1) presetting a desired accuracy;
(2) training a neural network, and pruning the neural network by using a self-adaptive pruning algorithm;
(3) based on the pruned neural network, using an adaptive weight-sharing algorithm to share weights among the remaining parameters, thereby further reducing the number of parameters and completing the compression of the neural network.
2. The adaptive neural network compression method of claim 1, wherein: in the step (2), a bisection method is used to calculate the pruning threshold, specifically as follows:
(2.1) training a common neural network by using a conventional method;
(2.2) calculating a suitable threshold for each layer of the neural network by bisection, such that the accuracy of the neural network is not significantly degraded after the connections whose absolute values are smaller than the threshold are removed;
(2.3) removing the connections in the neural network whose absolute values are smaller than the threshold calculated in step (2.2);
(2.4) continuing to train the connections that have not been removed, and repeating steps (2.2) and (2.3) until no further connections can be removed from the neural network.
3. The adaptive neural network compression method of claim 2, wherein: the step (2.2) is specifically as follows:
(2.2.1) setting the upper and lower bounds of the bisection: the lower bound is 0 and the upper bound is the maximum absolute weight value in the current layer;
(2.2.2) finding a suitable threshold for the given layer of the neural network according to the preset accuracy: the expected accuracy is preset before the threshold calculation, and a threshold is then searched for between the bisection bounds such that, after all connections in the original network whose absolute values are smaller than the threshold are removed, the accuracy on the test set is not lower than the expected accuracy.
4. The adaptive neural network compression method of claim 1, wherein: the step (3) is specifically as follows:
(3.1) searching for a suitable threshold for each layer of the neural network model according to the expected accuracy, such that after weight sharing is performed with that threshold the accuracy on the test set is not lower than the expected accuracy;
(3.2) realizing weight sharing with a clustering algorithm in which, for each cluster, the mean of the elements in the cluster is taken as the cluster center; for a given threshold, the distance from every element in a cluster to its cluster center must not exceed this threshold:
$$\lvert w - c_i \rvert \le t \quad \text{for all } w \in C_i, \qquad c_i = \frac{1}{\lvert C_i \rvert}\sum_{w \in C_i} w$$
where $C_i$ denotes the i-th cluster, $c_i$ its center (the mean of its elements), and $t$ the selected threshold;
(3.3) fine-tuning the cluster centroid values of each layer; in each training round, forward propagation is performed first, and during backpropagation the gradient of a centroid is the sum of the gradients of all connections assigned to that centroid.
5. The adaptive neural network compression method of claim 4, wherein: in step (3), weight sharing further compresses the neural network by reducing the number of bits consumed in storage; by letting multiple connections with similar weights share the same weight value, the network is stored by recording, for each connection, only the index of the weight class it belongs to (plus the k shared values themselves) rather than its individual value; if k classes are used for weight sharing, then $\lceil \log_2 k \rceil$ bits are needed to store the class index of each weight; if the network originally needs to store n weights of b bits each, the compression ratio r is calculated as:
$$r = \frac{n b}{n \lceil \log_2 k \rceil + k b}$$
6. The adaptive neural network compression method of claim 1, wherein: the step (3.2) is specifically as follows:
(3.2.1) sorting the values in the original set;
(3.2.2) maintaining head and tail pointers over the sorted values, and advancing the tail pointer one element at a time until the difference between the elements pointed to by the head and tail pointers exceeds the selected threshold; at that point, the elements between the head pointer and the tail pointer are grouped into one cluster and the head pointer is moved to the tail pointer; these steps are executed in a loop until all elements have been clustered.
CN202110255097.0A 2021-03-09 2021-03-09 Self-adaptive neural network compression method Pending CN113033628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110255097.0A CN113033628A (en) 2021-03-09 2021-03-09 Self-adaptive neural network compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110255097.0A CN113033628A (en) 2021-03-09 2021-03-09 Self-adaptive neural network compression method

Publications (1)

Publication Number Publication Date
CN113033628A 2021-06-25

Family

ID=76467283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110255097.0A Pending CN113033628A (en) 2021-03-09 2021-03-09 Self-adaptive neural network compression method

Country Status (1)

Country Link
CN (1) CN113033628A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502698A (en) * 2023-06-29 2023-07-28 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium
CN116502698B (en) * 2023-06-29 2023-08-29 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106919942B (en) Accelerated compression method of deep convolution neural network for handwritten Chinese character recognition
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
CN109635935B (en) Model adaptive quantization method of deep convolutional neural network based on modular length clustering
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
CN108121975B (en) Face recognition method combining original data and generated data
CN113593611B (en) Voice classification network training method and device, computing equipment and storage medium
CN108647723B (en) Image classification method based on deep learning network
WO2020238237A1 (en) Power exponent quantization-based neural network compression method
CN111105007B (en) Compression acceleration method of deep convolutional neural network for target detection
CN111079899A (en) Neural network model compression method, system, device and medium
CN108319988B (en) Acceleration method of deep neural network for handwritten Chinese character recognition
CN105631416A (en) Method for carrying out face recognition by using novel density clustering
CN112215353B (en) Channel pruning method based on variational structure optimization network
CN113657421B (en) Convolutional neural network compression method and device, and image classification method and device
CN113283473B (en) CNN feature mapping pruning-based rapid underwater target identification method
CN109918507B (en) textCNN (text-based network communication network) improved text classification method
Yue et al. Face recognition based on histogram equalization and convolution neural network
CN112990420A (en) Pruning method for convolutional neural network model
CN113033628A (en) Self-adaptive neural network compression method
US7705754B2 (en) Method and system for the compression of probability tables
CN107562853A (en) A kind of method that streaming towards magnanimity internet text notebook data is clustered and showed
CN113762505B (en) Method for clustering pruning according to L2 norms of channels of convolutional neural network
CN112115837A (en) Target detection method based on YoloV3 and dual-threshold model compression
Li et al. A spectral clustering based filter-level pruning method for convolutional neural networks
CN116245162A (en) Neural network pruning method and system based on improved adaptive genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination