CN108304928A - Deep neural network compression method based on improved clustering - Google Patents
Deep neural network compression method based on improved clustering
- Publication number
- CN108304928A (application CN201810075486.3A)
- Authority
- CN
- China
- Prior art keywords
- weights
- network
- cluster centre
- cluster
- deep neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a compression method for deep neural networks based on improved clustering. First, pruning turns the normally trained network into a sparse network, achieving preliminary compression. Then K-Means++ clustering yields the cluster centres of each layer's weights, and representing the original weight values by their centre values realizes weight sharing. Finally, per-layer clustering quantizes each layer's weights, and retraining updates the cluster centres, achieving the final compression. Through the three steps of pruning, weight sharing and weight quantization, the deep neural network is compressed 30 to 40 times overall while accuracy improves. The method is simple and effective: the network is compressed without loss of (and even with a gain in) accuracy, which makes deploying deep networks on mobile devices possible.
Description
Technical field
The present invention relates to the field of machine learning, and in particular to a compression method for deep neural networks based on improved clustering.
Background technology
In a range of tasks such as speech recognition and computer vision, deep neural networks have shown clear advantages. Beyond powerful computing platforms and diverse training frameworks, their strong performance is mainly attributable to their large number of learnable parameters. As network depth increases, so does learning capacity, but this gain comes at the cost of memory and other computing resources: the huge number of weights consumes considerable memory and memory bandwidth. Mobile platforms such as phones and in-vehicle systems increasingly demand deep neural networks, yet at current model sizes most models cannot be ported to a phone app or an embedded chip at all.
Deep neural networks are typically over-parameterized, and deep learning models contain severe redundancy, which wastes both computation and storage. Many methods have been proposed to compress deep learning models, chiefly network pruning, quantization, low-rank decomposition and transfer learning, mostly targeting deep convolutional neural networks. However, they generally compress only the fully connected layers, achieve limited compression ratios, and lose some accuracy; these problems remain to be solved.
The compression method for deep neural networks based on improved clustering achieves effective compression without loss of (and even with a gain in) accuracy through three steps: pruning, weight sharing and weight quantization. The method is simple and effective, making the deployment of deep networks on mobile devices possible. Research on this compression method therefore has real significance for both the practical application and the further theoretical study of deep neural networks.
Summary of the invention
The purpose of the present invention is to provide a compression method for deep neural networks based on improved clustering that achieves effective compression without loss of (and even with a gain in) accuracy, making the deployment of deep neural networks on mobile devices possible.
The present invention adopts the following technical scheme to achieve the above object.
A compression method for deep neural networks based on improved clustering comprises the following steps:
1) Pruning.
The pruning process is divided into three steps: first, the network is trained conventionally and the trained model is saved; then, connections with small weights are pruned, turning the original network into a sparse network, and the pruned sparse model is saved; finally, the sparse network is retrained to preserve the effectiveness of the CNN, and the final model is saved after retraining. Each prune-and-retrain pass is one iteration; accuracy gradually increases with the number of iterations, and after several iterations the best connections are found.
Once pruning is complete, the original network has become a sparse network; considering practical conditions, the sparse network structure is finally stored in scipy's CSC format.
2) Weight sharing based on the K-Means++ algorithm.
The K-Means++ algorithm is chosen for clustering, partitioning the original n weights W = {w_1, w_2, ..., w_n} into k classes C = {c_1, c_2, ..., c_k}, where n >> k and ||w_i − w_j|| denotes the Euclidean distance between w_i and w_j. The cost function of W with respect to C is defined as:
φ_W(C) = Σ_{w ∈ W} min_{c ∈ C} ||w − c||²
The goal of K-Means is to choose C so as to minimize the cost function φ_W(C). K-Means++ shares this optimization objective but improves the selection of the initial cluster centres; its basic idea is that the initial cluster centres should be as far from each other as possible.
3) Weight quantization.
Per-layer clustering quantizes each layer's weights, and a final retraining updates the cluster centres. Quantizing the weights reduces the number of bits used to represent them, so weight quantization compresses the deep neural network further.
For each weight, the index of its cluster centre is stored. During training, the forward pass replaces each weight with its corresponding cluster centre; the backward pass computes the weight gradients within each class and propagates their sum back to update the cluster centre.
After shared quantization, all cluster centres are stored in a codebook, and each weight is represented not by the original 32-bit floating-point number but by the index of its cluster centre, which greatly reduces the stored data; the final result is a codebook plus an index table. Assuming the weights are clustered into k classes, log2(k) bits are needed to index the codebook. For a network with n connections, each originally represented with b bits and sharing k weights, the compression ratio r can be expressed as:
r = (n · b) / (n · log2(k) + k · b)
As a further solution of the present invention, in the K-Means++ algorithm the initial cluster centres are as far from each other as possible; the algorithm steps are as follows:
Step 1: from the input W = {w_1, w_2, ..., w_n}, randomly select one element as the first cluster centre c_1;
Step 2: for each w in the data set, compute its distance D(w) to the nearest already-selected cluster centre and store it in an array, then sum these distances to obtain Sum(D(w));
Step 3: select a new c_i as a cluster centre, choosing c_i = w' ∈ W with probability D(w')/Sum(D(w)), i.e. points with larger D(w') are more likely to be selected as cluster centres;
Step 4: repeat steps 2 and 3 until k cluster centres have been selected;
Step 5: run the standard K-Means algorithm with these k initial cluster centres.
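Steps 1 to 5 can be sketched as follows (a minimal 1-D illustration; following the document, the roulette wheel weights each point by its distance D(w) to the nearest already-chosen centre):

```python
import random

def kmeans_pp_init(weights, k, rng=None):
    """Select k initial centres: the first uniformly at random, each
    subsequent one by a roulette wheel over D(w), the distance of each
    point to its nearest already-selected centre (Steps 1-4)."""
    rng = rng or random.Random(0)
    centres = [rng.choice(weights)]
    while len(centres) < k:
        d = [min(abs(w - c) for c in centres) for w in weights]  # D(w)
        r = sum(d) * rng.random()   # Random = Sum(D(w)) * lambda
        for w, dw in zip(weights, d):
            r -= dw                 # Random -= D(w)
            if r <= 0:
                centres.append(w)
                break
        else:
            centres.append(weights[-1])  # floating-point rounding fallback
    return centres

centres = kmeans_pp_init([0.10, 0.11, 0.52, 0.50, 0.90], k=3)
```

Step 5 would then run standard Lloyd iterations from these centres.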
As a further solution of the present invention, in the K-Means++ algorithm step 3 is implemented as follows: first draw a random value Random that falls within Sum(D(w)), then repeatedly apply Random -= D(w) until the value is less than or equal to 0; the point reached at that moment is the next cluster centre. In the experiments the value is taken as Random = Sum(D(w)) · λ with λ ∈ (0, 1). With this choice, the value falls with larger probability into a large D(w) interval, so the corresponding point (w_3 in the illustration of Fig. 3) is selected as the new cluster centre with larger probability.
After each layer's weights have been clustered by the K-Means++ algorithm, the original weight values are represented by their cluster-centre values, realizing weight sharing: multiple connections within the same layer share the same weight, while weights are not shared across layers. This reduces the number of weights and compresses the deep neural network once more.
Compared with the prior art, the present invention has the following advantages: through the three steps of pruning, weight sharing and weight quantization, the deep-neural-network compression method based on improved clustering compresses the network 30 to 40 times overall while accuracy improves. The method is simple and effective; the network is compressed effectively without loss of (and even with a gain in) accuracy, which makes deploying deep networks on mobile devices possible.
Description of the drawings
Fig. 1 is the overall block diagram of the deep-neural-network compression method based on improved clustering of the present invention.
Fig. 2 is a schematic diagram of the sparse network after pruning.
Fig. 3 is a schematic diagram of K-Means++ initial cluster-centre selection.
Fig. 4 is a schematic diagram of the weight-sharing quantization process.
Fig. 5 shows the accuracy-loss curves of the LeNet-300-100 network after pruning.
Fig. 6 shows the top-1 error of deep neural networks under different compression schemes.
Detailed description of the embodiments
The present invention is further elaborated below with reference to the drawings and specific embodiments.
The present invention proposes a deep-neural-network compression method based on improved clustering. First, pruning turns the normally trained network into a sparse network, achieving preliminary compression. Then K-Means++ clustering yields the cluster centres of each layer's weights, and representing the original weight values by their centre values realizes weight sharing. Finally, per-layer clustering quantizes each layer's weights, and retraining updates the cluster centres, achieving the final compression. The overall block diagram of the algorithm is shown in Fig. 1.
The compression process based on improved clustering is divided into the following three stages:
1. Pruning.
After a conventional convolutional neural network (CNN) has been trained, the model is very large: the weight matrices of the fully connected layers hold hundreds of thousands to millions of parameter values, yet many of these parameters have very small absolute values and contribute little to the training or test results. We can therefore try to remove these small-valued parameters by pruning, which reduces both the model size and the amount of computation. The pruning process is divided into three steps, shown as Step 1 in Fig. 1.
First, the network is trained conventionally and the trained model is saved. Then, connections with small weights are pruned, turning the original network into a sparse network, and the pruned sparse model is saved. Pruning connections affects the accuracy of the network, so the sparse network must be retrained to preserve the effectiveness of the CNN; the final model is saved after retraining. Each prune-and-retrain pass is one iteration; accuracy gradually increases with the number of iterations, and after several iterations the best connections are found.
Once pruning is complete, the original network has become a sparse network (as shown in Fig. 2), and we need an efficient sparse-matrix storage format. Typical formats include Coordinate (COO) and Compressed Sparse Row/Column (CSR/CSC). Considering practical conditions, we finally store the sparse network structure in scipy's CSC format.
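Storing a pruned layer in CSC format with scipy can be sketched as follows (the toy matrix is assumed for illustration):

```python
import numpy as np
from scipy.sparse import csc_matrix

# A pruned weight matrix: only the nonzero values, their row indices and
# per-column pointers are stored, instead of the full dense array.
dense = np.array([[0.5, 0.0, 0.3],
                  [0.0, 0.8, 0.0],
                  [0.2, 0.0, 0.0]])
sparse = csc_matrix(dense)
```

Here `sparse.data` holds the four surviving weights in column-major order, `sparse.indices` their row numbers, and `sparse.indptr` marks where each column starts.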
2. Weight sharing based on the K-Means++ algorithm.
The present invention obtains the cluster centres of each layer's weights by clustering, and chooses the K-Means++ algorithm for this purpose. Randomly selecting initial cluster centres, as plain K-Means does, can produce a clustering far from the actual distribution of the data, and linear initialization of the cluster centres also gives unsatisfactory results; K-Means++ solves this problem by selecting the initial points effectively. Moreover, according to published proofs, K-Means++ outperforms K-Means in both speed and accuracy. We partition the original n weights W = {w_1, w_2, ..., w_n} into k classes C = {c_1, c_2, ..., c_k}, where n >> k, and ||w_i − w_j|| denotes the Euclidean distance between w_i and w_j. The cost function of W with respect to C is defined as:
φ_W(C) = Σ_{w ∈ W} min_{c ∈ C} ||w − c||²
The goal of K-Means is to choose C so as to minimize the cost function φ_W(C). K-Means++ shares this optimization objective but improves the selection of the initial cluster centres. Its basic idea is that the initial cluster centres should be as far from each other as possible; the algorithm steps are as follows:
Step 1: from the input W = {w_1, w_2, ..., w_n}, randomly select one element as the first cluster centre c_1;
Step 2: for each w in the data set, compute its distance D(w) to the nearest already-selected cluster centre and store it in an array, then sum these distances to obtain Sum(D(w));
Step 3: select a new c_i as a cluster centre, choosing c_i = w' ∈ W with probability D(w')/Sum(D(w)) (i.e. points with larger D(w') are more likely to be selected as cluster centres);
Step 4: repeat steps 2 and 3 until k cluster centres have been selected;
Step 5: run the standard K-Means algorithm with these k initial cluster centres.
In the K-Means++ algorithm, step 3 is implemented as follows: first draw a random value Random that falls within Sum(D(w)), then repeatedly apply Random -= D(w) until the value is less than or equal to 0; the point reached at that moment is the next cluster centre. In the experiments the value is taken as Random = Sum(D(w)) · λ with λ ∈ (0, 1); for ease of understanding, the present invention illustrates this with Fig. 3.
With Random = Sum(D(w)) · λ, the value falls with larger probability into a large D(w) interval (in the figure it falls into D(w_3) with large probability), so the corresponding point w_3 is selected as the new cluster centre with larger probability.
After each layer's weights have been clustered by the K-Means++ algorithm, the original weight values are represented by their cluster-centre values, realizing weight sharing: multiple connections within the same layer share the same weight (weights are not shared across layers). This reduces the number of weights and compresses the deep neural network once more.
3. Weight quantization.
The present invention quantizes each layer's weights through per-layer clustering, and finally retrains to update the cluster centres. Quantizing the weights reduces the number of bits used to represent them, so weight quantization compresses the deep neural network further.
For each weight, we only need to store the index of its cluster centre. During training, the forward pass replaces each weight with its corresponding cluster centre; the backward pass computes the weight gradients within each class and propagates their sum back to update the cluster centre. The detailed process is shown in Fig. 4 (weights of the same colour belong to the same cluster).
After shared quantization, all cluster centres are stored in a codebook. Each weight is represented not by the original 32-bit floating-point number but by the index of its cluster centre. This greatly reduces the stored data; the final result is a codebook plus an index table. Assuming the weights are clustered into k classes, log2(k) bits are needed to index the codebook. For a network with n connections, each originally represented with b bits and sharing k weights, the compression ratio r can be expressed as:
r = (n · b) / (n · log2(k) + k · b)
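The index/codebook representation and the centre update during back-propagation can be sketched as follows (toy values, assumed for illustration):

```python
import numpy as np

def quantize_to_codebook(weights, codebook):
    """Replace each weight by the index of its nearest cluster centre;
    the layer is then stored as a codebook plus an index table."""
    return np.argmin(np.abs(weights[:, None] - codebook[None, :]), axis=1)

def centroid_gradient_step(codebook, idx, grad, lr):
    """Retraining update: gradients of all weights that share a centre
    are summed and applied to that centre."""
    new_codebook = codebook.copy()
    for j in range(len(codebook)):
        new_codebook[j] -= lr * grad[idx == j].sum()
    return new_codebook

codebook = np.array([-0.5, 0.0, 0.5])      # hypothetical cluster centres
w = np.array([0.45, -0.48, 0.02, 0.51])    # toy layer weights
idx = quantize_to_codebook(w, codebook)    # per-weight index table
recon = codebook[idx]                      # weights used in the forward pass
new_cb = centroid_gradient_step(codebook, idx,
                                np.array([1.0, 0.0, 0.0, 1.0]), lr=0.1)
```

Only `codebook` and `idx` need to be stored; the forward pass reconstructs weights by the `codebook[idx]` lookup.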
The present invention tests deep neural networks on the Caffe framework under a Linux operating system, running parallel computations on the CUDA 8.0 architecture, with the cuBLAS library providing the BLAS routines and cuSPARSE performing the sparse-matrix computations. After pruning and shared quantization, the networks achieve considerable compression without losing (and even while gaining) accuracy. Our experimental results on each network are presented below.
1, LeNet networks;
The present invention tests the LeNet-300-100 and LeNet-5 networks on the MNIST dataset. First, the normally trained network is pruned, with the pruning rate of each layer of the deep network determined by trial and error. After pruning, we retrain LeNet-300-100; Fig. 5 shows the accuracy and loss curves over 50 retraining passes. As can be seen, pruning achieves good results without losing accuracy and generalizes well. After this, we apply the K-Means++-based weight sharing and weight quantization, with all network weights quantized to 6 bits. Table 1 gives the compression parameters and compression effect of each stage for the LeNet-300-100 network (Weights%(P) denotes the proportion of parameters remaining per layer after pruning; the same notation is used in the tables below).
Table 1
As can be seen that beta pruning process has finally obtained 34 times by 13 times of model compression, with shared quantization in conjunction with network
Compression ratio, precision have reached 98.5%.
The present invention tests LeNet-5 with the same method; the compression parameters are shown in Table 2.
Table 2
As can be seen that beta pruning process, by 12 times of model compression, final network has obtained 36 times of compression ratio, and precision reaches
99.3%, and realize preferable compression effectiveness.
2, AlexNet networks
The present invention tests the AlexNet network on the ImageNet ILSVRC-2012 dataset, following the same procedure as the LeNet-300-100 experiment; the details are not repeated here. The specific compression results are shown in Table 3.
Table 3
It can be seen that pruning compresses the network 10 times, and the final network reaches a 30x compression ratio with 57.3% Top-1 accuracy and 80.4% Top-5 accuracy; accuracy improves even as the network is compressed.
3, VGG-16 networks
The present invention tests the VGG-16 network on the ImageNet ILSVRC-2012 dataset with the same method, pruning and further compressing both the convolutional and the fully connected layers, as shown in Table 4.
Table 4
It can be seen that pruning compresses the network 15 times, and the network finally reaches a 40x compression ratio with 68.9% Top-1 accuracy and 89.2% Top-5 accuracy, achieving effective compression.
4. Summary.
The specific parameters and performance of each network before and after compression are shown in Table 5.
Table 5
As can be seen from Table 5, the deep-neural-network compression method based on improved clustering compresses the networks 30 to 40 times overall, a considerable compression. Although the compression ratio still has room for improvement, the compressed models are small enough to make deploying deep neural networks on mobile devices possible. It should be pointed out that, on the one hand, after compression with the method of the present invention accuracy is not lost but improves to a certain degree (as shown in Fig. 6), which is due to the improved clustering method of the present invention. On the other hand, the present invention quantizes both the convolutional and the fully connected layers to 6 bits, which avoids the redundancy caused by unequal code lengths and eliminates the Huffman-coding stage, so the compression method of the present invention is simpler and more effective.
In conclusion, the compression method for deep neural networks based on improved clustering of the present invention is simple and effective; it solves the problems of low compression ratios and accuracy loss in conventional compression methods, and achieves effective compression without loss of (and even with a gain in) accuracy, making the deployment of deep neural networks on mobile devices possible.
The above are preferred embodiments of the present invention. For those of ordinary skill in the art, changes, modifications, replacements and variations made to the embodiments according to the teaching of the present invention, without departing from its principle and spirit, still fall within the protection scope of the present invention.
Claims (3)
1. A compression method for deep neural networks based on improved clustering, characterized by comprising the following steps:
1) Pruning: the pruning process is divided into three steps: first, the network is trained conventionally and the trained model is saved; then, connections with small weights are pruned, turning the original network into a sparse network, and the pruned sparse model is saved; finally, the sparse network is retrained to ensure the effectiveness of the CNN, and the final model is saved after retraining; each prune-and-retrain pass is one iteration, accuracy gradually increases with the number of iterations, and after several iterations the best connections are found; once pruning is complete, the original network has become a sparse network, and considering practical conditions the sparse network structure is finally stored in scipy's CSC format;
2) Weight sharing based on the K-Means++ algorithm: the K-Means++ algorithm is chosen for clustering, partitioning the original n weights W = {w_1, w_2, ..., w_n} into k classes C = {c_1, c_2, ..., c_k}, where n >> k and ||w_i − w_j|| denotes the Euclidean distance between w_i and w_j; the cost function of W with respect to C is defined as
φ_W(C) = Σ_{w ∈ W} min_{c ∈ C} ||w − c||²;
the goal of K-Means is to choose C so as to minimize the cost function φ_W(C); K-Means++ shares this optimization objective but improves the selection of the initial cluster centres, its basic idea being that the initial cluster centres should be as far from each other as possible;
3) Weight quantization: per-layer clustering quantizes each layer's weights, and a final retraining updates the cluster centres; quantizing the weights reduces the number of bits used to represent them, so weight quantization compresses the deep neural network further;
for each weight, the index of its cluster centre is stored; during training, the forward pass replaces each weight with its corresponding cluster centre, and the backward pass computes the weight gradients within each class and propagates their sum back to update the cluster centre; after shared quantization, all cluster centres are stored in a codebook, and each weight is represented not by the original 32-bit floating-point number but by the index of its cluster centre, which greatly reduces the stored data, the final result being a codebook plus an index table; assuming the weights are clustered into k classes, log2(k) bits are needed to index the codebook; for a network with n connections, each originally represented with b bits and sharing k weights, the compression ratio r can be expressed as
r = (n · b) / (n · log2(k) + k · b).
2. The compression method for deep neural networks based on improved clustering according to claim 1, characterized in that in the K-Means++ algorithm the initial cluster centres are as far from each other as possible, with the following algorithm steps:
Step 1: from the input W = {w_1, w_2, ..., w_n}, randomly select one element as the first cluster centre c_1;
Step 2: for each w in the data set, compute its distance D(w) to the nearest already-selected cluster centre and store it in an array, then sum these distances to obtain Sum(D(w));
Step 3: select a new c_i as a cluster centre, choosing c_i = w' ∈ W with probability D(w')/Sum(D(w)), i.e. points with larger D(w') are more likely to be selected as cluster centres;
Step 4: repeat steps 2 and 3 until k cluster centres have been selected;
Step 5: run the standard K-Means algorithm with these k initial cluster centres.
3. The compression method for deep neural networks based on improved clustering according to claim 2, characterized in that in the K-Means++ algorithm step 3 is implemented as follows: first draw a random value Random that falls within Sum(D(w)), then repeatedly apply Random -= D(w) until the value is less than or equal to 0, the point reached at that moment being the next cluster centre; in the experiments the value is taken as Random = Sum(D(w)) · λ with λ ∈ (0, 1); with this choice the value falls with larger probability into a large D(w) interval, so the corresponding point w_3 is selected as the new cluster centre with larger probability;
after each layer's weights have been clustered by the K-Means++ algorithm, the original weight values are represented by their cluster-centre values, realizing weight sharing, with multiple connections within the same layer sharing the same weight while weights are not shared across layers, which reduces the number of weights and compresses the deep neural network once more.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810075486.3A CN108304928A (en) | 2018-01-26 | 2018-01-26 | Compression method based on the deep neural network for improving cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108304928A true CN108304928A (en) | 2018-07-20 |
Family
ID=62866401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810075486.3A Pending CN108304928A (en) | 2018-01-26 | 2018-01-26 | Compression method based on the deep neural network for improving cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108304928A (en) |
Worldwide applications (2018)
- 2018-01-26: CN application CN201810075486.3A filed, published as CN108304928A (status: Pending)
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063666A (en) * | 2018-08-14 | 2018-12-21 | 电子科技大学 | Lightweight face recognition method and system based on depthwise separable convolution |
CN109522949A (en) * | 2018-11-07 | 2019-03-26 | 北京交通大学 | Target recognition model establishing method and device |
CN109522949B (en) * | 2018-11-07 | 2021-01-26 | 北京交通大学 | Target recognition model establishing method and device |
CN109523016B (en) * | 2018-11-21 | 2020-09-01 | 济南大学 | Multi-valued quantization depth neural network compression method and system for embedded system |
CN109543766A (en) * | 2018-11-28 | 2019-03-29 | 钟祥博谦信息科技有限公司 | Image processing method, electronic device and storage medium |
CN109635938A (en) * | 2018-12-29 | 2019-04-16 | 电子科技大学 | Weight quantization method for autonomous learning impulse neural network |
CN109635938B (en) * | 2018-12-29 | 2022-05-17 | 电子科技大学 | Weight quantization method for autonomous learning impulse neural network |
CN109858613A (en) * | 2019-01-22 | 2019-06-07 | 鹏城实验室 | Compression method and system of deep neural network and terminal equipment |
CN109858613B (en) * | 2019-01-22 | 2021-02-19 | 鹏城实验室 | Compression method and system of deep neural network and terminal equipment |
CN109978144B (en) * | 2019-03-29 | 2021-04-13 | 联想(北京)有限公司 | Model compression method and system |
CN109978144A (en) * | 2019-03-29 | 2019-07-05 | 联想(北京)有限公司 | Model compression method and system |
CN109993304A (en) * | 2019-04-02 | 2019-07-09 | 北京同方软件有限公司 | Detection model compression method based on semantic segmentation |
CN110288004A (en) * | 2019-05-30 | 2019-09-27 | 武汉大学 | System fault diagnosis method and device based on log semantics mining |
CN110175262A (en) * | 2019-05-31 | 2019-08-27 | 武汉斗鱼鱼乐网络科技有限公司 | Deep learning model compression method, storage medium and system based on clustering |
CN110298446A (en) * | 2019-06-28 | 2019-10-01 | 济南大学 | Deep neural network compression and acceleration method and system for embedded systems |
CN112749782A (en) * | 2019-10-31 | 2021-05-04 | 上海商汤智能科技有限公司 | Data processing method and related product |
CN110909667A (en) * | 2019-11-20 | 2020-03-24 | 北京化工大学 | Lightweight design method for multi-angle SAR target recognition network |
US11928599B2 (en) | 2019-11-29 | 2024-03-12 | Inspur Suzhou Intelligent Technology Co., Ltd. | Method and device for model compression of neural network |
WO2021103597A1 (en) * | 2019-11-29 | 2021-06-03 | 苏州浪潮智能科技有限公司 | Method and device for model compression of neural network |
CN111263163A (en) * | 2020-02-20 | 2020-06-09 | 济南浪潮高新科技投资发展有限公司 | Method for implementing a deep video compression framework on a mobile phone platform |
CN111476366B (en) * | 2020-03-16 | 2024-02-23 | 清华大学 | Model compression method and system for deep neural network |
CN113673693B (en) * | 2020-05-15 | 2024-03-12 | 宏碁股份有限公司 | Deep neural network compression method |
CN113673693A (en) * | 2020-05-15 | 2021-11-19 | 宏碁股份有限公司 | Method for deep neural network compression |
CN111723912A (en) * | 2020-06-18 | 2020-09-29 | 南强智视(厦门)科技有限公司 | Neural network decoupling method |
CN112016672A (en) * | 2020-07-16 | 2020-12-01 | 珠海欧比特宇航科技股份有限公司 | Method and medium for neural network compression based on sensitivity pruning and quantization |
CN112132024B (en) * | 2020-09-22 | 2024-02-27 | 中国农业大学 | Underwater target recognition network optimization method and device |
CN112132024A (en) * | 2020-09-22 | 2020-12-25 | 中国农业大学 | Underwater target recognition network optimization method and device |
CN113114454B (en) * | 2021-03-01 | 2022-11-29 | 暨南大学 | Efficient privacy outsourcing k-means clustering method |
CN113114454A (en) * | 2021-03-01 | 2021-07-13 | 暨南大学 | Efficient privacy outsourcing k-means clustering method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108304928A (en) | Deep neural network compression method based on improved clustering | |
CN107239825B (en) | Deep neural network compression method considering load balance | |
CN110222821B (en) | Weight distribution-based convolutional neural network low bit width quantization method | |
CN110175628A (en) | Neural network pruning and compression algorithm based on automatic search and knowledge distillation | |
CN108229681A (en) | Neural network model compression method, system, device and readable storage medium | |
CN109002889A (en) | Adaptive iterative convolutional neural network model compression method | |
CN109886406A (en) | Complex convolutional neural network compression method based on deep compression | |
CN111105035A (en) | Neural network pruning method based on combination of sparse learning and genetic algorithm | |
CN111814448B (en) | Pre-training language model quantization method and device | |
CN111626404A (en) | Deep network model compression training method based on generative adversarial networks | |
CN108197707A (en) | Convolutional neural network compression method based on global error reconstruction | |
CN112329910A (en) | Deep convolutional neural network compression method combining structured pruning and quantization | |
CN114970853A (en) | Cross-range quantization convolutional neural network compression method | |
CN113837940A (en) | Image super-resolution reconstruction method and system based on dense residual error network | |
CN109523016B (en) | Multi-valued quantization depth neural network compression method and system for embedded system | |
CN108268950A (en) | Iterative neural network quantization method and system based on vector quantization | |
Qi et al. | Learning low resource consumption cnn through pruning and quantization | |
Wang et al. | RFPruning: A retraining-free pruning method for accelerating convolutional neural networks | |
CN110263917A (en) | Neural network compression method and device | |
CN116976428A (en) | Model training method, device, equipment and storage medium | |
CN112115837A (en) | Target detection method based on YoloV3 and dual-threshold model compression | |
CN114781639A (en) | Depth model compression method for multilayer shared codebook vector quantization of edge equipment | |
CN114595802A (en) | Data compression-based impulse neural network acceleration method and device | |
CN110399975A (en) | Lithium battery deep diagnostic model compression algorithm for hardware porting | |
CN114372565A (en) | Target detection network compression method for edge device |
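The technique shared by this patent and several of the documents above is weight sharing: cluster a layer's weights with k-means and store only a small centroid codebook plus per-weight cluster indices. The sketch below is purely illustrative, not the patented "improved clustering" algorithm; it assumes linear centroid initialization (centroids evenly spaced between the minimum and maximum weight), a common choice in weight-sharing compression, and the function names `kmeans_1d` and `quantize` are hypothetical.

```python
def kmeans_1d(weights, k, iters=20):
    """1-D k-means over scalar weight values, with linear centroid
    initialization (evenly spaced between min and max weight)."""
    lo, hi = min(weights), max(weights)
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assignment step: put each weight in the bucket of its nearest centroid.
        buckets = [[] for _ in range(k)]
        for w in weights:
            nearest = min(range(k), key=lambda i: abs(w - centroids[i]))
            buckets[nearest].append(w)
        # Update step: move each centroid to the mean of its bucket
        # (an empty bucket keeps its previous centroid).
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids

def quantize(weights, centroids):
    """Replace each weight by the index of its nearest centroid, so the
    layer stores small codebook indices instead of full-precision floats."""
    return [min(range(len(centroids)), key=lambda i: abs(w - centroids[i]))
            for w in weights]

# Example: six weights compressed to a 3-entry codebook plus 2-bit indices.
weights = [0.01, 0.02, 0.5, 0.51, -0.3, -0.31]
codebook = kmeans_1d(weights, 3)
indices = quantize(weights, codebook)
reconstructed = [codebook[i] for i in indices]
```

The compression ratio comes from the index width: with k centroids each weight needs only ceil(log2(k)) bits plus a shared codebook of k floats, rather than 32 bits per weight.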
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-07-20 |