CN111476366B - Model compression method and system for deep neural network - Google Patents

Model compression method and system for deep neural network

Info

Publication number
CN111476366B
CN111476366B (application CN202010196651.8A)
Authority
CN
China
Prior art keywords
clustering
neural network
solving
weights
optimal solution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010196651.8A
Other languages
Chinese (zh)
Other versions
CN111476366A (en)
Inventor
喻文健
杨定澄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Publication of CN111476366A publication Critical patent/CN111476366A/en
Application granted granted Critical
Publication of CN111476366B publication Critical patent/CN111476366B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a model compression method and system for a deep neural network. The method comprises the following steps: performing initial assignment of the weights in the neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m (i = 1, 2, …, m); solving an optimization problem to train the neural network; performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; and, according to the optimal solution, compressing and storing each group of weights W_i. The method solves the optimal solution of the scalar K-means clustering problem with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.

Description

Model compression method and system for deep neural network
Technical Field
The invention relates to the technical field of machine learning, in particular to a model compression method and system of a deep neural network.
Background
Currently, deep neural networks (DNNs) have achieved remarkable results in many tasks, including computer vision and natural language processing. However, as performance increases, the model size of DNNs keeps growing. The excessive memory footprint makes them impossible to deploy on resource-constrained devices, especially edge computing devices such as face recognition systems on mobile terminals or autonomous driving systems.
In recent years there have been many efforts to compress neural networks, including pruning, knowledge distillation, low-bit representation, design of compact network architectures, and weight quantization. The low-bit representation can be seen as a variant of weight quantization that restricts the weights to certain low-bit floating point numbers; thus, accurate weight quantization provides an upper bound for the low-bit representation.
Currently, there are some works that compress neural networks by weight quantization, but they all use the Lloyd algorithm (S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, 1982) to solve the K-means clustering problem; this heuristic cannot guarantee the optimal solution, and its clustering quality is sensitive to the choice of initial solution. In addition, most existing methods first train the network and then cluster the weights by K-means clustering; even if the network is retrained after clustering, the inference accuracy drops noticeably.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent.
Therefore, an object of the present invention is to provide a model compression method for a deep neural network, which can obtain a large compression rate while still well maintaining the inference accuracy.
Another object of the present invention is to provide a model compression system for deep neural networks.
In order to achieve the above objective, an embodiment of one aspect of the present invention provides a model compression method for a deep neural network, including the following steps: step S1, performing initial assignment of the weights in the neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m; step S2, solving an optimization problem to train the neural network; step S3, performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; and step S4, according to the optimal solution, compressing and storing each group of weights W_i.
The model compression method of the deep neural network according to the embodiment of the present invention is based on the idea of weight quantization: the optimal solution of the scalar K-means clustering problem is solved with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.
In addition, the model compression method of the deep neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the optimization problem in the step S2 is:
wherein w= { W 1 ,W 2 ,…,W m The ownership value of the neural network is represented by c= { C 1 ,C 2 ,…,C m The central value K after weight clustering is represented by i Representing the number of clusters of each group of vectors, C i,k Representing the presentation to beThe k-th clustering center obtained after clustering is characterized in that the hyper-parameter lambda represents balance prediction performance and clustering error, and d (x, y) represents a function of the distance between the real numbers x and y.
Further, in an embodiment of the present invention, solving the optimization problem in step S2 further includes: step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering; step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network; step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C; and step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending.
Further, in one embodiment of the present invention, the step S3 further includes: step S31, letting W be the weight vector to be clustered, of length N, where N is a positive integer; step S32, sorting the weight vector W to be clustered to obtain the ordered vector W; and step S33, clustering the ordered vector W into K classes and obtaining the optimal solution.
Further, in one embodiment of the present invention, the step S33 further includes: when K = 1, obtaining the optimal solution directly by solving the single-cluster problem min_c Σ_j d(W_j, c); when K > 1, enumerating the interval [i, N] forming the cluster that contains W_N, recursively calling step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, and combining it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class.
To achieve the above object, another embodiment of the present invention provides a model compression system for a deep neural network, including: an initialization module for performing initial assignment of the weights in the neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m; a training module for solving an optimization problem to train the neural network; a cluster solving module for performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; and a compression storage module for compressing and storing each group of weights W_i according to the optimal solution.
The model compression system of the deep neural network according to the embodiment of the present invention is based on the idea of weight quantization: the optimal solution of the scalar K-means clustering problem is solved with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.
In addition, the model compression system of the deep neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the optimization problem in the training module is:
wherein w= { W 1 ,W 2 ,…,W m The ownership value of the neural network is represented by c= { C 1 ,C 2 ,…,C m The central value K after weight clustering is represented by i Representing the number of clusters of each group of vectors, C i,k Representing the presentation to beThe k-th clustering center obtained after clustering is characterized in that the hyper-parameter lambda represents balance prediction performance and clustering error, and d (x, y) represents a function of the distance between the real numbers x and y.
Further, in an embodiment of the present invention, solving the optimization problem in the training module further includes: an initialization unit, configured to initialize the cluster centers C = {C_1, C_2, …, C_m} after weight clustering; an optimization solving unit, configured to solve the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network; a clustering solving unit, configured to turn the problem, with all weights W fixed, into a K-means clustering problem and solve it to obtain new cluster centers C; and a judging unit, configured to judge whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, return to the optimization solving unit, otherwise end.
Further, in one embodiment of the present invention, the cluster solving module further includes: a setting unit, configured to let W be the weight vector to be clustered, of length N, where N is a positive integer; a sorting unit, configured to sort the weight vector W to be clustered to obtain the ordered vector W; and a clustering unit, configured to cluster the ordered vector W into K classes and obtain the optimal solution.
Further, in an embodiment of the present invention, the clustering unit is further configured to: when K = 1, obtain the optimal solution directly by solving the single-cluster problem min_c Σ_j d(W_j, c); when K > 1, enumerate the interval [i, N] forming the cluster that contains W_N, recursively call the clustering unit to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, and combine it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a specific storage scheme before and after weight quantization;
FIG. 2 is a flow chart of a method of model compression of a deep neural network according to one embodiment of the invention;
fig. 3 is a schematic diagram of a model compression system of a deep neural network according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The embodiment of the invention stores the network with a weight-quantized scheme. For n weights, if each weight is represented by a single-precision floating point number as in the prior art, without weight quantization, 32n binary bits are required. With the embodiment of the invention, the n weights are clustered into K classes, and only the K class centers need to be stored as single-precision floating point numbers, plus, for each weight, ⌈log2 K⌉ binary bits indicating which class it belongs to; in total only 32K + n⌈log2 K⌉ binary bits are stored. The compression ratio brought by the weight quantization is therefore 32n / (32K + n⌈log2 K⌉). As shown in FIG. 1, in the existing storage scheme each weight is represented by 32 binary bits (a single-precision floating point number); in the storage scheme after weight quantization, each weight is represented by 1 binary bit, and two single-precision floating point numbers are additionally used to represent the two distinct weight values.
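To make the storage arithmetic concrete, the following Python sketch (an illustration of the calculation above, not part of the claimed method; the function names are chosen here for exposition) computes the bit counts before and after quantization and the resulting compression ratio.

```python
import math

def quantized_storage_bits(n, K):
    """Bits needed after clustering n weights into K classes:
    32 bits for each of the K single-precision cluster centers,
    plus ceil(log2 K) index bits per weight."""
    index_bits = math.ceil(math.log2(K))
    return 32 * K + n * index_bits

def compression_ratio(n, K):
    """Ratio of uncompressed storage (32 bits per weight) to quantized storage."""
    return 32 * n / quantized_storage_bits(n, K)

# Example matching FIG. 1: K = 2 distinct weight values, 1 index bit per weight.
print(compression_ratio(1024, 2))   # ~30x for n = 1024 weights
```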
The method and system for compressing a model of a deep neural network according to the embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 2 is a flow chart of a model compression method of a deep neural network according to one embodiment of the invention.
As shown in fig. 2, the model compression method of the deep neural network includes the following steps:
in step S1, initial assignment is performed on weights in the neural network, and the initially assigned weights are marked as m vectors W 1 ,W2,...,W m Wherein, the method comprises the steps of, wherein,
in step S2, the neural network is trained by solving an optimization problem.
Further, the neural network is trained by solving the following optimization problem:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, n_i denotes the length of W_i, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y: the smaller d(x, y) is, the closer x and y are; preferably d(x, y) = (x − y)^2 or d(x, y) = |x − y|, etc.
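As a concrete reading of the objective above, the following NumPy sketch evaluates the clustering-error penalty term; the squared distance d(x, y) = (x − y)^2 and the callable `network_loss` standing in for the prediction loss E(W) are illustrative assumptions.

```python
import numpy as np

def clustering_penalty(weights, centers, lam):
    """lam * sum over groups i and weights j of min_k d(W_ij, C_ik),
    using the squared distance d(x, y) = (x - y)**2 (assumed)."""
    penalty = 0.0
    for W_i, C_i in zip(weights, centers):
        # distance of every weight in group i to every center of group i
        dist = (np.asarray(W_i)[:, None] - np.asarray(C_i)[None, :]) ** 2
        penalty += dist.min(axis=1).sum()   # nearest-center distance per weight
    return lam * penalty

def regularized_objective(weights, centers, lam, network_loss):
    """E(W) + lam * clustering error, as in the optimization problem above."""
    return network_loss(weights) + clustering_penalty(weights, centers, lam)
```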
Further, in one embodiment of the present invention, solving the optimization problem in step S2 further includes:
Step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
Step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
Step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C;
Step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending.
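The alternating procedure of steps S21–S24 can be sketched in Python as follows; the helper names (`train_weights_with_penalty` for the step S22 optimizer, `scalar_kmeans_dp` for the clustering routine of step S3), the stopping threshold, and the round limit are assumptions made for illustration, not the patent's prescribed implementation.

```python
import numpy as np

def train_cluster_friendly(init_weights, K_list, lam,
                           train_weights_with_penalty,  # step S22: e.g. SGD on the objective
                           scalar_kmeans_dp,            # step S3: optimal scalar K-means
                           max_rounds=20, tol=1e-4):
    W = [np.array(w, dtype=np.float64) for w in init_weights]
    # Step S21: initialize the cluster centers C for every weight group.
    C = [scalar_kmeans_dp(w, K) for w, K in zip(W, K_list)]

    for _ in range(max_rounds):                          # part of the step S24 stop test
        # Step S22: with C fixed, train the network on the regularized objective.
        W = train_weights_with_penalty(W, C, lam)
        # Step S23: with W fixed, re-solve the scalar K-means problems for new centers.
        new_C = [scalar_kmeans_dp(w, K) for w, K in zip(W, K_list)]
        # Step S24: stop once the centers barely change (assumed criterion).
        change = max(np.max(np.abs(cn - co)) for cn, co in zip(new_C, C))
        C = new_C
        if change < tol:
            break
    return W, C
```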
In step S3, scalar K-means clustering is performed on each group of weights W_i (i = 1, 2, …, m) to obtain the optimal solution of the clustering problem, the weights in W_i being clustered into K_i classes.
That is, for each group of weights W, the scalar K-means clustering problem is solved with an algorithm based on dynamic programming to obtain the optimal solution, in the following steps:
Step S31, let W be the weight vector to be clustered, of length N, where N is a positive integer, and let K denote the number of clusters;
Step S32, sort the weight vector W to be clustered to obtain the ordered vector W;
Step S33, cluster the ordered vector W into K classes and obtain the optimal solution.
Wherein, step S33 further comprises:
When K = 1, the optimal solution is obtained directly by solving the single-cluster problem min_c Σ_{j=1}^{N} d(W_j, c);
When K > 1, enumerate the interval [i, N] forming the cluster that contains W_N; recursively call step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, combine it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class, and select the best among the optimal solutions over all feasible [i, N].
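The recursion just described can be written down directly as a memoized function over the sorted weights. The sketch below assumes the squared distance d(x, y) = (x − y)^2, so the one-class cost is the sum of squared deviations from the segment mean; it returns only the optimal cost (the centers are recovered by the table-based procedure described later).

```python
from functools import lru_cache

def optimal_scalar_kmeans_cost(w_sorted, K):
    """Optimal cost of clustering sorted 1-D data into K classes,
    following the recursion of step S33 (squared distance assumed)."""
    n = len(w_sorted)
    prefix, prefix_sq = [0.0], [0.0]          # prefix sums of w and w**2
    for x in w_sorted:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def one_class_cost(lo, hi):
        """Cost of gathering w[lo..hi] (0-based, inclusive) into one class."""
        s1 = prefix[hi + 1] - prefix[lo]
        s2 = prefix_sq[hi + 1] - prefix_sq[lo]
        return s2 - s1 * s1 / (hi - lo + 1)   # squared deviations from the mean

    @lru_cache(maxsize=None)
    def best(hi, k):
        """Optimal cost of clustering w[0..hi] into k classes."""
        if k == 1:                            # base case K = 1: a single cluster
            return one_class_cost(0, hi)
        # K > 1: enumerate the interval [i, hi] forming the last cluster
        return min(best(i - 1, k - 1) + one_class_cost(i, hi)
                   for i in range(k - 1, hi + 1))

    return best(n - 1, K)

# Usage: optimal_scalar_kmeans_cost(sorted([0.1, 0.12, 0.9, 1.0, 1.05]), 2)
```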
In step S4, each group of weights W_i is compressed and stored according to the optimal solution.
That is, for the weights in W_i, only the K_i cluster centers need to be stored as single-precision floating point numbers, and each weight is then represented by a ⌈log2 K_i⌉-bit binary number indicating the class to which it belongs.
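The compressed representation of one weight group W_i can be sketched as follows: the K_i cluster centers are kept as single-precision floats and each weight is replaced by the index of its nearest center. The indices are stored as uint8 here for simplicity (assuming K_i ≤ 256); a real implementation would bit-pack ⌈log2 K_i⌉ bits per index as described above.

```python
import numpy as np

def compress_group(W_i, centers):
    """Quantize one weight group: keep the cluster centers as float32 and
    store, for every weight, the index of its nearest center."""
    centers = np.asarray(centers, dtype=np.float32)
    idx = np.abs(np.asarray(W_i)[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx.astype(np.uint8)      # uint8 assumed: at most 256 classes

def decompress_group(centers, idx):
    """Rebuild the quantized weights from the stored centers and indices."""
    return centers[idx]

# Usage sketch
W_i = np.array([0.11, 0.09, 0.52, 0.48, 0.10], dtype=np.float32)
centers, idx = compress_group(W_i, [0.1, 0.5])
W_i_quantized = decompress_group(centers, idx)  # [0.1, 0.1, 0.5, 0.5, 0.1]
```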
It should be noted that embodiments of the present invention may be implemented in any programming language and executed on a computing device with a CPU and memory. The following specific example uses the prior-art stochastic gradient descent method, which can be implemented by calling a numerical computation library of the corresponding programming language.
Consider a neural network structure comprising 1 fully connected layer and 1 convolutional layer. The fully connected layer contains m_fc × n parameters, and the convolutional layer contains m_conv × n × h × w parameters. The embodiment of the invention divides all the network parameters into m_fc + m_conv groups, i.e., m_fc + m_conv vectors W_1, …, W_{m_fc + m_conv}, whose respective clustering results form m_fc + m_conv center vectors C_1, …, C_{m_fc + m_conv}. The embodiment of the invention clusters each group of parameters uniformly into K classes, so the compression process is as follows:
for the fully connected layer, its parameters are divided into m fc Groups of n parameters each, constituteCorresponding to m fc Group cluster center->For the convolutional layer, its parameters are divided into m conv Groups of n×h×w parameters each formCorresponding to m conv Group cluster center->For i is more than or equal to 1 and less than or equal to m fc +m conv Will W i Regarding as a vector x, using the scalar K-means clustering method provided by the embodiment of the invention, the scalar K-means clustering method is clustered into K classes to obtain a clustering center vector C i
The scalar K-means clustering method provided by the embodiment of the invention executes the following steps. It clusters an input one-dimensional array x; an additional input parameter K indicates that the array is to be clustered into K classes:
Step 1: Let n denote the length of the one-dimensional array x.
Step 2: Sort the array x.
Step 3: Generate an n × K two-dimensional array G in memory, and initialize all elements of G to +∞.
Step 4: Generate an n × K two-dimensional array L in memory.
Step 5: Let G_{0,0} = 0.
Step 6: Initialize the variable i = 1.
Step 7: Initialize the variables s1 = 0, s2 = 0.
Step 8: Initialize the variable j = i.
Step 9: Compute s1 + x_j and assign the result to (overwriting) s1.
Step 10: Compute s2 + x_j^2 and assign the result to (overwriting) s2.
Step 11: Initialize the variable c = s2 − s1^2/(i − j + 1), the cost of gathering x_j, …, x_i into one class.
Step 12: Initialize the variable k = 1.
Step 13: If G_{i,k} > G_{j−1,k−1} + c, assign G_{j−1,k−1} + c to (overwriting) G_{i,k} and assign j to (overwriting) L_{i,k}.
Step 14: Assign k + 1 to (overwriting) k. If k ≤ K, return to step 13.
Step 15: Assign j − 1 to (overwriting) j. If j ≥ 1, return to step 9.
Step 16: Assign i + 1 to (overwriting) i. If i ≤ n, return to step 7.
Step 17: Generate a one-dimensional array C of length K in memory.
Step 18: Let l = L_{n,K}.
Step 19: Assign (x_l + x_{l+1} + … + x_n)/(n − l + 1), the mean of x_l, …, x_n, to (overwriting) C_K.
Step 20: Assign K − 1 to (overwriting) K, and l − 1 to (overwriting) n. If K > 0, return to step 18.
Step 21: At this point, the array C contains the K cluster centers computed by the algorithm. A Python sketch of these steps is given below.
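A minimal Python rendering of steps 1–21 follows. It assumes the squared distance d(x, y) = (x − y)^2, uses 0-based NumPy arrays while keeping the 1-based meaning of the G and L tables, and is a sketch rather than a reference implementation of the claims.

```python
import numpy as np

def scalar_kmeans_dp(x, K):
    """Optimal scalar K-means by dynamic programming (steps 1-21),
    returning the K cluster centers of the 1-D array x."""
    x = np.sort(np.asarray(x, dtype=np.float64))      # steps 1-2: n, sort
    n = len(x)
    G = np.full((n + 1, K + 1), np.inf)               # steps 3, 5: costs, G[0][0] = 0
    L = np.zeros((n + 1, K + 1), dtype=int)           # step 4: split points
    G[0][0] = 0.0

    for i in range(1, n + 1):                         # steps 6-16: fill the tables
        s1 = s2 = 0.0
        for j in range(i, 0, -1):                     # segment x[j..i] (1-based)
            s1 += x[j - 1]                            # step 9
            s2 += x[j - 1] ** 2                       # step 10
            c = s2 - s1 * s1 / (i - j + 1)            # step 11: one-class cost of x[j..i]
            for k in range(1, K + 1):                 # steps 12-14
                if G[j - 1][k - 1] + c < G[i][k]:     # step 13
                    G[i][k] = G[j - 1][k - 1] + c
                    L[i][k] = j

    C = np.zeros(K)                                   # steps 17-21: backtrack for centers
    hi, k = n, K
    while k > 0:
        lo = L[hi][k]                                 # step 18: first index of class k
        C[k - 1] = x[lo - 1:hi].mean()                # step 19: center = segment mean
        hi, k = lo - 1, k - 1                         # step 20
    return C                                          # step 21: the K cluster centers

# Usage: scalar_kmeans_dp([0.1, 0.12, 0.9, 1.0, 1.05], 2) -> approx [0.11, 0.98]
```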
Experiments on several deep convolutional neural networks show that the inference accuracy is well maintained while a large compression rate is obtained. For example, an embodiment of the present invention is used to compress the TT-Conv network (T. Garipov, D. Podoprikhin, A. Novikov, and D. Vetrov, "Ultimate tensorization: compressing convolutional and FC layers alike," arXiv preprint arXiv:1611.03214, 2016), which comprises 6 convolutional layers and 1 fully connected layer, with the CIFAR-10 dataset used to test the network's image classification performance. The results show that with the embodiment of the invention the storage of the weights is reduced to about 1/10 of the original (3 binary bits are used to represent each weight), while the classification accuracy, far from decreasing, improves by 0.31% compared with the network before compression. For the FreshNet network model (W. Chen, J. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing convolutional neural networks in the frequency domain," in Proc. ACM SIGKDD, 2016, pp. 1475-1484), the storage of the weights is reduced to about 1/16 of the original (each weight is represented by 2 binary bits), and the classification accuracy on the CIFAR-10 dataset decreases by only 0.57% compared with the uncompressed network.
A model compression system of a deep neural network according to an embodiment of the present invention will be described next with reference to the accompanying drawings.
FIG. 3 is a schematic diagram of a model compression system of a deep neural network according to one embodiment of the present invention.
As shown in fig. 3, the system 10 includes: initialization module 100, training module 200, cluster solution 300, and compressed storage 400.
The initialization module 100 is configured to perform initial assignment of the weights in the neural network and denote the initially assigned weights as m vectors W_1, W_2, …, W_m. The training module 200 is configured to solve the optimization problem to train the neural network.
Further, in one embodiment of the present invention, the optimization problem in the training module 200 is:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, n_i denotes the length of W_i, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y.
Further, in one embodiment of the present invention, solving the optimization problem in the training module 200 further comprises:
an initialization unit, configured to initialize the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
an optimization solving unit, configured to solve the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
a clustering solving unit, configured to turn the problem, with all weights W fixed, into a K-means clustering problem and solve it to obtain new cluster centers C;
a judging unit, configured to judge whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, return to the optimization solving unit, otherwise end.
The cluster solving module 300 is configured to perform scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem.
Further, in one embodiment of the present invention, the cluster solving module 300 further includes:
a setting unit, configured to let W be the weight vector to be clustered, of length N, where N is a positive integer;
a sorting unit, configured to sort the weight vector W to be clustered to obtain the ordered vector W;
a clustering unit, configured to cluster the ordered vector W into K classes and obtain the optimal solution: when K = 1, the optimal solution is obtained directly by solving the single-cluster problem min_c Σ_j d(W_j, c); when K > 1, the interval [i, N] forming the cluster that contains W_N is enumerated, the clustering unit is called recursively to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, and this is combined with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class.
The compression storage module 400 is configured to compress and store each group of weights W_i according to the optimal solution.
The model compression system of the deep neural network provided by the embodiment of the present invention is based on the idea of weight quantization: the optimal solution of the scalar K-means clustering problem is solved with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (4)

1. A model compression method for a deep neural network, characterized by comprising the following steps:
step S1, performing initial assignment of the weights in the deep neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m, where W_i (i = 1, 2, …, m) is a vector of length n_i;
step S2, solving an optimization problem to train the neural network; the neural network is trained by solving the following optimization problem:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y, the smaller d(x, y) the closer x and y, with d(x, y) = (x − y)^2 or d(x, y) = |x − y| taken;
solving the optimization problem in step S2 further includes:
step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C;
step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending;
step S3, performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; in step S3, scalar K-means clustering is performed on each group of weights W_i (i = 1, 2, …, m), the weights in W_i being clustered into K_i classes; for each group of weights W, the scalar K-means clustering problem is solved with an algorithm based on dynamic programming to obtain the optimal solution;
step S3 specifically includes the following steps:
step S31, letting W be the weight vector to be clustered, of length N, where N is a positive integer, and K denoting the number of clusters;
step S32, sorting the weight vector W to be clustered to obtain the ordered vector W;
step S33, clustering the ordered vector W into K classes and obtaining the optimal solution;
wherein step S33 further comprises:
when K = 1, obtaining the optimal solution directly by solving the single-cluster problem min_c Σ_{j=1}^{N} d(W_j, c);
when K > 1, enumerating the interval [i, N] forming the cluster that contains W_N, recursively calling step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, combining it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class, and selecting the best among the optimal solutions over all feasible [i, N];
step S4, compressing and storing each group of weights W_i according to the optimal solution;
at this point, the n weights are clustered into K classes, and only the K class centers are stored, plus, for each weight, ⌈log2 K⌉ binary bits indicating the class to which it belongs, so that only 32K + n⌈log2 K⌉ binary bits are stored in total; the compression ratio brought by the weight quantization is 32n / (32K + n⌈log2 K⌉); in the case of K = 2 classes, each weight is represented by 1 binary bit, and two single-precision floating point numbers are additionally used to represent the two distinct weight values;
acquiring an image to be classified;
and classifying the image to be classified according to the model of the deep neural network to obtain the classification result of the image to be classified.
2. A model compression storage system for a deep neural network in an image classification process, characterized by comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being configured to perform the following processing when the program stored in the memory is executed:
acquiring an image to be classified;
classifying the image to be classified according to a model of the deep neural network to obtain a classification result of the image to be classified;
wherein the system further comprises:
an initialization module for performing initial assignment of the weights in the deep neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m;
a training module for solving an optimization problem to train the neural network;
specifically, the neural network is trained by solving the following optimization problem:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y, the smaller d(x, y) the closer x and y, with d(x, y) = (x − y)^2 or d(x, y) = |x − y| taken;
solving the optimization problem further includes:
step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C;
step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending;
a cluster solving module for performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; scalar K-means clustering is performed on each group of weights W_i (i = 1, 2, …, m), the weights in W_i being clustered into K_i classes; for each group of weights W, the scalar K-means clustering problem is solved with an algorithm based on dynamic programming to obtain the optimal solution;
the cluster solving module specifically executes the following steps:
step S31, letting W be the weight vector to be clustered, of length N, where N is a positive integer, and K denoting the number of clusters;
step S32, sorting the weight vector W to be clustered to obtain the ordered vector W;
step S33, clustering the ordered vector W into K classes and obtaining the optimal solution;
wherein step S33 further comprises:
when K = 1, obtaining the optimal solution directly by solving the single-cluster problem min_c Σ_{j=1}^{N} d(W_j, c);
when K > 1, enumerating the interval [i, N] forming the cluster that contains W_N, recursively calling step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, combining it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class, and selecting the best among the optimal solutions over all feasible [i, N];
and
a compression storage module for compressing and storing each group of weights W_i according to the optimal solution;
at this point, the n weights are clustered into K classes, and only the K class centers are stored, plus, for each weight, ⌈log2 K⌉ binary bits indicating the class to which it belongs, so that only 32K + n⌈log2 K⌉ binary bits are stored in total; the compression ratio brought by the weight quantization is 32n / (32K + n⌈log2 K⌉); in the case of K = 2 classes, each weight is represented by 1 binary bit, and two single-precision floating point numbers are additionally used to represent the two distinct weight values.
3. The system of claim 2, wherein the cluster solving module is configured to perform the following steps:
step 1: let n represent the length of the one-dimensional array x;
step 2: sort the array x;
step 3: generate an n × K two-dimensional array G in memory, and initialize all elements of the array G to +∞;
step 4: generate an n × K two-dimensional array L in memory;
step 5: let G_{0,0} = 0;
step 6: initialize the variable i = 1;
step 7: initialize the variables s1 = 0, s2 = 0;
step 8: initialize the variable j = i;
step 9: compute s1 + x_j and assign the result to s1;
step 10: compute s2 + x_j^2 and assign the result to s2;
step 11: initialize the variable c = s2 − s1^2/(i − j + 1);
step 12: initialize the variable k = 1;
step 13: if G_{i,k} > G_{j−1,k−1} + c, then assign G_{j−1,k−1} + c to G_{i,k} and assign j to L_{i,k};
step 14: assign k + 1 to k; if k ≤ K, return to step 13;
step 15: assign j − 1 to j; if j ≥ 1, return to step 9;
step 16: assign i + 1 to i; if i ≤ n, return to step 7;
step 17: generate a one-dimensional array C of length K in memory;
step 18: let l = L_{n,K};
step 19: assign (x_l + x_{l+1} + … + x_n)/(n − l + 1) to C_K;
step 20: assign K − 1 to K, and l − 1 to n; if K > 0, return to step 18;
step 21: at this point, the array C contains the K cluster centers computed by the algorithm.
4. A model compression storage device for a deep neural network, comprising:
a memory for storing a program;
a processor for executing the memory-stored program, the processor for performing the method of claim 1 when the memory-stored program is executed.
CN202010196651.8A 2020-03-16 2020-03-19 Model compression method and system for deep neural network Active CN111476366B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010181452 2020-03-16
CN202010181452X 2020-03-16

Publications (2)

Publication Number Publication Date
CN111476366A CN111476366A (en) 2020-07-31
CN111476366B true CN111476366B (en) 2024-02-23

Family

ID=71747649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010196651.8A Active CN111476366B (en) 2020-03-16 2020-03-19 Model compression method and system for deep neural network

Country Status (1)

Country Link
CN (1) CN111476366B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304928A (en) * 2018-01-26 2018-07-20 西安理工大学 Compression method based on the deep neural network for improving cluster
CN108322221A (en) * 2017-01-18 2018-07-24 华南理工大学 A method of being used for depth convolutional neural networks model compression
CN108764362A (en) * 2018-06-05 2018-11-06 四川大学 K-means clustering methods based on neural network
CN109002889A (en) * 2018-07-03 2018-12-14 华南理工大学 Adaptive iteration formula convolutional neural networks model compression method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108322221A (en) * 2017-01-18 2018-07-24 华南理工大学 A method of being used for depth convolutional neural networks model compression
CN108304928A (en) * 2018-01-26 2018-07-20 西安理工大学 Compression method based on the deep neural network for improving cluster
CN108764362A (en) * 2018-06-05 2018-11-06 四川大学 K-means clustering methods based on neural network
CN109002889A (en) * 2018-07-03 2018-12-14 华南理工大学 Adaptive iteration formula convolutional neural networks model compression method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Siqi (李思奇). Comparison of compression and acceleration algorithms for convolutional neural network models. Information & Computer (Theory Edition), 2019, No. 11. *

Also Published As

Publication number Publication date
CN111476366A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
US20210049512A1 (en) Explainers for machine learning classifiers
Das et al. Recent advances in differential evolution–an updated survey
CN101382934B (en) Search method for multimedia model, apparatus and system
US11526722B2 (en) Data analysis apparatus, data analysis method, and data analysis program
US20200380378A1 (en) Using Metamodeling for Fast and Accurate Hyperparameter optimization of Machine Learning and Deep Learning Models
Pelikan et al. Estimation of distribution algorithms
CN107943938A (en) A kind of large-scale image similar to search method and system quantified based on depth product
US20200167659A1 (en) Device and method for training neural network
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
US20200074296A1 (en) Learning to search deep network architectures
Goyal et al. Fixed-point quantization of convolutional neural networks for quantized inference on embedded platforms
Faris et al. A genetic programming based framework for churn prediction in telecommunication industry
CN112200296A (en) Network model quantification method and device, storage medium and electronic equipment
Dupuis et al. Sensitivity analysis and compression opportunities in dnns using weight sharing
CN113627471A (en) Data classification method, system, equipment and information data processing terminal
CN109409434A (en) The method of liver diseases data classification Rule Extraction based on random forest
Ouadah et al. A hybrid MCDM framework for efficient web services selection based on QoS
Xavier-Júnior et al. A novel evolutionary algorithm for automated machine learning focusing on classifier ensembles
CN111476366B (en) Model compression method and system for deep neural network
WO2024011475A1 (en) Method and apparatus for graph neural architecture search under distribution shift
Babatunde et al. Comparative analysis of genetic algorithm and particle swam optimization: An application in precision agriculture
US11416737B2 (en) NPU for generating kernel of artificial neural network model and method thereof
CN115310594A (en) Method for improving expandability of network embedding algorithm
JP7207128B2 (en) Forecasting Systems, Forecasting Methods, and Forecasting Programs
CN114399653A (en) Fast multi-view discrete clustering method and system based on anchor point diagram

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant