CN111476366B - Model compression method and system for deep neural network - Google Patents

Model compression method and system for deep neural network

Info

Publication number
CN111476366B
CN111476366B (application CN202010196651.8A)
Authority
CN
China
Prior art keywords
clustering
neural network
solving
weights
optimal solution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010196651.8A
Other languages
Chinese (zh)
Other versions
CN111476366A (en)
Inventor
喻文健
杨定澄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Publication of CN111476366A publication Critical patent/CN111476366A/en
Application granted granted Critical
Publication of CN111476366B publication Critical patent/CN111476366B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a model compression method and system for a deep neural network. The method comprises the following steps: performing initial assignment of the weights in the neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m (i = 1, 2, …, m); solving an optimization problem to train the neural network; performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; and, according to the optimal solution, compressing and storing each group of weights W_i. The method solves the optimal solution of the scalar K-means clustering problem with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.

Description

Model compression method and system for deep neural network
Technical Field
The invention relates to the technical field of machine learning, in particular to a model compression method and system of a deep neural network.
Background
Currently, deep neural networks (DNNs) have achieved remarkable results in many tasks, including computer vision and natural language processing. However, as performance increases, the model size of DNNs keeps growing. The excessive memory footprint makes them impossible to deploy on resource-constrained devices, especially edge computing devices such as face recognition systems on mobile terminals or autonomous driving systems.
In recent years there have been many efforts to compress neural networks, including pruning, knowledge distillation, low-bit representation, design of compact network architectures, and weight quantization. The low-bit representation can be seen as a variant of weight quantization that restricts the weights to certain low-bit floating point numbers; thus, accurate weight quantization provides an upper bound for the low-bit representation.
Currently, there are some works that compress neural networks by weight quantization, but they all use the Lloyd algorithm (S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, 1982) to solve the K-means clustering problem; this heuristic cannot guarantee the optimal solution, and its clustering quality is sensitive to the choice of initial solution. In addition, most existing methods first train the network and then cluster the weights by K-means clustering; even if the network is retrained after clustering, the inference accuracy drops noticeably.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent.
Therefore, an object of the present invention is to provide a model compression method for a deep neural network, which can obtain a large compression rate while still well maintaining the inference accuracy.
Another object of the present invention is to provide a model compression system for deep neural networks.
In order to achieve the above objective, an embodiment of one aspect of the present invention provides a model compression method for a deep neural network, including the following steps: step S1, performing initial assignment of the weights in the neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m; step S2, solving an optimization problem to train the neural network; step S3, performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; and step S4, according to the optimal solution, compressing and storing each group of weights W_i.
The model compression method of the deep neural network according to the embodiment of the present invention is based on the idea of weight quantization: the optimal solution of the scalar K-means clustering problem is solved with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.
In addition, the model compression method of the deep neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the optimization problem in the step S2 is:
wherein w= { W 1 ,W 2 ,…,W m The ownership value of the neural network is represented by c= { C 1 ,C 2 ,…,C m The central value K after weight clustering is represented by i Representing the number of clusters of each group of vectors, C i,k Representing the presentation to beThe k-th clustering center obtained after clustering is characterized in that the hyper-parameter lambda represents balance prediction performance and clustering error, and d (x, y) represents a function of the distance between the real numbers x and y.
Further, in an embodiment of the present invention, solving the optimization problem in step S2 further includes: step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering; step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network; step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C; and step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending.
Further, in one embodiment of the present invention, the step S3 further includes: step S31, letting W be the weight vector to be clustered, of length N, where N is a positive integer; step S32, sorting the weight vector W to be clustered to obtain the ordered vector W; and step S33, clustering the ordered vector W into K classes and obtaining the optimal solution.
Further, in one embodiment of the present invention, the step S33 further includes: when K = 1, obtaining the optimal solution directly by solving the single-cluster problem min_c Σ_j d(W_j, c); when K > 1, enumerating the interval [i, N] forming the cluster that contains W_N, recursively calling step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, and combining it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class.
To achieve the above object, another embodiment of the present invention provides a model compression system for a deep neural network, including: an initialization module for performing initial assignment of the weights in the neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m; a training module for solving an optimization problem to train the neural network; a cluster solving module for performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; and a compression storage module for compressing and storing each group of weights W_i according to the optimal solution.
The model compression system of the deep neural network according to the embodiment of the present invention is based on the idea of weight quantization: the optimal solution of the scalar K-means clustering problem is solved with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.
In addition, the model compression system of the deep neural network according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the optimization problem in the training module is:
wherein w= { W 1 ,W 2 ,…,W m The ownership value of the neural network is represented by c= { C 1 ,C 2 ,…,C m The central value K after weight clustering is represented by i Representing the number of clusters of each group of vectors, C i,k Representing the presentation to beThe k-th clustering center obtained after clustering is characterized in that the hyper-parameter lambda represents balance prediction performance and clustering error, and d (x, y) represents a function of the distance between the real numbers x and y.
Further, in an embodiment of the present invention, solving the optimization problem in the training module further includes: an initialization unit, configured to initialize the cluster centers C = {C_1, C_2, …, C_m} after weight clustering; an optimization solving unit, configured to solve the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network; a clustering solving unit, configured to turn the problem, with all weights W fixed, into a K-means clustering problem and solve it to obtain new cluster centers C; and a judging unit, configured to judge whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, return to the optimization solving unit, otherwise end.
Further, in one embodiment of the present invention, the cluster solving module further includes: a setting unit, configured to let W be the weight vector to be clustered, of length N, where N is a positive integer; a sorting unit, configured to sort the weight vector W to be clustered to obtain the ordered vector W; and a clustering unit, configured to cluster the ordered vector W into K classes and obtain the optimal solution.
Further, in an embodiment of the present invention, the clustering unit is further configured to: when K = 1, obtain the optimal solution directly by solving the single-cluster problem min_c Σ_j d(W_j, c); when K > 1, enumerate the interval [i, N] forming the cluster that contains W_N, recursively call the clustering unit to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, and combine it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a specific storage scheme before and after weight quantization;
FIG. 2 is a flow chart of a method of model compression of a deep neural network according to one embodiment of the invention;
fig. 3 is a schematic diagram of a model compression system of a deep neural network according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The embodiment of the invention stores the network with a weight-quantized scheme. For n weights, if each weight is represented by a single-precision floating point number as in the prior art, without weight quantization, 32n binary bits are required. With the embodiment of the invention, the n weights are clustered into K classes, and only the K class centers need to be stored as single-precision floating point numbers, plus, for each weight, ⌈log2 K⌉ binary bits indicating which class it belongs to; in total only 32K + n⌈log2 K⌉ binary bits are stored. The compression ratio brought by the weight quantization is therefore 32n / (32K + n⌈log2 K⌉). As shown in FIG. 1, in the existing storage scheme each weight is represented by 32 binary bits (a single-precision floating point number); in the storage scheme after weight quantization, each weight is represented by 1 binary bit, and two single-precision floating point numbers are additionally used to represent the two distinct weight values.
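To make the storage arithmetic concrete, the following Python sketch (an illustration of the calculation above, not part of the claimed method; the function names are chosen here for exposition) computes the bit counts before and after quantization and the resulting compression ratio.

```python
import math

def quantized_storage_bits(n, K):
    """Bits needed after clustering n weights into K classes:
    32 bits for each of the K single-precision cluster centers,
    plus ceil(log2 K) index bits per weight."""
    index_bits = math.ceil(math.log2(K))
    return 32 * K + n * index_bits

def compression_ratio(n, K):
    """Ratio of uncompressed storage (32 bits per weight) to quantized storage."""
    return 32 * n / quantized_storage_bits(n, K)

# Example matching FIG. 1: K = 2 distinct weight values, 1 index bit per weight.
print(compression_ratio(1024, 2))   # ~30x for n = 1024 weights
```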
The method and system for compressing a model of a deep neural network according to the embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 2 is a flow chart of a model compression method of a deep neural network according to one embodiment of the invention.
As shown in fig. 2, the model compression method of the deep neural network includes the following steps:
in step S1, initial assignment is performed on weights in the neural network, and the initially assigned weights are marked as m vectors W 1 ,W2,...,W m Wherein, the method comprises the steps of, wherein,
in step S2, the neural network is trained by solving an optimization problem.
Further, the neural network is trained by solving the following optimization problem:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, n_i denotes the length of W_i, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y: the smaller d(x, y) is, the closer x and y are; preferably d(x, y) = (x − y)^2 or d(x, y) = |x − y|, etc.
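As a concrete reading of the objective above, the following NumPy sketch evaluates the clustering-error penalty term; the squared distance d(x, y) = (x − y)^2 and the callable `network_loss` standing in for the prediction loss E(W) are illustrative assumptions.

```python
import numpy as np

def clustering_penalty(weights, centers, lam):
    """lam * sum over groups i and weights j of min_k d(W_ij, C_ik),
    using the squared distance d(x, y) = (x - y)**2 (assumed)."""
    penalty = 0.0
    for W_i, C_i in zip(weights, centers):
        # distance of every weight in group i to every center of group i
        dist = (np.asarray(W_i)[:, None] - np.asarray(C_i)[None, :]) ** 2
        penalty += dist.min(axis=1).sum()   # nearest-center distance per weight
    return lam * penalty

def regularized_objective(weights, centers, lam, network_loss):
    """E(W) + lam * clustering error, as in the optimization problem above."""
    return network_loss(weights) + clustering_penalty(weights, centers, lam)
```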
Further, in one embodiment of the present invention, solving the optimization problem in step S2 further includes:
Step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
Step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
Step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C;
Step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending.
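The alternating procedure of steps S21–S24 can be sketched in Python as follows; the helper names (`train_weights_with_penalty` for the step S22 optimizer, `scalar_kmeans_dp` for the clustering routine of step S3), the stopping threshold, and the round limit are assumptions made for illustration, not the patent's prescribed implementation.

```python
import numpy as np

def train_cluster_friendly(init_weights, K_list, lam,
                           train_weights_with_penalty,  # step S22: e.g. SGD on the objective
                           scalar_kmeans_dp,            # step S3: optimal scalar K-means
                           max_rounds=20, tol=1e-4):
    W = [np.array(w, dtype=np.float64) for w in init_weights]
    # Step S21: initialize the cluster centers C for every weight group.
    C = [scalar_kmeans_dp(w, K) for w, K in zip(W, K_list)]

    for _ in range(max_rounds):                          # part of the step S24 stop test
        # Step S22: with C fixed, train the network on the regularized objective.
        W = train_weights_with_penalty(W, C, lam)
        # Step S23: with W fixed, re-solve the scalar K-means problems for new centers.
        new_C = [scalar_kmeans_dp(w, K) for w, K in zip(W, K_list)]
        # Step S24: stop once the centers barely change (assumed criterion).
        change = max(np.max(np.abs(cn - co)) for cn, co in zip(new_C, C))
        C = new_C
        if change < tol:
            break
    return W, C
```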
In step S3, scalar K-means clustering is performed on each group of weights W_i (i = 1, 2, …, m) to obtain the optimal solution of the clustering problem, the weights in W_i being clustered into K_i classes.
That is, for each group of weights W, the scalar K-means clustering problem is solved with an algorithm based on dynamic programming to obtain the optimal solution, in the following steps:
Step S31, let W be the weight vector to be clustered, of length N, where N is a positive integer, and let K denote the number of clusters;
Step S32, sort the weight vector W to be clustered to obtain the ordered vector W;
Step S33, cluster the ordered vector W into K classes and obtain the optimal solution.
Wherein, step S33 further comprises:
When K = 1, the optimal solution is obtained directly by solving the single-cluster problem min_c Σ_{j=1}^{N} d(W_j, c);
When K > 1, enumerate the interval [i, N] forming the cluster that contains W_N; recursively call step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, combine it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class, and select the best among the optimal solutions over all feasible [i, N].
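The recursion just described can be written down directly as a memoized function over the sorted weights. The sketch below assumes the squared distance d(x, y) = (x − y)^2, so the one-class cost is the sum of squared deviations from the segment mean; it returns only the optimal cost (the centers are recovered by the table-based procedure described later).

```python
from functools import lru_cache

def optimal_scalar_kmeans_cost(w_sorted, K):
    """Optimal cost of clustering sorted 1-D data into K classes,
    following the recursion of step S33 (squared distance assumed)."""
    n = len(w_sorted)
    prefix, prefix_sq = [0.0], [0.0]          # prefix sums of w and w**2
    for x in w_sorted:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def one_class_cost(lo, hi):
        """Cost of gathering w[lo..hi] (0-based, inclusive) into one class."""
        s1 = prefix[hi + 1] - prefix[lo]
        s2 = prefix_sq[hi + 1] - prefix_sq[lo]
        return s2 - s1 * s1 / (hi - lo + 1)   # squared deviations from the mean

    @lru_cache(maxsize=None)
    def best(hi, k):
        """Optimal cost of clustering w[0..hi] into k classes."""
        if k == 1:                            # base case K = 1: a single cluster
            return one_class_cost(0, hi)
        # K > 1: enumerate the interval [i, hi] forming the last cluster
        return min(best(i - 1, k - 1) + one_class_cost(i, hi)
                   for i in range(k - 1, hi + 1))

    return best(n - 1, K)

# Usage: optimal_scalar_kmeans_cost(sorted([0.1, 0.12, 0.9, 1.0, 1.05]), 2)
```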
In step S4, each group of weights W_i is compressed and stored according to the optimal solution.
That is, for the weights in W_i, only the K_i cluster centers need to be stored as single-precision floating point numbers, and each weight is then represented by a ⌈log2 K_i⌉-bit binary number indicating the class to which it belongs.
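The compressed representation of one weight group W_i can be sketched as follows: the K_i cluster centers are kept as single-precision floats and each weight is replaced by the index of its nearest center. The indices are stored as uint8 here for simplicity (assuming K_i ≤ 256); a real implementation would bit-pack ⌈log2 K_i⌉ bits per index as described above.

```python
import numpy as np

def compress_group(W_i, centers):
    """Quantize one weight group: keep the cluster centers as float32 and
    store, for every weight, the index of its nearest center."""
    centers = np.asarray(centers, dtype=np.float32)
    idx = np.abs(np.asarray(W_i)[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx.astype(np.uint8)      # uint8 assumed: at most 256 classes

def decompress_group(centers, idx):
    """Rebuild the quantized weights from the stored centers and indices."""
    return centers[idx]

# Usage sketch
W_i = np.array([0.11, 0.09, 0.52, 0.48, 0.10], dtype=np.float32)
centers, idx = compress_group(W_i, [0.1, 0.5])
W_i_quantized = decompress_group(centers, idx)  # [0.1, 0.1, 0.5, 0.5, 0.1]
```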
It should be noted that embodiments of the present invention may be implemented in any programming language and executed on a computing device with a CPU and memory. The following specific example uses the prior-art stochastic gradient descent method, which can be implemented by calling a numerical computation library of the corresponding programming language.
Consider a neural network structure comprising 1 fully connected layer and 1 convolutional layer. The fully connected layer contains m_fc × n parameters, and the convolutional layer contains m_conv × n × h × w parameters. The embodiment of the invention divides all the network parameters into m_fc + m_conv groups, i.e., m_fc + m_conv vectors W_1, …, W_{m_fc + m_conv}, whose respective clustering results form m_fc + m_conv center vectors C_1, …, C_{m_fc + m_conv}. The embodiment of the invention clusters each group of parameters uniformly into K classes, so the compression process is as follows:
for the fully connected layer, its parameters are divided into m fc Groups of n parameters each, constituteCorresponding to m fc Group cluster center->For the convolutional layer, its parameters are divided into m conv Groups of n×h×w parameters each formCorresponding to m conv Group cluster center->For i is more than or equal to 1 and less than or equal to m fc +m conv Will W i Regarding as a vector x, using the scalar K-means clustering method provided by the embodiment of the invention, the scalar K-means clustering method is clustered into K classes to obtain a clustering center vector C i
The scalar K-means clustering method provided by the embodiment of the invention executes the following steps. It clusters an input one-dimensional array x; an additional input parameter K indicates that the array is to be clustered into K classes:
Step 1: Let n denote the length of the one-dimensional array x.
Step 2: Sort the array x.
Step 3: Generate an n × K two-dimensional array G in memory, and initialize all elements of G to +∞.
Step 4: Generate an n × K two-dimensional array L in memory.
Step 5: Let G_{0,0} = 0.
Step 6: Initialize the variable i = 1.
Step 7: Initialize the variables s1 = 0, s2 = 0.
Step 8: Initialize the variable j = i.
Step 9: Compute s1 + x_j and assign the result to (overwriting) s1.
Step 10: Compute s2 + x_j^2 and assign the result to (overwriting) s2.
Step 11: Initialize the variable c = s2 − s1^2/(i − j + 1), the cost of gathering x_j, …, x_i into one class.
Step 12: Initialize the variable k = 1.
Step 13: If G_{i,k} > G_{j−1,k−1} + c, assign G_{j−1,k−1} + c to (overwriting) G_{i,k} and assign j to (overwriting) L_{i,k}.
Step 14: Assign k + 1 to (overwriting) k. If k ≤ K, return to step 13.
Step 15: Assign j − 1 to (overwriting) j. If j ≥ 1, return to step 9.
Step 16: Assign i + 1 to (overwriting) i. If i ≤ n, return to step 7.
Step 17: Generate a one-dimensional array C of length K in memory.
Step 18: Let l = L_{n,K}.
Step 19: Assign (x_l + x_{l+1} + … + x_n)/(n − l + 1), the mean of x_l, …, x_n, to (overwriting) C_K.
Step 20: Assign K − 1 to (overwriting) K, and l − 1 to (overwriting) n. If K > 0, return to step 18.
Step 21: At this point, the array C contains the K cluster centers computed by the algorithm. A Python sketch of these steps is given below.
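A minimal Python rendering of steps 1–21 follows. It assumes the squared distance d(x, y) = (x − y)^2, uses 0-based NumPy arrays while keeping the 1-based meaning of the G and L tables, and is a sketch rather than a reference implementation of the claims.

```python
import numpy as np

def scalar_kmeans_dp(x, K):
    """Optimal scalar K-means by dynamic programming (steps 1-21),
    returning the K cluster centers of the 1-D array x."""
    x = np.sort(np.asarray(x, dtype=np.float64))      # steps 1-2: n, sort
    n = len(x)
    G = np.full((n + 1, K + 1), np.inf)               # steps 3, 5: costs, G[0][0] = 0
    L = np.zeros((n + 1, K + 1), dtype=int)           # step 4: split points
    G[0][0] = 0.0

    for i in range(1, n + 1):                         # steps 6-16: fill the tables
        s1 = s2 = 0.0
        for j in range(i, 0, -1):                     # segment x[j..i] (1-based)
            s1 += x[j - 1]                            # step 9
            s2 += x[j - 1] ** 2                       # step 10
            c = s2 - s1 * s1 / (i - j + 1)            # step 11: one-class cost of x[j..i]
            for k in range(1, K + 1):                 # steps 12-14
                if G[j - 1][k - 1] + c < G[i][k]:     # step 13
                    G[i][k] = G[j - 1][k - 1] + c
                    L[i][k] = j

    C = np.zeros(K)                                   # steps 17-21: backtrack for centers
    hi, k = n, K
    while k > 0:
        lo = L[hi][k]                                 # step 18: first index of class k
        C[k - 1] = x[lo - 1:hi].mean()                # step 19: center = segment mean
        hi, k = lo - 1, k - 1                         # step 20
    return C                                          # step 21: the K cluster centers

# Usage: scalar_kmeans_dp([0.1, 0.12, 0.9, 1.0, 1.05], 2) -> approx [0.11, 0.98]
```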
Experiments on several deep convolutional neural networks show that the inference accuracy is well maintained while a large compression rate is obtained. For example, an embodiment of the present invention is used to compress the TT-Conv network (T. Garipov, D. Podoprikhin, A. Novikov, and D. Vetrov, "Ultimate tensorization: compressing convolutional and FC layers alike," arXiv preprint arXiv:1611.03214, 2016), which comprises 6 convolutional layers and 1 fully connected layer, with the CIFAR-10 dataset used to test the network's image classification performance. The results show that with the embodiment of the invention the storage of the weights is reduced to about 1/10 of the original (3 binary bits are used to represent each weight), while the classification accuracy, far from decreasing, improves by 0.31% compared with the network before compression. For the FreshNet network model (W. Chen, J. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing convolutional neural networks in the frequency domain," in Proc. ACM SIGKDD, 2016, pp. 1475-1484), the storage of the weights is reduced to about 1/16 of the original (each weight is represented by 2 binary bits), and the classification accuracy on the CIFAR-10 dataset decreases by only 0.57% compared with the uncompressed network.
A model compression system of a deep neural network according to an embodiment of the present invention will be described next with reference to the accompanying drawings.
FIG. 3 is a schematic diagram of a model compression system of a deep neural network according to one embodiment of the present invention.
As shown in fig. 3, the system 10 includes: initialization module 100, training module 200, cluster solution 300, and compressed storage 400.
The initialization module 100 is configured to perform initial assignment of the weights in the neural network and denote the initially assigned weights as m vectors W_1, W_2, …, W_m. The training module 200 is configured to solve the optimization problem to train the neural network.
Further, in one embodiment of the present invention, the optimization problem in the training module 200 is:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, n_i denotes the length of W_i, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y.
Further, in one embodiment of the present invention, solving the optimization problem in the training module 200 further comprises:
an initialization unit, configured to initialize the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
an optimization solving unit, configured to solve the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
a clustering solving unit, configured to turn the problem, with all weights W fixed, into a K-means clustering problem and solve it to obtain new cluster centers C;
a judging unit, configured to judge whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, return to the optimization solving unit, otherwise end.
The cluster solving module 300 is configured to perform scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem.
Further, in one embodiment of the present invention, the cluster solving module 300 further includes:
a setting unit, configured to let W be the weight vector to be clustered, of length N, where N is a positive integer;
a sorting unit, configured to sort the weight vector W to be clustered to obtain the ordered vector W;
a clustering unit, configured to cluster the ordered vector W into K classes and obtain the optimal solution: when K = 1, the optimal solution is obtained directly by solving the single-cluster problem min_c Σ_j d(W_j, c); when K > 1, the interval [i, N] forming the cluster that contains W_N is enumerated, the clustering unit is called recursively to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, and this is combined with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class.
The compression storage module 400 is configured to compress and store each group of weights W_i according to the optimal solution.
The model compression system of the deep neural network provided by the embodiment of the present invention is based on the idea of weight quantization: the optimal solution of the scalar K-means clustering problem is solved with a dynamic programming method, and the clustering-friendly network training process ensures that the compressed neural network maintains the original inference accuracy well.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (4)

1. A model compression method for a deep neural network, characterized by comprising the following steps:
step S1, performing initial assignment of the weights in the deep neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m, where W_i (i = 1, 2, …, m) is a vector of length n_i;
step S2, solving an optimization problem to train the neural network; the neural network is trained by solving the following optimization problem:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y, the smaller d(x, y) the closer x and y, with d(x, y) = (x − y)^2 or d(x, y) = |x − y| taken;
solving the optimization problem in step S2 further includes:
step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C;
step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending;
step S3, performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; in step S3, scalar K-means clustering is performed on each group of weights W_i (i = 1, 2, …, m), the weights in W_i being clustered into K_i classes; for each group of weights W, the scalar K-means clustering problem is solved with an algorithm based on dynamic programming to obtain the optimal solution;
step S3 specifically includes the following steps:
step S31, letting W be the weight vector to be clustered, of length N, where N is a positive integer, and K denoting the number of clusters;
step S32, sorting the weight vector W to be clustered to obtain the ordered vector W;
step S33, clustering the ordered vector W into K classes and obtaining the optimal solution;
wherein step S33 further comprises:
when K = 1, obtaining the optimal solution directly by solving the single-cluster problem min_c Σ_{j=1}^{N} d(W_j, c);
when K > 1, enumerating the interval [i, N] forming the cluster that contains W_N, recursively calling step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, combining it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class, and selecting the best among the optimal solutions over all feasible [i, N];
step S4, compressing and storing each group of weights W_i according to the optimal solution;
at this point, the n weights are clustered into K classes, and only the K class centers are stored, plus, for each weight, ⌈log2 K⌉ binary bits indicating the class to which it belongs, so that only 32K + n⌈log2 K⌉ binary bits are stored in total; the compression ratio brought by the weight quantization is 32n / (32K + n⌈log2 K⌉); in the case of K = 2 classes, each weight is represented by 1 binary bit, and two single-precision floating point numbers are additionally used to represent the two distinct weight values;
acquiring an image to be classified;
and classifying the image to be classified according to the model of the deep neural network to obtain the classification result of the image to be classified.
2. A model compression storage system for a deep neural network in an image classification process, characterized by comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being configured to perform the following processing when the program stored in the memory is executed:
acquiring an image to be classified;
classifying the image to be classified according to a model of the deep neural network to obtain a classification result of the image to be classified;
wherein the system further comprises:
an initialization module for performing initial assignment of the weights in the deep neural network and denoting the initially assigned weights as m vectors W_1, W_2, …, W_m;
a training module for solving an optimization problem to train the neural network;
specifically, the neural network is trained by solving the following optimization problem:
min_{W,C}  E(W) + λ Σ_{i=1..m} Σ_{j=1..n_i} min_{1≤k≤K_i} d(W_{i,j}, C_{i,k})
wherein E(W) denotes the prediction loss of the network, W = {W_1, W_2, …, W_m} denotes all weights of the neural network, C = {C_1, C_2, …, C_m} denotes the cluster centers after weight clustering, K_i denotes the number of clusters of each group of vectors, C_{i,k} denotes the k-th cluster center obtained after clustering W_i, the hyper-parameter λ balances prediction performance against clustering error, and d(x, y) is a function of the distance between the real numbers x and y, the smaller d(x, y) the closer x and y, with d(x, y) = (x − y)^2 or d(x, y) = |x − y| taken;
solving the optimization problem further includes:
step S21, initializing the cluster centers C = {C_1, C_2, …, C_m} after weight clustering;
step S22, solving the original problem with an optimization algorithm to obtain all weights W = {W_1, W_2, …, W_m} of the neural network;
step S23, with all weights W fixed, turning the problem into a K-means clustering problem and solving it to obtain new cluster centers C;
step S24, judging whether the change in the new cluster centers C and the weights W, or the number of iteration rounds, meets a preset stopping condition; if not, returning to step S22, otherwise ending;
a cluster solving module for performing scalar K-means clustering on each group of weights W_i to obtain the optimal solution of the clustering problem; scalar K-means clustering is performed on each group of weights W_i (i = 1, 2, …, m), the weights in W_i being clustered into K_i classes; for each group of weights W, the scalar K-means clustering problem is solved with an algorithm based on dynamic programming to obtain the optimal solution;
the cluster solving module specifically executes the following steps:
step S31, letting W be the weight vector to be clustered, of length N, where N is a positive integer, and K denoting the number of clusters;
step S32, sorting the weight vector W to be clustered to obtain the ordered vector W;
step S33, clustering the ordered vector W into K classes and obtaining the optimal solution;
wherein step S33 further comprises:
when K = 1, obtaining the optimal solution directly by solving the single-cluster problem min_c Σ_{j=1}^{N} d(W_j, c);
when K > 1, enumerating the interval [i, N] forming the cluster that contains W_N, recursively calling step S33 to compute the optimal solution of clustering W_1, W_2, …, W_{i-1} into K-1 classes, combining it with the optimal solution of clustering W_i, W_{i+1}, …, W_N into 1 class to obtain the optimal solution in which [i, N] is gathered into one class, and selecting the best among the optimal solutions over all feasible [i, N];
and
a compression storage module for compressing and storing each group of weights W_i according to the optimal solution;
at this point, the n weights are clustered into K classes, and only the K class centers are stored, plus, for each weight, ⌈log2 K⌉ binary bits indicating the class to which it belongs, so that only 32K + n⌈log2 K⌉ binary bits are stored in total; the compression ratio brought by the weight quantization is 32n / (32K + n⌈log2 K⌉); in the case of K = 2 classes, each weight is represented by 1 binary bit, and two single-precision floating point numbers are additionally used to represent the two distinct weight values.
3. The system of claim 2, wherein the cluster solving module is configured to perform the following steps:
step 1: let n represent the length of the one-dimensional array x;
step 2: sort the array x;
step 3: generate an n × K two-dimensional array G in memory, and initialize all elements of the array G to +∞;
step 4: generate an n × K two-dimensional array L in memory;
step 5: let G_{0,0} = 0;
step 6: initialize the variable i = 1;
step 7: initialize the variables s1 = 0, s2 = 0;
step 8: initialize the variable j = i;
step 9: compute s1 + x_j and assign the result to s1;
step 10: compute s2 + x_j^2 and assign the result to s2;
step 11: initialize the variable c = s2 − s1^2/(i − j + 1);
step 12: initialize the variable k = 1;
step 13: if G_{i,k} > G_{j−1,k−1} + c, then assign G_{j−1,k−1} + c to G_{i,k} and assign j to L_{i,k};
step 14: assign k + 1 to k; if k ≤ K, return to step 13;
step 15: assign j − 1 to j; if j ≥ 1, return to step 9;
step 16: assign i + 1 to i; if i ≤ n, return to step 7;
step 17: generate a one-dimensional array C of length K in memory;
step 18: let l = L_{n,K};
step 19: assign (x_l + x_{l+1} + … + x_n)/(n − l + 1) to C_K;
step 20: assign K − 1 to K, and l − 1 to n; if K > 0, return to step 18;
step 21: at this point, the array C contains the K cluster centers computed by the algorithm.
4. A model compression storage device for a deep neural network, comprising:
a memory for storing a program;
a processor for executing the memory-stored program, the processor for performing the method of claim 1 when the memory-stored program is executed.
CN202010196651.8A 2020-03-16 2020-03-19 Model compression method and system for deep neural network Active CN111476366B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010181452 2020-03-16
CN202010181452X 2020-03-16

Publications (2)

Publication Number Publication Date
CN111476366A CN111476366A (en) 2020-07-31
CN111476366B true CN111476366B (en) 2024-02-23

Family

ID=71747649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010196651.8A Active CN111476366B (en) 2020-03-16 2020-03-19 Model compression method and system for deep neural network

Country Status (1)

Country Link
CN (1) CN111476366B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304928A (en) * 2018-01-26 2018-07-20 西安理工大学 Compression method based on the deep neural network for improving cluster
CN108322221A (en) * 2017-01-18 2018-07-24 华南理工大学 A method of being used for depth convolutional neural networks model compression
CN108764362A (en) * 2018-06-05 2018-11-06 四川大学 K-means clustering methods based on neural network
CN109002889A (en) * 2018-07-03 2018-12-14 华南理工大学 Adaptive iteration formula convolutional neural networks model compression method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108322221A (en) * 2017-01-18 2018-07-24 华南理工大学 A method of being used for depth convolutional neural networks model compression
CN108304928A (en) * 2018-01-26 2018-07-20 西安理工大学 Compression method based on the deep neural network for improving cluster
CN108764362A (en) * 2018-06-05 2018-11-06 四川大学 K-means clustering methods based on neural network
CN109002889A (en) * 2018-07-03 2018-12-14 华南理工大学 Adaptive iteration formula convolutional neural networks model compression method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Siqi (李思奇). Comparison of compression and acceleration algorithms for convolutional neural network models. Information & Computer (Theory Edition), 2019, No. 11. *

Also Published As

Publication number Publication date
CN111476366A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
US20210049512A1 (en) Explainers for machine learning classifiers
Das et al. Recent advances in differential evolution–an updated survey
CN101382934B (en) Search method for multimedia model, apparatus and system
US11526722B2 (en) Data analysis apparatus, data analysis method, and data analysis program
US20200380378A1 (en) Using Metamodeling for Fast and Accurate Hyperparameter optimization of Machine Learning and Deep Learning Models
Pelikan et al. Estimation of distribution algorithms
CN107943938A (en) A kind of large-scale image similar to search method and system quantified based on depth product
US20200167659A1 (en) Device and method for training neural network
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
US20200074296A1 (en) Learning to search deep network architectures
Goyal et al. Fixed-point quantization of convolutional neural networks for quantized inference on embedded platforms
Faris et al. A genetic programming based framework for churn prediction in telecommunication industry
CN112200296A (en) Network model quantification method and device, storage medium and electronic equipment
Dupuis et al. Sensitivity analysis and compression opportunities in dnns using weight sharing
CN113627471A (en) Data classification method, system, equipment and information data processing terminal
CN109409434A (en) The method of liver diseases data classification Rule Extraction based on random forest
Ouadah et al. A hybrid MCDM framework for efficient web services selection based on QoS
Xavier-Júnior et al. A novel evolutionary algorithm for automated machine learning focusing on classifier ensembles
CN111476366B (en) Model compression method and system for deep neural network
WO2024011475A1 (en) Method and apparatus for graph neural architecture search under distribution shift
Babatunde et al. Comparative analysis of genetic algorithm and particle swam optimization: An application in precision agriculture
US11416737B2 (en) NPU for generating kernel of artificial neural network model and method thereof
CN115310594A (en) Method for improving expandability of network embedding algorithm
JP7207128B2 (en) Forecasting Systems, Forecasting Methods, and Forecasting Programs
CN114399653A (en) Fast multi-view discrete clustering method and system based on anchor point diagram

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant