Background
The deep learning model is a core algorithm of current artificial intelligence technology; it depends on large amounts of labeled data and achieves nonlinear fitting of complex problems through hierarchical modeling. In current practice, deep learning techniques have been successful in fields such as image recognition and speech processing, and they continue to spread into other industries.
In order to process complex data, current deep learning models often have hundreds of millions of parameters. Besides the large amount of time and computing resources consumed in the training phase, a large amount of storage is occupied during deployment and inference, and inference is slow. Where computing resources are limited, for example on mobile terminals, the application of deep learning systems is therefore restricted.
Deep learning model compression mainly addresses the problem of excessive model parameter counts, and current research in this field mainly focuses on the following four directions:
(1) Matrix low-rank decomposition: a deep learning model involves a large number of matrix operations; by decomposing a large-scale low-rank matrix into several small matrices, the data volume of the matrix can be greatly reduced while the calculation result remains essentially unchanged.
(2) Model pruning and parameter quantization: the main starting point of model pruning is that a deep learning model is often over-parameterized, so the network contains redundant structures and parameters; redundant parameters and neurons are deleted according to rules such as importance. Quantization simplifies the data type in which the weights are stored, for example converting from floating point to integer, so as to reduce storage. This type of approach tends to degrade the performance of the model.
(3) Network Architecture Search (NAS): within a given model design space, a machine automatically searches for an optimal structure, thereby realizing model compression. Such methods can be computationally expensive during the search process.
(4) Knowledge Distillation (KD): a smaller student model is trained with the help of an already trained teacher model, so that the small model achieves improved performance with fewer parameters.
Disclosure of Invention
The invention aims to provide a deep learning model compression method based on decision boundaries, which solves the problems in the prior art.
A deep learning model compression method based on decision boundaries comprises the following steps:
step one, carrying out feature mapping;
step two, performing segmented linearization on the activation function;
step three, calculating sub-decision regions: the sub-decision regions of the fully connected layers are calculated;
step four, constructing the decision network: the corresponding decision boundaries are calculated according to the sub-decision regions and a new decision network is constructed.
Further, in step one, if the object of model compression is a fully connected neural network, this step is skipped and step two is executed directly.
Further, in step one, if the object is the fully connected part of a CNN model, the model is regarded as a composite of two parts, f = g_MLP(g_CNN(x_0)); g_CNN(x_0) is treated as a feature map, the new sample set D' = {x' = g_CNN(x)} is constructed, and the remaining part is then handled as a fully connected neural network.
Further, in step two, if the activation function is already a piecewise linear function, this step is skipped and step three is executed directly.
Further, in step two, for an activation function that is not piecewise linear, a piecewise linearization technique is adopted: a piecewise linear function close to the activation function is found and used as an approximate substitute, so that the activation function is converted into a piecewise linear function.
Further, in step two, for an activation function that is not piecewise linear, the procedure is specifically as follows:
first, a hard approximation function hard-σ(x) of the activation function σ(x) is generated, specifically as follows:
According to the required number of segments L = n + 2 and an acceptable error δ > 0, two segmentation points a_0 and a_n are first selected such that on the two intervals (-∞, a_0] and [a_n, +∞) the condition |σ(x) - hard-σ(x)| ≤ δ is satisfied. On the interval [a_0, a_n], the division points a_1, a_2, ..., a_(n-1) are taken directly at equal spacing, and according to the point pairs (a_1, hard-σ(a_1)), (a_2, σ(a_2)), (a_3, σ(a_3)), ..., (a_(n-2), σ(a_(n-2))), (a_(n-1), hard-σ(a_(n-1))), the L = n + 2 segments are connected in sequence to obtain the piecewise linear approximation function of the original activation function.
Further, in step three, for a model whose activation function is a piecewise linear function (including one obtained through step two), the invention first calculates the decision boundary. Concretely, denoting the breakpoints of the piecewise linear activation function used by μ_0 < μ_1 < ... < μ_n, the procedure is as follows:
step three one, traversing samples: according to the training sample set, each sample is input into the deep learning model f(x) in sequence without executing the back propagation process, while the activation states of all fully connected layer activation functions are recorded;
step three two, the activation states of all fully connected layer neurons are counted and arranged in sequence into an overall state vector S = [s_1, s_2, ..., s_m]; following step three one, the overall state vectors of all samples are collected to obtain the overall state vector set Φ = {S_1, S_2, ..., S_N} of the samples;
step three three, Φ is sorted, and identical overall state vectors are merged to obtain the reduced overall state vector set Φ' = {S'_1, S'_2, ..., S'_q}; according to the number q of elements of Φ', samples having the same activation state S'_p (1 ≤ p ≤ q) are assigned to the same sub-region, and samples belonging to the same sub-region are described by the same linear model g_i(x) = w_i x + b_i (i = 1, 2, ..., q); the equivalent linear model g_i(x) = w_i x + b_i is obtained by direct calculation from the parameters of the fully connected layers and the overall activation state vector, and the set of all sub-models is written G = {g_1, g_2, ..., g_q};
step three four, the decision boundaries of all sub-models are calculated: according to the definition of the decision boundary, an N-class classification problem has one class decision boundary for each pair of classes, N(N-1)/2 in total, and for the linear model g_i(x) of one sub-region the corresponding decision boundaries are calculated directly. Specifically, the decision boundaries of all sub-region models are calculated to form the decision boundary hyperplane set DB.
Further, in step four, specifically, a decision network DNet is constructed according to the decision boundary hyperplane set DB obtained in step three. The network contains only one hidden layer and, unlike a general neural network, the output of the decision network DNet is a position code relative to the decision boundaries, recorded as 0/1: specifically, for a hyperplane P_l and a sample x_0, the sample is substituted directly into the hyperplane formula to calculate the output; if the result is positive it is marked 1, and if negative it is marked 0.
Through the decision network, a relative position code of the data with respect to all elements of the decision boundary set DB is obtained.
According to the properties of the decision boundary, samples with the same position code must belong to the same class.
By traversing the training set data D = {(x_i, C_i) | i = 1, 2, ..., N}, the class labeling of the position codes of the decision network is completed; when a new sample is input, its class can be determined simply by comparing it with the labeled position codes.
The invention has the following beneficial effects:
1. The invention realizes efficient model compression of the fully connected layers.
2. Compared with the prior art, in which precision is reduced, the method can realize lossless compression for a model whose activation function is piecewise linear. For other nonlinear activation functions that have infinite asymptotes, model compression with controllable accuracy can be achieved.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a deep learning model compression method based on decision boundaries, which comprises the following steps:
step one, carrying out feature mapping;
step two, performing segmented linearization on the activation function;
step three, calculating sub-decision regions: the sub-decision regions of the fully connected layers are calculated;
step four, constructing the decision network: the corresponding decision boundaries are calculated according to the sub-decision regions and a new decision network is constructed.
Further, in step one, if the object of model compression is a fully connected neural network, this step is skipped and step two is executed directly.
Further, in step one, if the object is the fully connected part of a CNN model, the model is regarded as a composite of two parts, f = g_MLP(g_CNN(x_0)); g_CNN(x_0) is treated as a feature map, the new sample set D' = {x' = g_CNN(x)} is constructed, and the fully connected part of the original model f is then handled as a fully connected neural network.
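As an illustrative sketch of this feature-mapping step (not taken from the original text; the use of PyTorch and the name cnn_part are assumptions of the example), the convolutional part g_CNN can be run once over the data to precompute the feature sample set D', after which only the fully connected part needs to be compressed:

```python
import torch
import torch.nn as nn

def build_feature_dataset(cnn_part: nn.Module, samples: torch.Tensor) -> torch.Tensor:
    """Construct D' = { x' = g_cnn(x) } from the convolutional feature extractor."""
    cnn_part.eval()
    with torch.no_grad():                  # inference only, no back propagation
        feats = cnn_part(samples)          # x' = g_cnn(x)
    return feats.flatten(start_dim=1)      # flatten feature maps into vectors for the MLP
```

The fully connected part is then handled on these precomputed feature vectors exactly as an ordinary fully connected neural network, as described in the following steps.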
Further, in step two, if the activation function is already a piecewise linear function, this step is skipped and step three is executed directly.
Further, in step two, for an activation function that is not piecewise linear, a piecewise linearization technique is adopted: a piecewise linear function close to the activation function is found and used as an approximate substitute, so that the activation function is converted into a piecewise linear function.
Further, in step two, for an activation function that is not piecewise linear, such as the sigmoid or tanh function, the piecewise linearization technique can be adopted to perform an approximate substitution by finding a piecewise linear function close to the activation function. The specific steps are as follows:
Since existing activation functions generally have infinite asymptotes, a hard approximation function hard-σ(x) of the activation function σ(x) is first generated according to these asymptotes, specifically as follows:
According to the required number of segments L = n + 2 and an acceptable error δ > 0, two segmentation points a_0 and a_n are first selected such that on the two intervals (-∞, a_0] and [a_n, +∞) the condition |σ(x) - hard-σ(x)| ≤ δ is satisfied. On the interval [a_0, a_n], the division points a_1, a_2, ..., a_(n-1) are taken directly at equal spacing, and according to the point pairs (a_1, hard-σ(a_1)), (a_2, σ(a_2)), (a_3, σ(a_3)), ..., (a_(n-2), σ(a_(n-2))), (a_(n-1), hard-σ(a_(n-1))), the L = n + 2 segments are connected in sequence to obtain the piecewise linear approximation function of the original activation function. The decision network can then be generated according to the process of steps one to four, thereby realizing model compression.
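A minimal sketch of this linearization for the sigmoid activation (the cut-off points a_0 = -6 and a_n = 6 and the use of NumPy are assumptions of the example; with these values the error on the two outer intervals stays below roughly 0.0025):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def piecewise_linear_sigmoid(n: int, a0: float = -6.0, an: float = 6.0):
    """Equidistant break points a_0, ..., a_n and their values; the outer values are the asymptotes."""
    xs = np.linspace(a0, an, n + 1)
    ys = sigmoid(xs)
    ys[0], ys[-1] = 0.0, 1.0        # hard approximation on (-inf, a_0] and [a_n, +inf)
    return xs, ys

def approx_sigmoid(x, xs, ys):
    """Piecewise linear approximation: linear on [a_0, a_n], constant outside (L = n + 2 pieces)."""
    return np.interp(x, xs, ys)     # np.interp clamps to ys[0] / ys[-1] outside the range
```

Increasing n (and pushing a_0 and a_n further out) brings the approximation error under any required δ > 0.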
Further, in step three: for a classification model, its essence lies in the decision boundary of the model. Taking image classification as an example, given a data set D = {(x_i, C_i) | i = 1, 2, ..., N}, a classifier f: R^d → R^c is trained, where the classification labels are C = {C_i | i = 1, 2, ..., N, N ∈ Z+}. The decision boundary of f between classes C_i and C_j consists of the points x for which every open ball neighborhood U(x, δ) of sample x contains samples that f assigns to class C_i as well as samples that f assigns to class C_j.
The deep learning model that solves the classification problem is also a classifier f(x), so the decision boundary of the deep learning model is calculated first; owing to its high nonlinearity, this decision boundary is generally difficult to compute directly. In particular, denote the breakpoints of the piecewise linear activation function used by μ_0 < μ_1 < ... < μ_n:
step three one, traversing the samples: according to the training sample set, each sample is input into the deep learning model f(x) in sequence, but the back propagation process is not executed (that is, only the inference process is performed), while the activation states of all fully connected layer activation functions are recorded; for example, if the output a_ij of a certain neuron satisfies μ_k < a_ij < μ_(k+1) (k = 0, 1, 2, ..., n-1), the activation state of that neuron is s_ij = k, and so on;
step three two, the activation states of all fully connected layer neurons are counted and arranged in sequence into an overall state vector S = [s_1, s_2, ..., s_m]; following step three one, the overall state vectors of all samples are collected to obtain the overall state vector set Φ = {S_1, S_2, ..., S_N} of the samples;
step three three, Φ is sorted, and identical overall state vectors are merged to obtain the reduced overall state vector set Φ' = {S'_1, S'_2, ..., S'_q}; according to the number q of elements of Φ', samples having the same activation state S'_p (1 ≤ p ≤ q) are assigned to the same sub-region, and samples belonging to the same sub-region are described by the same linear model g_i(x) = w_i x + b_i (i = 1, 2, ..., q); the equivalent linear model g_i(x) = w_i x + b_i is obtained by direct calculation from the parameters of the fully connected layers and the overall activation state vector, and the set of all sub-models is written G = {g_1, g_2, ..., g_q};
step three four, the decision boundaries of all sub-models are calculated: according to the definition of the decision boundary, an N-class classification problem has one class decision boundary for each pair of classes, N(N-1)/2 in total, and for the linear model g_i(x) of one sub-region the corresponding decision boundaries can be calculated directly. Specifically, the decision boundaries of all sub-region models are calculated to form the decision boundary hyperplane set DB.
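The following sketch illustrates steps three one to three four for a plain ReLU network described by explicit weight and bias lists (these assumptions, and the helper names, are mine and not part of the original disclosure): the overall activation state vector of each sample is recorded in a single forward pass, the equivalent affine model of each sub-region is read off directly from the fully connected parameters, and the pairwise decision boundary hyperplanes are collected into the set DB.

```python
import numpy as np
from itertools import combinations

def overall_state_vector(x, weights, biases):
    """Steps three one / three two: one forward pass, recording S = [s_1, ..., s_m]."""
    states, h = [], x
    for W, b in zip(weights[:-1], biases[:-1]):      # hidden layers only
        z = W @ h + b
        states.append((z > 0).astype(np.int8))       # ReLU state: 1 active, 0 inactive
        h = np.maximum(z, 0.0)
    return np.concatenate(states)

def equivalent_linear_model(state_vec, weights, biases):
    """Step three three: on a fixed state the network is affine, g(x) = W_eq x + b_eq."""
    widths = [b.shape[0] for b in biases[:-1]]
    layer_states = np.split(np.asarray(state_vec, dtype=float), np.cumsum(widths)[:-1])
    W_eq = np.eye(weights[0].shape[1])
    b_eq = np.zeros(weights[0].shape[1])
    for W, b, s in zip(weights[:-1], biases[:-1], layer_states):
        D = np.diag(s)                               # 1 keeps a neuron, 0 removes it
        W_eq = D @ (W @ W_eq)
        b_eq = D @ (W @ b_eq + b)
    return weights[-1] @ W_eq, weights[-1] @ b_eq + biases[-1]

def decision_boundary_set(samples, weights, biases):
    """Step three four: hyperplanes (w_i - w_j) x + (b_i - b_j) = 0 over all sub-regions."""
    phi = {tuple(overall_state_vector(x, weights, biases)) for x in samples}   # merged Phi'
    DB = []
    for state in phi:
        W_eq, b_eq = equivalent_linear_model(np.array(state), weights, biases)
        for i, j in combinations(range(W_eq.shape[0]), 2):
            DB.append((W_eq[i] - W_eq[j], b_eq[i] - b_eq[j]))
    return DB
```

Because Φ' is built from the training samples and identical state vectors are merged, the number of sub-regions q never exceeds the number of training samples, which keeps this construction tractable.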
Further, in step four, specifically, the decision boundary hyperplane set DB obtained in step three is used to construct a decision network (DNet) comprising only one hidden layer. Unlike an ordinary neural network, the output of the decision network (DNet) is a position code relative to the decision boundaries, denoted 0/1: in particular, for a hyperplane P_l and a sample x_0, the sample is substituted directly into the hyperplane formula to calculate its output; if the result is positive it is marked 1, and if negative it is marked 0. Thus, by means of the decision network, a relative position code of the data with respect to all elements of the decision boundary set DB is obtained.
Referring to fig. 1, samples having the same position code must belong to the same class according to the characteristics of the decision boundary.
Therefore, the training set data D = {(x_i, C_i) | i = 1, 2, ..., N} only needs to be traversed once more so that the position codes of the decision network can be labeled with classes. When a new sample is input, its class can be determined simply by comparing it with the labeled position codes.
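A minimal sketch of the corresponding decision network and the labeling pass (again an illustration; DB is the hyperplane list from the previous sketch, and the dictionary lookup is simply an assumed way to store the code-to-class mapping):

```python
def position_code(x, DB):
    """0/1 position code of a sample relative to every hyperplane in DB (1 if positive)."""
    return tuple(int(w @ x + b > 0) for (w, b) in DB)

def label_position_codes(train_x, train_y, DB):
    """One further pass over the training set assigns a class to each observed code."""
    table = {}
    for x, y in zip(train_x, train_y):
        table[position_code(x, DB)] = y       # samples sharing a code share a class
    return table

def classify(x, DB, table):
    # Prediction compares the code of the new sample with the labeled codes.
    return table.get(position_code(x, DB))
```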
The invention provides a compression method for deep learning models (including CNN and MLP) based on decision boundaries. Because the parameters of a deep learning model largely come from its fully connected layers, the method compresses the fully connected layers without a large number of experiments such as pruning, searching, or distillation; only two passes over the training set samples are required. If the activation function is a commonly used piecewise linear function such as ReLU, the obtained model realizes lossless compression; if the activation function is another nonlinear activation function, compression with any given precision can be realized through linear approximation.
The above embodiments are only intended to help understand the method of the present invention and its core idea. A person skilled in the art may also make several modifications and refinements to the specific embodiments and application scope according to the idea of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.