CN113610227A - Efficient deep convolutional neural network pruning method - Google Patents

Efficient deep convolutional neural network pruning method

Info

Publication number
CN113610227A
CN113610227A
Authority
CN
China
Prior art keywords
neural network
convolutional neural
sub-network
scaling factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110838976.6A
Other languages
Chinese (zh)
Other versions
CN113610227B (en)
Inventor
Xie Xuemei
Shi Guangming
Yang Jianpeng
Wang Zhenyu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou, Guangzhou Institute of Technology of Xidian University filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Priority to CN202110838976.6A priority Critical patent/CN113610227B/en
Publication of CN113610227A publication Critical patent/CN113610227A/en
Application granted granted Critical
Publication of CN113610227B publication Critical patent/CN113610227B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/12 - Computing arrangements based on biological models using genetic models
    • G06N3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an efficient deep convolutional neural network pruning method, which mainly addresses the large storage and computation resource consumption of existing deep convolutional neural networks. The implementation scheme is as follows: optimize the scaling factors with a sparse learning method based on the ADMM algorithm while training the deep convolutional neural network, so that the network structure becomes sparse; use a genetic algorithm to search for a clipping rate suited to each layer of the trained deep convolutional neural network, automatically finding the optimal clipping rates under the guidance of a fitness function; and clip each layer of the sparsely trained network at the optimal clipping rates to obtain the most efficient convolutional neural network. The invention greatly reduces the accuracy loss of the pruned convolutional neural network and, by reducing the network's parameter count, greatly reduces its consumption of storage and computation resources. It can be used to compress deep convolutional neural networks.

Description

Efficient deep convolutional neural network pruning method
Technical Field
The invention belongs to the field of computer technology and in particular relates to an efficient pruning method for deep convolutional neural networks, which can be used to compress deep convolutional neural networks.
Background
In recent years, neural network techniques have performed well in scientific research and practical applications, but compared with traditional algorithms, neural network computation consumes large amounts of storage and computation resources. Deployment therefore incurs high power consumption and cost, which limits the use of neural networks on power-constrained mobile devices. Neural network pruning is a method of compressing neural networks: it removes redundant components to reduce storage and computation consumption and thereby lower the power drawn when the network runs.
Current neural network pruning methods fall into two main categories: unstructured pruning and structured pruning.
Unstructured pruning compresses a neural network by removing unimportant weights at arbitrary positions. Although it can achieve a high compression rate, the scattered positions of the removed weights break the network's original data structure, and special hardware is needed to store the clipped weight parameters, which limits the method's application on general-purpose devices.
Structured pruning compresses a neural network by removing unimportant channels or convolution kernels. Since removing whole channels does not break the network's data structure, the method works well on existing computing equipment. However, structured pruning locates the redundant components inaccurately, achieves a low compression rate, and strongly affects network performance.
In addition, with both unstructured and structured methods, most pruned network models must be retrained, which consumes a great deal of time and is inefficient.
Disclosure of Invention
The aim of the invention is to provide an efficient deep convolutional neural network pruning method that overcomes the above shortcomings of the prior art: without retraining the pruned model, it ensures good compression rate and accuracy while improving pruning efficiency.
The technical idea of the invention is as follows: constrain and regulate the training of a convolutional neural network with a sparse learning method based on the ADMM algorithm, so that the network structure becomes sparse; use a genetic algorithm to heuristically search for a clipping rate suited to each layer of the trained convolutional neural network, automatically finding the optimal clipping rates under the guidance of a fitness function; and clip the network at the optimal clipping rates to obtain the optimal convolutional neural network. The implementation steps are as follows:
(1) training a deep convolutional neural network by using an ADMM-based sparse learning method:
(1a) introducing a scaling factor γ_{l,i} for each channel of the deep convolutional neural network;
(1b) adding the ℓ0-norm regularization term of the per-channel scaling factors to the loss function loss of the deep convolutional neural network to obtain a new loss function loss_new;
(1c) downloading a training data set from a public data website, training the parameters of the network other than the scaling factors with the training set and the stochastic gradient descent algorithm, and optimizing the scaling factors γ_{l,i} with the ADMM algorithm until the new loss function of step (1b) converges, obtaining a trained deep convolutional neural network model;
(2) searching for the optimal clipping rates of the sparsely trained convolutional neural network with a genetic algorithm:
(2a) setting the maximum number of iterations of the genetic algorithm, and computing the total parameter count B_0 and computation amount D_0 of the trained convolutional neural network model;
(2b) initializing M groups of clipping rates, each group containing N different clipping rates, N being equal to the number of network layers;
(2c) encoding each group of clipping rates into a binary code, and applying crossover and mutation operations to the codes to generate new binary codes, such that the number of newly generated codes plus the number of original codes is P;
(2d) decoding the P binary codes of (2c), each into a group of clipping rates p_{i,j};
(2e) adjusting the network model according to the clipping rates generated in (2d): in each layer of the convolutional neural network selecting the channels with the smallest scaling-factor values, such that the ratio of the number of selected channels to the total number of channels in that layer equals the clipping rate;
(2f) deleting the channels selected in (2e) from the convolutional neural network to obtain P sub-networks, each corresponding to one group of clipping rates;
(2g) computing the accuracy a_i, parameter count b_i, and computation amount d_i of each sub-network of (2f), and using a_i, b_i, and d_i to compute the fitness f_i of each sub-network;
(2h) from the fitness f_i of each sub-network, computing the probability q_i that each network is selected, and screening R sub-networks out of the P sub-networks of (2f) by roulette-wheel selection to obtain the current clipping rates p_{i,j} corresponding to each sub-network in (2d), 1 ≤ R ≤ P;
(2i) repeating (2b) to (2h); when the iteration count reaches the maximum set in (2a), the search ends, yielding the optimal clipping rates p_best of the convolutional neural network model, corresponding to the best fitness;
(3) in each layer of the convolutional neural network, selecting the ω channels with the smallest scaling-factor values such that the ratio ω/β of the number of selected channels ω to the total number of channels β in that layer equals the optimal clipping rate p_best, completing the clipping of the convolutional neural network model and obtaining the optimal convolutional neural network.
The invention has the following advantages:
1. Low accuracy loss
The invention trains the deep convolutional neural network with the ADMM-based sparse learning method, suppressing the growth of redundant parameter values in the network and reducing their influence on network performance; it then finely screens the clipping rates with the genetic algorithm, further reducing the influence of pruning on network accuracy, so that the accuracy loss caused by pruning the network model is greatly reduced.
2. High pruning efficiency
By combining the ADMM-based sparse learning method with the genetic-algorithm-based heuristic search, the invention performs coarse parameter screening during training and fine screening during pruning. The accuracy loss caused by pruning is thus greatly reduced, and the pruned network performs almost the same as the original network without retraining, saving retraining time and improving pruning efficiency.
3. Adjustable pruning
The invention uses the genetic algorithm to heuristically search for clipping rates suited to the convolutional neural network. By changing how the fitness is computed, the search direction, and hence the final search result, can be changed, so pruning can be adjusted to actual requirements.
Simulation results show that the invention can efficiently compress deep convolutional neural networks with extremely small accuracy loss. On classification tasks based on the CIFAR-10 and ImageNet data sets, the average compression rate of the network parameters reaches 68.0%, the average compression rate of the computation amount reaches 63.4%, and the average compression rate of the channel count reaches 63.2%, while the accuracy equals or even surpasses that before compression.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of simulation results of the present invention.
Detailed Description
Embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of this example are as follows:
Step 1, training a deep convolutional neural network with the ADMM-based sparse learning method.
The deep convolutional neural network is an existing neural network with N convolutional layers. The input of layer l is denoted x_l; each input x_l undergoes convolution and normalization operations, denoted collectively as f(·), and the output of each channel of the deep convolutional neural network is expressed as:

y_{l,i} = f(x_l, w_{l,i}, b_{l,i}),  l = 1,2,…,N;  i = 1,2,…,n_l   <1>

where l indexes the network layers, i indexes the channels, w_{l,i} and b_{l,i} are the weight and bias sets of a channel, n_l is the total number of channels in layer l, f(·) denotes the convolution and normalization operations in the deep neural network, x_l is the input of layer l, and N is the total number of layers of the deep convolutional neural network, whose value is adjusted to the requirements of different tasks.
This step trains the deep convolutional neural network with the ADMM-based sparse learning method, implemented as follows:
(1.1) introducing a scaling factor into each channel of the deep convolutional neural network to obtain the output of each channel:
y_{l,i} = γ_{l,i} · f(x_l, w_{l,i}, b_{l,i}),  l = 1,2,…,N;  i = 1,2,…,n_l   <2>

where γ_{l,i} denotes the scaling factor of the i-th channel of layer l in the deep convolutional neural network; if γ_{l,i} is 0, the output y_{l,i} of the corresponding channel is 0, meaning the channel is inactive and can be safely clipped.
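For illustration, the scaling factor of formula <2> can be realized in PyTorch (the platform used in the simulations below) as a learnable per-channel vector multiplied onto the normalized convolution output; the module and all names here are a sketch, not part of the patent:

```python
import torch
import torch.nn as nn

class ScaledConvBlock(nn.Module):
    """Convolution + normalization with a per-channel scaling factor gamma_{l,i}."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm2d(out_channels)
        # One scaling factor per output channel; gamma_{l,i} = 0 disables channel i.
        self.gamma = nn.Parameter(torch.ones(out_channels))

    def forward(self, x):
        y = self.bn(self.conv(x))                # f(x_l, w_{l,i}, b_{l,i})
        return y * self.gamma.view(1, -1, 1, 1)  # y_{l,i} = gamma_{l,i} * f(...)
```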
(1.2) Based on the definition of the scaling factor γ_{l,i} in (1.1), the channel sparsity ξ_c of the deep convolutional neural network is expressed through the sparsity ξ_s of the scaling factors, converting the channel-sparsification problem into a constrained optimization problem over the scaling factors γ_{l,i}:

min_{W_l, B_l, Γ} E(W_l, B_l, Γ)  s.t.  ‖Γ‖₀ ≤ C(1 − ξ_s)   <3>

where E(·) denotes the loss function loss of the network, Γ denotes the vector of all scaling factors γ_{l,i}, W_l and B_l are the weight and bias sets of layer l, C is the total number of channels, and ‖Γ‖₀ is the ℓ0 norm, i.e., the number of non-zero factors.
(1.3) The ℓ0-norm regularization term of the per-channel scaling factors γ_{l,i} is added to the loss function E(·) of (1.2), converting the constrained optimization problem into the following unconstrained optimization problem over the scaling factors γ_{l,i}:

min_{W_l, B_l, Γ} E(W_l, B_l, Γ) + λ‖Γ‖₀   <4>

and the new loss function loss_new is obtained from this unconstrained objective:

loss_new = E(W_l, B_l, Γ) + λ‖Γ‖₀

where λ is a hyper-parameter, λ > 0.
(1.4) To solve the unconstrained optimization problem of (1.3), auxiliary variables are added and the augmented Lagrange multiplier method in scaled form is applied, converting formula <4> into:

min_{W_l, B_l, Γ, Z} E(W_l, B_l, Γ) + λ‖Z‖₀ + (ρ/2)‖Γ − Z + U‖₂²   <5>

where ρ is a constant, U is the scaled dual variable, and Z is the auxiliary variable; ‖Z‖₀ is the ℓ0 norm, i.e., the number of non-zero auxiliary variables.
(1.5) Formula <5> of (1.4) is decomposed into the following three sub-problems:
(1.5.1) The first sub-problem is expressed as:

Γ^k = argmin_Γ E(W_l, B_l, Γ) + (ρ/2)‖Γ − Z^{k−1} + U^{k−1}‖₂²   <6>

where k is the iteration index, and Z^{k−1} and U^{k−1} are the sets of auxiliary and dual variables at iteration k−1, which can be regarded as constants.
(1.5.2) The second sub-problem is expressed as:

Z^k = argmin_Z λ‖Z‖₀ + (ρ/2)‖Γ^k − Z + U^{k−1}‖₂²   <7>

where Γ^k is the result of the first sub-problem at iteration k and U^{k−1} is the dual variable of iteration k−1. Since Z, Γ^k, and U^{k−1} are all vectors, formula <7> can be converted into the elementwise form:

z_j^k = argmin_{z_j} λ|z_j|₀ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²,  j = 1,2,…,C   <8>

where z_j, γ_j^k, and u_j^{k−1} are the j-th elements of the vectors Z, Γ^k, and U^{k−1} respectively, and C is the total number of channels of the neural network, equal to the length of these vectors;
(1.5.3) The third sub-problem is expressed as:

U^k = argmin_U (ρ/2)‖Γ^k − Z^k + U^{k−1} − U‖₂²   <9>

where Z^k is the result of the second sub-problem at iteration k.
(1.6) Formula <5> of (1.4) is solved by iteratively solving the three sub-problems of (1.5):
(1.6.1) Solving the first sub-problem, formula <6> of (1.5.1), gives the set Γ^k of scaling factors γ_{l,i} at iteration k.
The first sub-problem consists of two parts: the first part is the initial loss function loss of the convolutional neural network; the second part is convex and differentiable. According to the requirements of the specific task, the corresponding training data set is downloaded from a public data website, and the problem is trained with the training set and the stochastic gradient descent algorithm until it converges, giving the set Γ^k of scaling factors γ_{l,i} at iteration k.
(1.6.2) Solving the second sub-problem, formula <8> of (1.5.2), for each auxiliary variable z_j gives the set Z^k of auxiliary variables at iteration k.
If the auxiliary variable z_j is not 0, then |z_j|₀ = 1 and formula <8> is rewritten as:

λ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²   <10>

Setting the perfect-square term of formula <10> to zero, i.e., z_j = γ_j^k + u_j^{k−1}, the minimum value of formula <10> is λ, and γ_j^k + u_j^{k−1} is the value of z_j at iteration k.
If the auxiliary variable z_j is 0, the |z_j|₀ term of formula <8> is 0 and only the perfect-square term remains, whose value is (ρ/2)(γ_j^k + u_j^{k−1})².
Combining the two cases, the value of z_j^k is:

z_j^k = γ_j^k + u_j^{k−1}  if (ρ/2)(γ_j^k + u_j^{k−1})² > λ;  otherwise z_j^k = 0   <11>

The result of the second sub-problem is then Z^k = (z_1^k, …, z_C^k), the set of auxiliary variables at iteration k.
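The case analysis behind formula <11> is exactly an elementwise hard-thresholding operator. A minimal NumPy sketch of this Z-update, with function and argument names chosen here for illustration:

```python
import numpy as np

def update_z(gamma_k, u_prev, lam, rho):
    """ADMM Z-update for the l0 sub-problem <8>: elementwise hard thresholding.

    Keeps z_j = gamma_j + u_j when (rho/2) * (gamma_j + u_j)^2 > lambda,
    otherwise sets z_j = 0, as in formula <11>.
    """
    v = gamma_k + u_prev
    keep = 0.5 * rho * v ** 2 > lam
    return np.where(keep, v, 0.0)
```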
(1.6.3) Solving the third sub-problem, formula <9> of (1.5.3), gives the set U^k of dual variables at iteration k.
Since the third sub-problem is a convex quadratic optimization problem, its result is its extreme point, expressed as:

U^k = Γ^k − Z^k + U^{k−1}   <12>

where U^{k−1} is the set of dual variables at iteration k−1, and Γ^k and Z^k are the results of the first and second sub-problems at iteration k;
(1.7) At iteration k+1, the result U^k of the third sub-problem of (1.6.3) at iteration k is fed into the first sub-problem of (1.6.1);
(1.8) The process from (1.6) to (1.7) is repeated until the new loss function loss_new of (1.3) converges. Training of the deep convolutional network with the ADMM-based sparse learning method is then complete, and the trained convolutional neural network is obtained.
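Putting the three sub-steps together, the outer loop of step 1 can be sketched as follows, under the assumption that a helper `train_gamma_subproblem` runs stochastic gradient descent on formula <6>; both helper names are hypothetical:

```python
import torch

def admm_sparse_training(model, loader, lam=1e-4, rho=1e-2, num_rounds=10):
    """Alternate the three ADMM sub-steps until loss_new converges (sketch)."""
    gamma = get_scaling_factors(model)   # hypothetical: all gamma_{l,i} as one vector
    z = gamma.clone()                    # auxiliary variables Z
    u = torch.zeros_like(gamma)          # scaled dual variables U
    for k in range(num_rounds):
        # Sub-problem 1 <6>: SGD on loss + (rho/2) * ||Gamma - Z + U||^2.
        gamma = train_gamma_subproblem(model, loader, z, u, rho)
        # Sub-problem 2 <8>/<11>: elementwise hard thresholding.
        v = gamma + u
        z = torch.where(0.5 * rho * v ** 2 > lam, v, torch.zeros_like(v))
        # Sub-problem 3 <12>: dual update.
        u = gamma - z + u
    return model
```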
Step 2, searching for the optimal clipping rates of the sparsely trained convolutional neural network with a genetic algorithm.
(2.1) Setting the maximum number of iterations and computing the parameter count B_0 and computation amount D_0 of the trained neural network:

B_0 = Σ_{l=1}^{N} n_l · n_{l−1} · k_w · k_h   <13>

D_0 = Σ_{l=1}^{N} n_l · n_{l−1} · k_w · k_h · w_l^out · h_l^out   <14>

where N is the total number of layers of the neural network, n_l is the total number of channels in layer l, k_w and k_h are the width and height of the two-dimensional convolution kernels in a channel, and w_l^out and h_l^out are the width and height of the output feature map of layer l;
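For reference, the two totals of formulas <13> and <14> (as reconstructed above) can be accumulated layer by layer; the per-layer tuple layout below is an assumption of this sketch:

```python
def network_cost(layers):
    """Total parameter count B0 <13> and computation D0 <14> of a conv network.

    Each layer is described by a tuple (n_out, n_in, k_w, k_h, w_out, h_out).
    """
    b0 = sum(n_out * n_in * kw * kh for n_out, n_in, kw, kh, _, _ in layers)
    d0 = sum(n_out * n_in * kw * kh * w * h for n_out, n_in, kw, kh, w, h in layers)
    return b0, d0
```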
(2.2) Initializing M individuals as the population of the genetic algorithm, each individual carrying one group of channel clipping rates, expressed as:

s_i = {p_{i,j} | 0 ≤ p_{i,j} < 1, j = 1,2,…,N},  i = 1,2,…,M   <15>

where s_i is the i-th group of clipping rates, p_{i,j} is the channel clipping rate of the j-th layer in s_i, and M is the number of groups; each group contains N different clipping rates, N being equal to the number of network layers;
(2.3) Generating a binary code for each of the M groups of clipping rates initialized in (2.2), with each bit valued 0 or 1:

c_i = (b_1, b_2, …, b_{m·N}),  b_t ∈ {0, 1}   <16>

where c_i is the i-th code and m is a hyper-parameter that determines the length of the binary code, each clipping rate being encoded with m bits.
On the basis of the binary codes of formula <16>, new binary codes are generated by applying crossover and mutation operations, such that the number of newly generated codes plus the number of original codes is P. Crossover randomly exchanges some bits between two codes; mutation changes some bits within a code (0 to 1 or 1 to 0);
(2.4) Decoding all P codes into P groups of clipping rates by the inverse of the encoding <16> in (2.3):

p_{i,j} = (1/2^m) · Σ_{t=1}^{m} b_{(j−1)·m+t} · 2^{m−t}   <17>

where p_{i,j} is the j-th clipping rate of the i-th group;
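A compact sketch of the m-bit encoding <16>, the decoding <17>, and the crossover and mutation operators described in (2.3), under the bit-layout assumption made in the reconstruction above:

```python
import random

M_BITS = 8  # hyper-parameter m: bits per clipping rate

def encode(rates):
    """Encode a group of clipping rates into one binary string <16>."""
    return ''.join(format(int(p * (1 << M_BITS)), f'0{M_BITS}b') for p in rates)

def decode(code):
    """Decode a binary string back into a group of clipping rates <17>."""
    return [int(code[j:j + M_BITS], 2) / (1 << M_BITS)
            for j in range(0, len(code), M_BITS)]

def crossover(c1, c2):
    """Randomly exchange the tails of two codes after a random cut point."""
    cut = random.randrange(1, len(c1))
    return c1[:cut] + c2[cut:], c2[:cut] + c1[cut:]

def mutate(code, rate=0.01):
    """Flip each bit (0 to 1 or 1 to 0) independently with the given probability."""
    return ''.join(b if random.random() > rate else '10'[int(b)] for b in code)
```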
(2.5) Adjusting the network model according to the clipping rates generated in (2.4): in each layer of the convolutional neural network the channels with the smallest scaling-factor values are selected, such that the ratio of the number of selected channels to the total number of channels in that layer equals the clipping rate p_{i,j};
(2.6) Deleting the channels selected in (2.5) from the neural network to obtain P sub-networks, each corresponding to one group of clipping rates p_{i,j};
(2.7) For each task, downloading the corresponding test data set from a public data website, feeding the test set into each sub-network of (2.6), and computing the accuracy from the network outputs and the labels:

a_i = φ(x_i, g)

where a_i is the accuracy of the i-th sub-network, x_i is the output of the i-th sub-network, and g is the set of test-sample labels;
(2.8) Computing the parameter count b_i and computation amount d_i of each sub-network according to formulas <13> and <14> of (2.1), and computing the fitness f_i of the i-th sub-network:

f_i = θ(1 − b_i/B_0) + (1 − θ)(1 − d_i/D_0)  if a_i ≥ A_0 − ε;  otherwise f_i = 0   <18>

where A_0, B_0, and D_0 are the accuracy, parameter count, and computation amount of the original network, θ is a hyper-parameter balancing the parts of formula <18>, 0 ≤ θ ≤ 1, and ε is the allowed deviation in accuracy;
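Under the reconstructed form of formula <18> given above (the exact form is an assumption consistent with the description), the fitness computation can be sketched as:

```python
def fitness(acc_i, params_i, flops_i, acc0, b0, d0, theta=0.5, eps=0.01):
    """Fitness <18>: reward compression, but only within the accuracy budget.

    A sub-network whose accuracy drops more than eps below the original
    network gets zero fitness and is effectively discarded.
    """
    if acc_i < acc0 - eps:
        return 0.0
    return theta * (1 - params_i / b0) + (1 - theta) * (1 - flops_i / d0)
```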
(2.9) Summing the fitness values f_i obtained in (2.8) gives the total fitness sum(f); the weighted fitness q_i of each sub-network is then computed as:

q_i = f_i / sum(f)   <19>

where sum(f) is the sum of all sub-network fitness values f_i, and q_i is the weighted fitness of each sub-network, indicating the probability that it is selected.
(2.10) Sub-networks are selected according to the probability q_i of each sub-network in (2.9): R sub-networks are screened out by roulette-wheel selection, the clipping rates p_{i,j} corresponding to each selected sub-network of (2.6) are obtained and fed into (2.2) for the next iteration, 1 ≤ R ≤ P.
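Roulette-wheel selection draws sub-networks with probability proportional to fitness, as in formula <19>; a minimal sketch:

```python
import random

def roulette_select(subnets, fitnesses, r):
    """Select r sub-networks with probability q_i = f_i / sum(f) <19>."""
    total = sum(fitnesses)
    weights = [f / total for f in fitnesses]
    return random.choices(subnets, weights=weights, k=r)
```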
(2.11) Repeating (2.2) to (2.10) until the maximum number of iterations is reached; the search then ends, yielding the optimal clipping rates p_best of the convolutional neural network model.
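In outline, the whole search of step 2 alternates variation, evaluation, and selection; the sketch below reuses the helpers above, and `new_codes_by_crossover_and_mutation`, `prune_with_rates`, and `evaluate_fitness` are hypothetical stand-ins for (2.3), (2.5)-(2.6), and (2.7)-(2.8):

```python
import random

def genetic_search(model, test_loader, num_layers, m_groups=20, r_keep=10, max_iter=50):
    """Outline of step 2: search per-layer clipping rates with a genetic algorithm."""
    # (2.2) initialize M groups of random clipping rates in [0, 1)
    population = [[random.random() for _ in range(num_layers)] for _ in range(m_groups)]
    best_rates, best_fit = None, -1.0
    for _ in range(max_iter):
        codes = [encode(s) for s in population]                 # (2.3) encode
        codes += new_codes_by_crossover_and_mutation(codes)     # hypothetical helper
        rates = [decode(c) for c in codes]                      # (2.4) decode
        subnets = [prune_with_rates(model, p) for p in rates]   # hypothetical, (2.5)-(2.6)
        fits = [evaluate_fitness(net, test_loader) for net in subnets]  # hypothetical
        if max(fits) > best_fit:                                # track the best group
            best_fit = max(fits)
            best_rates = rates[fits.index(best_fit)]
        population = roulette_select(rates, fits, r_keep)       # (2.9)-(2.10)
    return best_rates                                           # optimal rates p_best
```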
Step 3, clipping the sparsely trained convolutional neural network at the optimal clipping rates.
In each layer of the convolutional neural network, the ω channels with the smallest scaling-factor values are selected such that the ratio ω/β of the number of selected channels ω to the total number of channels β in that layer equals the optimal clipping rate p_best; the selected ω channels are then deleted. This completes the clipping of the convolutional neural network model and yields the optimal convolutional neural network, providing subsequent deep-learning tasks and practical applications with a convolutional neural network model that computes quickly and occupies little storage.
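A sketch of the per-layer channel selection of step 3, assuming the scaling factors of a layer are available as a tensor (as in the `ScaledConvBlock` illustration earlier); names are illustrative:

```python
import torch

def select_pruned_channels(gamma_layer, p_best_layer):
    """Return indices of the omega channels with the smallest |gamma| in one layer,
    where omega / beta equals that layer's optimal clipping rate p_best."""
    beta = gamma_layer.numel()                # total channels beta in the layer
    omega = int(round(p_best_layer * beta))   # number of channels omega to remove
    order = torch.argsort(gamma_layer.abs())  # ascending by |gamma_{l,i}|
    return order[:omega]                      # these channels are deleted
```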
The effects of the present invention are further illustrated by the following simulations.
1. Simulation experiment conditions:
The operating system of the simulation experiments is Ubuntu 18.04 LTS, the deep-learning software platform is PyTorch, and the GPU is an NVIDIA TITAN Xp.
The data sets used in the simulation experiments are CIFAR-10 and ImageNet.
The neural networks involved in the simulation experiments are VGG-16, ResNet-50, ResNet-56, ResNet-110, GoogLeNet, and DenseNet-40.
2. Simulation experiment contents:
Simulation 1, training experiment for the convolutional neural network:
On the CIFAR-10 data set, the classification network VGG-16 is trained with the traditional training method and with the ADMM-based sparse learning method of the invention; the results are shown in FIG. 2, where FIG. 2(a) is the feature map of the network obtained with traditional training and FIG. 2(b) is the feature map obtained with the ADMM-based sparse learning method of the invention. In FIG. 2, black cells represent invalid features and white textured cells represent valid features.
As FIG. 2 shows, the ADMM-based sparse learning method of the invention yields clearly more invalid feature maps than traditional training, demonstrating that its sparsification of the convolutional neural network is effective.
Simulation 2, pruning simulation experiment of the convolutional neural network:
the method is utilized to train convolutional neural networks Vgg-16, ResNet-56, ResNet-110, GoogleNet and DenseNet-40 on a training set of a CIFAR-10 data set and complete pruning;
the convolutional neural network ResNet-50 is trained and pruned on the training set of the ImageNet data set by the method.
Classifying data from the test set of the CIFAR-10 dataset using the pruned convolutional neural network Vgg-16, ResNet-56 ResNet-110, GoogleNet, DenseNet-40; the data from the test set of the ImageNet dataset was classified using the pruned convolutional neural network ResNet-50, resulting in the effect of the pruned convolutional neural network on classifying the data from both test sets, as shown in table 1.
TABLE 1 Classification Effect of the pruned convolutional neural network on test sets
In Table 1, the unit 'M' for parameter counts denotes 10^6 floating-point numbers, and the unit 'B' for computation denotes 10^9 floating-point operations. 'Original network' denotes the convolutional neural network before pruning, and 'pruned network' denotes the convolutional neural network pruned by the method of the invention.
As Table 1 shows, the highest compression rate of the invention on the parameters of the convolutional neural networks is 87.8%, the lowest is 43.0%, and the average is 68.0%; the highest compression rate on the computation amount is 83.6%, the lowest is 47.9%, and the average is 63.4%; the highest compression rate on the channels is 81.6%, the lowest is 41.4%, and the average is 63.2%.
The results in Table 1 show that the error rates of the pruned networks are essentially consistent with those of the original networks, i.e., the invention compresses convolutional neural networks effectively with little accuracy loss. This compression greatly reduces the storage and computation consumption of the networks and thereby improves their subsequent use.

Claims (5)

1. An efficient deep convolutional neural network pruning method is characterized by comprising the following steps:
(1) training a deep convolutional neural network by using an ADMM-based sparse learning method:
(1a) introducing a scaling factor γ_{l,i} for each channel of the deep convolutional neural network;
(1b) adding the ℓ0-norm regularization term of the per-channel scaling factors to the loss function loss of the deep convolutional neural network to obtain a new loss function loss_new;
(1c) downloading a training data set from a public data website, training the parameters of the network other than the scaling factors with the training set and the stochastic gradient descent algorithm, and optimizing the scaling factors γ_{l,i} with the ADMM algorithm until the new loss function of step (1b) converges, obtaining a trained deep convolutional neural network model;
(2) searching for the optimal clipping rates of the deep convolutional neural network model with a genetic algorithm:
(2a) setting the maximum number of iterations of the genetic algorithm, and computing the total parameter count B_0 and computation amount D_0 of the trained convolutional neural network model;
(2b) initializing M groups of clipping rates, each group containing N different clipping rates, N being equal to the number of network layers;
(2c) encoding each group of clipping rates into a binary code, and applying crossover and mutation operations to the codes to generate new binary codes, such that the number of newly generated codes plus the number of original codes is P;
(2d) decoding the P binary codes of (2c), each into a group of clipping rates p_{i,j};
(2e) adjusting the network model according to the clipping rates generated in (2d): in each layer of the convolutional neural network selecting the channels with the smallest scaling-factor values, such that the ratio of the number of selected channels to the total number of channels in that layer equals the clipping rate;
(2f) deleting the channels selected in (2e) from the convolutional neural network to obtain P sub-networks, each corresponding to one group of clipping rates;
(2g) computing the accuracy a_i, parameter count b_i, and computation amount d_i of each sub-network of (2f), and using a_i, b_i, and d_i to compute the fitness f_i of each sub-network;
(2h) from the fitness f_i of each sub-network, computing the probability q_i that each network is selected, and screening R sub-networks out of the P sub-networks of (2f) by roulette-wheel selection to obtain the current clipping rates p_{i,j} corresponding to each sub-network in (2d), 1 ≤ R ≤ P;
(2i) repeating (2b) to (2h); when the iteration count reaches the maximum set in (2a), the search ends, yielding the optimal clipping rates p_best of the convolutional neural network model, corresponding to the best fitness;
(3) in each layer of the convolutional neural network, selecting the ω channels with the smallest scaling-factor values such that the ratio ω/β of the number of selected channels ω to the total number of channels β in that layer equals the optimal clipping rate p_best, completing the clipping of the convolutional neural network model and obtaining the optimal sub-network of the convolutional neural network.
2. The method of claim 1, wherein the loss function loss_new of (1b) is constructed as follows:
adding the ℓ0-norm regularization term of the scaling factors γ_{l,i} of each channel of the deep convolutional neural network to the original loss function E(·) of the network, converting the constrained optimization problem into the following unconstrained optimization problem over the scaling factors γ_{l,i}:

min_{W_l, B_l, Γ} E(W_l, B_l, Γ) + λ‖Γ‖₀   <1>

and obtaining the new loss function loss_new from this unconstrained objective:

loss_new = E(W_l, B_l, Γ) + λ‖Γ‖₀   <2>

where λ is a hyper-parameter, λ > 0, E(·) denotes the loss function loss of the original network, Γ denotes the vector of all scaling factors γ_{l,i}, W_l and B_l are the weight and bias sets of network layer l, and ‖Γ‖₀ is the ℓ0 norm, i.e., the number of non-zero factors.
3. The method of claim 1, wherein the optimization of the scaling factors with the ADMM algorithm in (1c) is implemented as follows:
(1c1) to solve the unconstrained optimization problem of formula <1> in (1b), auxiliary variables are added, and the augmented Lagrange multiplier method in scaled form is applied, converting formula <1> into:

min_{W_l, B_l, Γ, Z} E(W_l, B_l, Γ) + λ‖Z‖₀ + (ρ/2)‖Γ − Z + U‖₂²   <3>

where Γ denotes the vector of all scaling factors γ_{l,i}, ρ is a constant, U is the scaled dual variable, and Z is the auxiliary variable; ‖Z‖₀ is the ℓ0 norm, i.e., the number of non-zero auxiliary variables;
(1c2) formula <3> of (1c1) is decomposed into three sub-problems that are solved iteratively;
(1c3) the first sub-problem is expressed as:

Γ^k = argmin_Γ E(W_l, B_l, Γ) + (ρ/2)‖Γ − Z^{k−1} + U^{k−1}‖₂²   <4>

where k is the iteration index, and Z^{k−1} and U^{k−1} are the sets of auxiliary and dual variables at iteration k−1, regarded as constants;
(1c4) the first sub-problem of formula <4> is solved to obtain the set Γ^k of scaling factors γ_{l,i} at iteration k:
the training data set of the corresponding task is downloaded from a public data website, and the problem is trained with the training set and the stochastic gradient descent algorithm until it converges, giving the set Γ^k of scaling factors γ_{l,i} at iteration k;
(1c5) the second sub-problem is expressed as:

Z^k = argmin_Z λ‖Z‖₀ + (ρ/2)‖Γ^k − Z + U^{k−1}‖₂²   <5>

where Γ^k is the result of the first sub-problem at iteration k and U^{k−1} is the dual variable of iteration k−1; since Z, Γ^k, and U^{k−1} are all vectors, formula <5> can be converted into the elementwise form:

z_j^k = argmin_{z_j} λ|z_j|₀ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²,  j = 1,2,…,C   <6>

where z_j, γ_j^k, and u_j^{k−1} are the j-th elements of the vectors Z, Γ^k, and U^{k−1} respectively, and C is the total number of channels of the neural network, equal to the length of these vectors;
(1c6) formula <6> is solved for each auxiliary variable z_j to obtain the set Z^k of auxiliary variables at iteration k:
if the auxiliary variable z_j is not 0, then |z_j|₀ = 1 and formula <6> is rewritten as:

λ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²   <7>

setting the perfect-square term of formula <7> to zero, i.e., z_j = γ_j^k + u_j^{k−1}, the minimum value of formula <7> is λ, and γ_j^k + u_j^{k−1} is the value of z_j at iteration k;
if the auxiliary variable z_j is 0, the |z_j|₀ term of formula <6> is 0 and only the perfect-square term remains, whose value is (ρ/2)(γ_j^k + u_j^{k−1})²;
combining the two cases, the value of z_j^k is:

z_j^k = γ_j^k + u_j^{k−1}  if (ρ/2)(γ_j^k + u_j^{k−1})² > λ;  otherwise z_j^k = 0   <8>

the result of the second sub-problem, formula <5>, is then Z^k = (z_1^k, …, z_C^k), the set of auxiliary variables at iteration k;
(1c7) the third sub-problem is expressed as:

U^k = argmin_U (ρ/2)‖Γ^k − Z^k + U^{k−1} − U‖₂²   <9>

where Z^k is the result of the second sub-problem at iteration k; this sub-problem is a convex quadratic optimization problem whose result is its extreme point:

U^k = Γ^k − Z^k + U^{k−1}   <10>

where U^{k−1} is the set of dual variables at iteration k−1, and Γ^k and Z^k are the results of the first and second sub-problems at iteration k;
(1c8) at iteration k+1, the result U^k of the third sub-problem of (1c7) at iteration k is fed into the first sub-problem of (1c3);
(1c9) the process from (1c3) to (1c8) is repeated until the new loss function loss_new of (1b) converges; the optimization of the scaling factors with the ADMM-based sparse learning method is then complete, yielding the optimized scaling-factor set Γ.
4. The method of claim 1, wherein the clipping rates of (2b) are initialized as follows:
initializing M individuals as the population of the genetic algorithm, each individual representing one group of channel clipping rates, each group containing N different clipping rates:

s_i = {p_{i,j} | 0 ≤ p_{i,j} < 1, j = 1,2,…,N},  i = 1,2,…,M   <11>

where s_i is the i-th individual, p_{i,j} is the channel clipping rate of the j-th layer in s_i, M is the total number of individuals, and N is the number of layers of the network.
5. The method of claim 1, wherein the fitness f_i of a sub-network in (2g) is computed by the following formula:

f_i = θ(1 − b_i/B_0) + (1 − θ)(1 − d_i/D_0)  if a_i ≥ A_0 − ε;  otherwise f_i = 0   <12>

where a_i, b_i, and d_i are the accuracy, parameter count, and computation amount of each sub-network, A_0, B_0, and D_0 are those quantities for the original network, θ is a hyper-parameter balancing the parts of the formula, 0 ≤ θ ≤ 1, and ε is the allowed deviation in accuracy.
CN202110838976.6A 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification Active CN113610227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110838976.6A CN113610227B (en) 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110838976.6A CN113610227B (en) 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification

Publications (2)

Publication Number Publication Date
CN113610227A true CN113610227A (en) 2021-11-05
CN113610227B CN113610227B (en) 2023-11-21

Family

ID=78338225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838976.6A Active CN113610227B (en) 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification

Country Status (1)

Country Link
CN (1) CN113610227B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113935485A (en) * 2021-12-15 2022-01-14 江苏游隼微电子有限公司 Convolutional neural network clipping method based on adjacent layer weight
CN114330644A (en) * 2021-12-06 2022-04-12 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network model compression method based on structure search and channel pruning
CN114781604A (en) * 2022-04-13 2022-07-22 广州安凯微电子股份有限公司 Coding method of neural network weight parameter, coder and neural network processor

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242142A (en) * 2018-07-25 2019-01-18 浙江工业大学 A kind of spatio-temporal segmentation parameter optimization method towards infrastructure networks
CN109683161A (en) * 2018-12-20 2019-04-26 南京航空航天大学 A method of the inverse synthetic aperture radar imaging based on depth ADMM network
CN110276450A (en) * 2019-06-25 2019-09-24 交叉信息核心技术研究院(西安)有限公司 Deep neural network structural sparse system and method based on more granularities
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111105035A (en) * 2019-12-24 2020-05-05 西安电子科技大学 Neural network pruning method based on combination of sparse learning and genetic algorithm
CN111368699A (en) * 2020-02-28 2020-07-03 交叉信息核心技术研究院(西安)有限公司 Convolutional neural network pruning method based on patterns and pattern perception accelerator
US20210192352A1 (en) * 2019-12-19 2021-06-24 Northeastern University Computer-implemented methods and systems for compressing deep neural network models using alternating direction method of multipliers (admm)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242142A (en) * 2018-07-25 2019-01-18 浙江工业大学 A kind of spatio-temporal segmentation parameter optimization method towards infrastructure networks
CN109683161A (en) * 2018-12-20 2019-04-26 南京航空航天大学 A method of the inverse synthetic aperture radar imaging based on depth ADMM network
CN110276450A (en) * 2019-06-25 2019-09-24 交叉信息核心技术研究院(西安)有限公司 Deep neural network structural sparse system and method based on more granularities
US20210192352A1 (en) * 2019-12-19 2021-06-24 Northeastern University Computer-implemented methods and systems for compressing deep neural network models using alternating direction method of multipliers (admm)
CN111105035A (en) * 2019-12-24 2020-05-05 西安电子科技大学 Neural network pruning method based on combination of sparse learning and genetic algorithm
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111368699A (en) * 2020-02-28 2020-07-03 交叉信息核心技术研究院(西安)有限公司 Convolutional neural network pruning method based on patterns and pattern perception accelerator

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AO REN et al.: "ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers", arXiv *
ZHENYU WANG et al.: "Network pruning using sparse learning and genetic algorithm", Neurocomputing, vol. 404 *
LAI Yejing et al.: "Deep neural network model compression methods and progress", Journal of East China Normal University (Natural Science Edition), no. 05 *
GAO Han et al.: "A survey of deep learning model compression and acceleration", Journal of Software, vol. 32, no. 01 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330644A (en) * 2021-12-06 2022-04-12 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network model compression method based on structure search and channel pruning
CN113935485A (en) * 2021-12-15 2022-01-14 江苏游隼微电子有限公司 Convolutional neural network clipping method based on adjacent layer weight
CN113935485B (en) * 2021-12-15 2022-03-04 江苏游隼微电子有限公司 Convolutional neural network clipping method based on adjacent layer weight
CN114781604A (en) * 2022-04-13 2022-07-22 广州安凯微电子股份有限公司 Coding method of neural network weight parameter, coder and neural network processor
CN114781604B (en) * 2022-04-13 2024-02-20 广州安凯微电子股份有限公司 Coding method of neural network weight parameters, coder and neural network processor

Also Published As

Publication number Publication date
CN113610227B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
Liu et al. Frequency-domain dynamic pruning for convolutional neural networks
CN113610227A (en) Efficient deep convolutional neural network pruning method
CN111461322B (en) Deep neural network model compression method
Yang et al. Automatic neural network compression by sparsity-quantization joint learning: A constrained optimization-based approach
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
CN109445935B (en) Self-adaptive configuration method of high-performance big data analysis system in cloud computing environment
CN111079781A (en) Lightweight convolutional neural network image identification method based on low rank and sparse decomposition
CN111105035A (en) Neural network pruning method based on combination of sparse learning and genetic algorithm
Chen et al. A statistical framework for low-bitwidth training of deep neural networks
CN105718943A (en) Character selection method based on particle swarm optimization algorithm
CN113657421B (en) Convolutional neural network compression method and device, and image classification method and device
CN113343427B (en) Structural topology configuration prediction method based on convolutional neural network
CN111832839B (en) Energy consumption prediction method based on sufficient incremental learning
CN112949610A (en) Improved Elman neural network prediction method based on noise reduction algorithm
Shin et al. Prediction confidence based low complexity gradient computation for accelerating DNN training
CN110263917B (en) Neural network compression method and device
Sambharya et al. End-to-end learning to warm-start for real-time quadratic optimization
CN112036651A (en) Electricity price prediction method based on quantum immune optimization BP neural network algorithm
Zhang et al. Akecp: Adaptive knowledge extraction from feature maps for fast and efficient channel pruning
Yang et al. Non-uniform dnn structured subnets sampling for dynamic inference
CN117035837B (en) Method for predicting electricity purchasing demand of power consumer and customizing retail contract
CN116663745A (en) LSTM drainage basin water flow prediction method based on PCA_DWT
Chen et al. DNN gradient lossless compression: Can GenNorm be the answer?
CN116757255A (en) Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant