CN112529165B - Deep neural network pruning method, device, terminal and storage medium - Google Patents

Deep neural network pruning method, device, terminal and storage medium

Info

Publication number
CN112529165B
CN112529165B (application CN202011529589.6A)
Authority
CN
China
Prior art keywords: convolution, neural network, deep neural, layer, calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011529589.6A
Other languages
Chinese (zh)
Other versions
CN112529165A (en)
Inventor
秦豪
赵明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yogo Robot Co Ltd
Original Assignee
Shanghai Yogo Robot Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yogo Robot Co Ltd filed Critical Shanghai Yogo Robot Co Ltd
Priority to CN202011529589.6A priority Critical patent/CN112529165B/en
Publication of CN112529165A publication Critical patent/CN112529165A/en
Application granted granted Critical
Publication of CN112529165B publication Critical patent/CN112529165B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a deep neural network pruning method comprising the following steps: constructing a fully convolutional deep neural network model by stacking a plurality of convolution blocks; calculating the computation intensity of each convolution block; designing a loss function and sparsely training the deep neural network to obtain network model parameters; extracting the gamma value of each convolution block and performing a pruning operation; and re-integrating the deep neural network according to the Mask values to obtain a new parameter structure, which is applied to edge GPU computation. In the deep neural network pruning method provided by the invention, the computation intensity is calculated for each convolution layer separately, and the pruning penalty coefficient function is designed from the perspective of computation intensity: the pruning penalty coefficient of each convolution block is positively correlated with its computation intensity, so the higher the computation intensity of a convolution block, the larger its penalty coefficient. This significantly reduces computation time and improves both the computation speed at the edge and the accuracy of model prediction.

Description

Deep neural network pruning method, device, terminal and storage medium
[Technical Field]
The present invention relates to the field of model pruning, and in particular, to a deep neural network pruning method, device, terminal and storage medium.
[Background Art]
With the development of artificial intelligence technology, the popularization of deep neural networks has increased the demand for computing power on devices. For example, more and more edge devices such as robots are equipped with or integrate a GPU. In general, such edge GPUs have limited computing power and bandwidth, and in scenarios with strict real-time requirements the GPU cannot complete the computation of the deep neural network in time, so the deep neural network needs to be optimized. Neural network pruning is a common technique in the field of artificial intelligence, and channel pruning is the most widely applied variant: it applies a uniform constraint to specific layers of the deep neural network, thereby "slimming" the network.
However, in a deep neural network the computation amount and the memory access amount of convolution layers at different positions differ, so uniform processing cannot "slim" the network efficiently. In particular, on resource-limited edge GPU devices, a deep neural network trimmed by the generic channel pruning technique is often poorly adapted to the GPU, and its computation time is not noticeably reduced.
In view of the foregoing, it is desirable to provide a deep neural network pruning method, device, terminal and storage medium to overcome the above-mentioned drawbacks.
[Summary of the Invention]
The invention aims to provide a deep neural network pruning method, device, terminal and storage medium, so as to solve the problem that the existing channel pruning technique, which processes different convolution layers uniformly, cannot slim the network efficiently. A weighted pruning strategy is set according to the computation intensity of each convolution layer, so that the running speed of the deep neural network can be optimized and the computation time significantly reduced.
In order to achieve the above objective, a first aspect of the present invention provides a deep neural network pruning method applied to edge GPU computation for graphics processing problems with high real-time requirements, comprising the following steps:
constructing a fully convolutional deep neural network model by stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
calculating the calculation intensity of each convolution block;
designing a loss function, and performing sparse training on the deep neural network to obtain network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, where Weight[i] = alpha × sqrt(Intensity[i] / Intensity_base), Weight[i] is the penalty coefficient of the i-th convolution block, i = 1, 2, 3, 4, …, N, N is the number of convolution blocks, alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
extracting the gamma value of each convolution block, and pruning according to the following formula: Mask[i] = |gamma[i]| > 0.0001; where Mask = 1 means the channel is kept and Mask = 0 means the channel is deleted;
and re-integrating the deep neural network according to the Mask values to obtain a new parameter structure, and applying the new parameter structure to edge GPU computation.
In a preferred embodiment, the step of building the full convolution depth neural network in a manner of stacking a plurality of convolution blocks includes:
setting a training data set;
configuring a convolution layer by adopting preset attribute parameters; wherein the attribute parameters comprise convolution window, convolution span, input channel number and output channel number;
combining the convolution layer, the batch normalization layer and the activation layer into a convolution block;
and stacking a plurality of convolution blocks to construct a full-convolution deep neural network model.
In a preferred embodiment, the step of calculating the computation intensity of each convolution block includes:
calculating the computation intensity of the convolution layer according to a calculation formula to obtain the computation intensity of the corresponding convolution block; the calculation formula is: I = Flops / Mems = H × W × K^2 × Cin / (4 × (H × W + K^2 × Cin)); where Flops is the computation amount of the convolution layer, calculated by the formula Flops = H × W × K^2 × Cin × Cout; Mems is the memory access amount of the convolution layer, calculated by the formula Mems = 4 × (H × W × Cout + K^2 × Cin × Cout); H and W are the height and width of the feature map output by the convolution layer, K is the convolution window size of the convolution layer, Cin is the number of input channels of the convolution layer, and Cout is the number of output channels of the convolution layer.
In a preferred embodiment, the step of sparsifying the deep neural network by the design loss function to obtain the network model parameters includes:
acquiring a sparse parameter set;
defining a cross entropy loss function; first defining the cross entropy CE(Y1, Y2) = -[Y1 × log(Y2) + (1 - Y1) × log(1 - Y2)], and then substituting Y_true and Y_pred for Y1 and Y2 respectively to obtain the cross entropy loss function Loss_ce = CE(Y_true, Y_pred); where Y_true is the true data class label and Y_pred is the class predicted by the deep neural network;
constructing the final loss function using the L1 loss function formula; first defining the L1 loss function Loss_L1 = Weight × Σ|gamma|, where Weight is the gamma parameter penalty coefficient; the final loss function is defined as Loss = Loss_ce + Loss_L1;
and training the deep neural network model according to the final loss function, and storing network model parameters.
In a preferred embodiment, the sparse parameter set is defined by the batch normalization layer; the output of channel i of the batch normalization layer of each convolution block is calculated from the following equation: Yi = gamma_i × Xi_avg + beta_i; where Yi is the output of channel i of the batch normalization layer, Xi_avg is the normalized value of the input of channel i, and gamma_i and beta_i are the batch normalization parameters of channel i.
in a preferred embodiment, the step of training the deep neural network model according to the final loss function and storing training model parameters includes:
training the network with the public dataset CIFAR-10, back-propagating the final loss function through the fully convolutional network, and updating the network parameters; training uses stochastic gradient descent with momentum, the momentum parameter is set to 0.9, the learning rate follows a polynomial decay schedule, training is terminated after a preset number of iterations, and the network model parameters are saved.
A second aspect of the present invention provides a deep neural network pruning device for handling graphics processing problems with high real-time requirements, comprising:
the network model building module is used for building a full convolution depth neural network model in a mode of stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
the calculation intensity calculation module is used for calculating the calculation intensity of each convolution block;
the sparse training module is used for designing a loss function and performing sparse training on the deep neural network to obtain network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, where Weight[i] = alpha × sqrt(Intensity[i] / Intensity_base), Weight[i] is the penalty coefficient of the i-th convolution block, i = 1, 2, 3, 4, …, N, N is the number of convolution blocks, alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
the network model pruning module is used for extracting the gamma value of each convolution block and performing the pruning operation according to the following formula: Mask[i] = |gamma[i]| > 0.0001; where Mask = 1 means the channel is kept and Mask = 0 means the channel is deleted;
and the model integration application module is used for re-integrating the deep neural network according to the Mask value to obtain a new parameter structure and applying the new parameter structure to the edge GPU calculation.
In a preferred embodiment, the network model building module comprises:
a data set acquisition unit for setting a training data set;
the convolution layer configuration unit is used for configuring the convolution layer by adopting preset attribute parameters; wherein the attribute parameters comprise convolution window, convolution span, input channel number and output channel number;
the convolution block generation unit is used for combining the convolution layer, the batch normalization layer and the activation layer into a convolution block;
and the network model building unit is used for stacking a plurality of convolution blocks to build a full-convolution deep neural network model.
A third aspect of the present invention provides a terminal comprising a memory, a processor, and a deep neural network pruning program stored in the memory and executable on the processor, which when executed by the processor, implements the steps of the deep neural network pruning method according to any one of the embodiments above.
A fourth aspect of the present invention provides a computer-readable storage medium storing a deep neural network pruning program that, when executed by a processor, implements the steps of a deep neural network pruning method according to any one of the above-described embodiments.
In the deep neural network pruning method provided by the invention, the computation intensity is calculated for each convolution layer separately, and the pruning penalty coefficient function is designed from the perspective of computation intensity: the pruning penalty coefficient of each convolution block is positively correlated with its computation intensity, so the higher the computation intensity of a convolution block, the larger its penalty coefficient. In other words, a weighted pruning strategy is set according to the computation intensity of each convolution layer. By optimizing convolution layers at different positions separately, the computation time is significantly reduced, and both the computation speed at the edge and the accuracy of model prediction are improved.
[Description of the Drawings]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of the deep neural network pruning method provided by the invention;
FIG. 2 is a flowchart of the sub-steps of step S101 in the deep neural network pruning method shown in FIG. 1;
FIG. 3 is a schematic diagram of a deep neural network model in one embodiment of the invention;
FIG. 4 is a flowchart of the sub-steps of step S103 in the deep neural network pruning method shown in FIG. 1;
FIG. 5 is a block diagram of the deep neural network pruning device;
FIG. 6 is a block diagram of the network model construction module in the deep neural network pruning device shown in FIG. 5.
[Detailed Description of the Invention]
In order to make the objects, technical solutions and advantageous technical effects of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and detailed description. It should be understood that the detailed description is intended to illustrate the invention, and not to limit the invention.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
It should be noted that the channel pruning technique directly prunes channels and removes redundant channels, which is equivalent to "slimming" the network structure while the shape of each convolution layer remains regular.
Convolution: in the field of computer vision, convolution kernels, filters are typically matrices of small size, such as 3 x 3, 5 x 5, etc., and digital images are 2-dimensional (multi-dimensional) matrices (tensors) of relatively large size, learned layer by layer through convolutional neural networks from simple to complex features (patterns).
In an embodiment of the present invention, a first aspect provides a deep neural network pruning method applied to edge GPU (Graphics Processing Unit) computation, so that edge devices such as robots can handle graphics processing problems with high real-time requirements through a mounted or integrated small GPU.
As shown in fig. 1, the deep neural network pruning method includes the following steps S101 to S105.
In step S101, a fully convolutional deep neural network model is constructed by stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer.
In this step, the fully convolutional deep neural network model is first built. Specifically, as shown in FIG. 2, step S101 includes steps S1011 to S1014.
In step S1011, a training dataset is set. In this embodiment, the public dataset CIFAR-10 is used, a labeled dataset of 60,000 32 × 32 color images divided into 10 classes, with 6,000 images per class.
In step S1012, configuring a convolution layer by using preset attribute parameters; wherein the attribute parameters include convolution window, convolution span, number of input channels and number of output channels.
Specifically, the convolution layer (Convolution Layer) is the basic building unit of a visual deep neural network. It has attributes such as the convolution window (Kernel, abbreviated K), the convolution span (Stride, abbreviated S) and the numbers of input and output channels (Cin, Cout), and is generally combined with a batch normalization layer (Batch Normalization) and an activation layer (ReLU) into a convolution block. That is, each convolution layer is configured according to preset attribute parameters and denoted as a convolution layer (Cin, Cout, K, S). As shown in FIG. 3, the numbers in parentheses in each convolution block correspond to the attribute parameters of its convolution layer.
In step S1013, the convolution layer, the batch normalization layer and the activation layer are combined into a convolution block. Further, several convolution blocks at adjacent positions may be combined into a convolution block group.
In step S1014, a plurality of convolution blocks are stacked to construct a fully convolutional deep neural network model. FIG. 3 shows an example defined in this embodiment.
It should be noted that, the method of combining the convolution layer, the batch normalization layer and the activation layer into the convolution block and stacking the plurality of convolution blocks may refer to the prior art, and the present invention is not limited herein.
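For illustration, the following is a minimal sketch of such a stacked fully convolutional model, written with PyTorch (an assumption; the patent does not name a framework, and the channel counts below are illustrative rather than the exact architecture of FIG. 3):

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, k, s):
    """Convolution layer + batch normalization layer + activation layer."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),   # holds the per-channel gamma used later for pruning
        nn.ReLU(inplace=True),
    )

# Stack several convolution blocks into a fully convolutional model.
model = nn.Sequential(
    conv_block(3, 32, 3, 1),
    conv_block(32, 64, 3, 2),
    conv_block(64, 128, 3, 2),
    conv_block(128, 10, 1, 1),   # last block maps to the 10 CIFAR-10 classes
    nn.AdaptiveAvgPool2d(1),     # global average pooling keeps the network fully convolutional
    nn.Flatten(),                # [N, 10] class scores
)
```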
In step S102, the calculation intensity of each convolution block is calculated.
It should be noted that when a deep neural network performs inference, it mainly occupies the arithmetic logic units and the memory bandwidth of the edge computing device. The number of arithmetic logic units directly affects the efficiency of the massively parallel convolution operations of the deep neural network, while the memory bandwidth affects the speed at which the network parameters are copied. Both aspects must therefore be considered when designing the deep neural network architecture; otherwise, even a network with a modest computation amount may fail to reach the expected running speed.
The computation intensity of the convolution layer is calculated according to a calculation formula to obtain the computation intensity of the corresponding convolution block. Specifically, the formula for the computation intensity (Intensity, abbreviated I) is: I = Flops / Mems = H × W × K^2 × Cin / (4 × (H × W + K^2 × Cin)); where Flops is the computation amount of the convolution layer, calculated by the formula Flops = H × W × K^2 × Cin × Cout; Mems is the memory access amount of the convolution layer, calculated by the formula Mems = 4 × (H × W × Cout + K^2 × Cin × Cout); H and W are the height and width of the feature map output by the convolution layer, K is the convolution window size, Cin is the number of input channels, and Cout is the number of output channels.
It should be noted that the computation amount Flops refers to the number of floating-point operations required for one complete forward pass of a single input sample (for example, one image), i.e. the time complexity of the model; the memory access amount Mems refers to the total amount of memory traffic generated during one forward pass of a single sample, i.e. the space complexity of the model. Dividing the computation amount by the memory access amount gives the computation intensity of the convolution layer, which indicates how many floating-point operations are performed per byte of memory traffic; the higher the computation intensity, the more efficiently the memory is used.
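As a sketch under the formulas above (the function name and variable names are our own, not from the patent), the computation intensity of one convolution layer can be evaluated as follows; note that Cout cancels between Flops and Mems:

```python
def compute_intensity(h, w, k, cin, cout):
    """Computation intensity I = Flops / Mems of a convolution layer.

    Flops = H * W * K^2 * Cin * Cout               (floating-point operations)
    Mems  = 4 * (H * W * Cout + K^2 * Cin * Cout)  (bytes of memory traffic, float32)
    """
    flops = h * w * k ** 2 * cin * cout
    mems = 4 * (h * w * cout + k ** 2 * cin * cout)
    return flops / mems

# Example: a 3x3 convolution producing a 32x32 feature map from 64 input channels.
# Since Cout cancels, I = H*W*K^2*Cin / (4*(H*W + K^2*Cin)).
print(compute_intensity(h=32, w=32, k=3, cin=64, cout=128))
```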
In step S103, a loss function is designed and the deep neural network is sparsely trained to obtain network model parameters; the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, where Weight[i] = alpha × sqrt(Intensity[i] / Intensity_base), Weight[i] is the penalty coefficient of the i-th convolution block, i = 1, 2, 3, 4, …, N, N is the number of convolution blocks, alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block.
Specifically, as shown in fig. 4, step S103 includes the following steps S1031 to S1034.
In step S1031, the sparse parameter set is acquired. The sparse parameter set is defined by the batch normalization layer (Batch Normalization Layer); the output of channel i of the batch normalization layer of each convolution block is calculated from the following equation: Yi = gamma_i × Xi_avg + beta_i; where Yi is the output of channel i of the batch normalization layer, Xi_avg is the normalized value of the input of channel i, and gamma_i and beta_i are the batch normalization parameters of channel i.
As the formula shows, in the batch normalization layer the parameter Gamma has a strong influence on the channel output: the closer the Gamma value is to zero, the smaller the influence of that channel on the prediction of the deep neural network. From this point of view, channel pruning is a method that suppresses the Gamma values of some channels in the deep neural network by adding an L1-norm term to the training objective. This can be expressed by the formulas of steps S1032 and S1033 below.
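In a PyTorch-style implementation (an assumption; `bn.weight` is PyTorch's name for the Gamma parameter, and `model` refers to the sketch above), the sparse parameter set of step S1031 can be collected simply as:

```python
import torch.nn as nn

# One gamma vector per convolution block, taken from its batch normalization layer.
gammas = [m.weight for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
```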
In step S1032, the cross entropy loss function (cross entropy) is defined. First, the cross entropy CE(Y1, Y2) = -[Y1 × log(Y2) + (1 - Y1) × log(1 - Y2)] is defined, and then Y_true and Y_pred are substituted for Y1 and Y2 respectively to obtain the cross entropy loss function Loss_ce = CE(Y_true, Y_pred); where Y_true is the true data class label and Y_pred is the class predicted by the deep neural network.
in step S1033, a final loss function is constructed using the L1 loss function formula. Firstly, defining an L1 Loss function Loss_L1=weight gamma, wherein Weight is a gamma parameter penalty coefficient, and generally 0.001 is taken; the final Loss function is defined as loss=loss_ce+loss_l1. Substituting the expression of the L1 Loss function into the final Loss function and performing Weight calculation, the final Loss function is deformed to loss=loss_ce+Σweight [ i ]/gamma [ i ]/i. Wherein Weight [ i ] =alpha_sqrt (density [ i ]/density_base), weight [ i ] is a penalty coefficient of the ith convolution block, i=1, 2,3,4, …, N is the number of convolution blocks; alpha is a punishment coefficient constant, and 0.001 is taken; the Intensity [ i ] is the calculated Intensity of the ith convolution block, the intensity_base is the calculated Intensity of the last layer convolution block, the loss_ce is the cross entropy Loss function, and the gamma [ i ] is the gamma value of the ith convolution block.
Therefore, in the final loss function the gamma parameter penalty coefficient is no longer a single fixed coefficient; it is determined by the computation intensity of each convolution block.
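A possible PyTorch rendering of this final loss is sketched below (the function `pruning_loss`, its arguments, and the default alpha = 0.001 follow the description above but are our own naming, not the patent's code):

```python
import math
import torch.nn.functional as F

def pruning_loss(logits, targets, gammas, intensities, alpha=0.001):
    """Loss = Loss_ce + sum_i Weight[i] * |gamma[i]|_1,
    with Weight[i] = alpha * sqrt(Intensity[i] / Intensity_base)."""
    loss_ce = F.cross_entropy(logits, targets)
    intensity_base = intensities[-1]   # computation intensity of the last convolution block
    loss_l1 = 0.0
    for gamma, intensity in zip(gammas, intensities):
        weight = alpha * math.sqrt(intensity / intensity_base)
        loss_l1 = loss_l1 + weight * gamma.abs().sum()
    return loss_ce + loss_l1
```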
In step S1034, the deep neural network model is trained according to the final loss function, and network model parameters are saved.
In this step, the network is trained with the public dataset CIFAR-10; according to the final loss function, the error is back-propagated through the fully convolutional network and the network parameters are updated. Training uses stochastic gradient descent (SGD, Stochastic Gradient Descent) with momentum, the momentum parameter is set to 0.9, the learning rate follows a polynomial decay schedule, training is terminated after a preset number of rounds (for example, 100), and the network model parameters are saved.
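A minimal training loop matching this description might look as follows (a sketch: the initial learning rate, the `train_loader`, and the use of `torch.optim.lr_scheduler.PolynomialLR`, available in recent PyTorch versions, are assumptions; the momentum of 0.9 and the 100 training rounds follow the text, and `gammas` and `intensities` are the per-block lists collected and computed above):

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)          # SGD with momentum 0.9
scheduler = torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=100)  # polynomial learning-rate decay

for epoch in range(100):                  # preset number of training rounds
    for images, labels in train_loader:   # CIFAR-10 loader, assumed to be defined
        logits = model(images)
        loss = pruning_loss(logits, labels, gammas, intensities)  # sparse training objective
        optimizer.zero_grad()
        loss.backward()                   # back-propagation
        optimizer.step()
    scheduler.step()

torch.save(model.state_dict(), "sparse_model.pth")  # save the network model parameters
```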
In step S104, the gamma value in the batch normalization layer of each convolution block is extracted, and the pruning operation is performed according to the following formula: Mask[i] = |gamma[i]| > 0.0001; where Mask = 1 means the channel is kept and Mask = 0 means the channel is deleted. A Mask screens certain regions or channels out of processing or out of the calculation of processing parameters. By screening the Gamma values of each convolution layer for effective values in this way, channels whose Gamma values have been suppressed are removed from the deep neural network, which improves computation efficiency; at the same time, keeping only the channels with larger Gamma values preserves prediction accuracy.
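Continuing the PyTorch sketch, the pruning mask of step S104 can be derived from the trained Gamma values as follows (threshold 0.0001 as in the formula; variable names are our own):

```python
# Mask[i] = |gamma[i]| > 0.0001 for every batch-normalization channel of every block.
masks = [gamma.detach().abs() > 1e-4 for gamma in gammas]

for i, mask in enumerate(masks):
    kept = int(mask.sum().item())
    print(f"block {i}: keeping {kept} of {mask.numel()} channels")
```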
In step S105, the deep neural network is re-integrated according to the Mask values to obtain a new parameter structure, and the new parameter structure is applied to edge GPU computation.
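The re-integration of step S105 then copies only the surviving channels into smaller layers. The helper below is a simplified sketch of that idea for one convolution block (it assumes bias-free convolutions and ignores skip connections; it is not the patent's exact procedure). The rebuilt blocks are re-stacked into a compact model whose parameters are exported for deployment on the edge GPU.

```python
import torch
import torch.nn as nn

def rebuild_block(conv, bn, in_mask, out_mask):
    """Copy the surviving input/output channels of one convolution block
    into a smaller Conv2d + BatchNorm2d pair according to the Mask values."""
    in_idx = torch.nonzero(in_mask).flatten()
    out_idx = torch.nonzero(out_mask).flatten()
    new_conv = nn.Conv2d(len(in_idx), len(out_idx), conv.kernel_size,
                         conv.stride, conv.padding, bias=False)
    new_conv.weight.data = conv.weight.data[out_idx][:, in_idx].clone()
    new_bn = nn.BatchNorm2d(len(out_idx))
    new_bn.weight.data = bn.weight.data[out_idx].clone()
    new_bn.bias.data = bn.bias.data[out_idx].clone()
    new_bn.running_mean = bn.running_mean[out_idx].clone()
    new_bn.running_var = bn.running_var[out_idx].clone()
    return new_conv, new_bn
```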
Table 1 compares the parameters of the original deep neural network, the deep neural network after a generic pruning operation, and the deep neural network pruned by the embodiment of the present invention, all evaluated on an edge GPU:
as is evident from the table data, the calculation time consumption can be obviously reduced in any pruning strategy, but the general pruning strategy reduces a large number of parameters, and meanwhile, a small loss of accuracy is brought. The pruning strategy provided by the invention is obviously superior to the common pruning strategy in terms of the calculation speed of the edge end and the model prediction accuracy although the parameter quantity is 3 times of that of the common pruning strategy.
In summary, in the deep neural network pruning method provided by the invention, the computation intensity is calculated for each convolution layer separately, and the pruning penalty coefficient function is designed from the perspective of computation intensity: the pruning penalty coefficient of each convolution block is positively correlated with its computation intensity, so the higher the computation intensity of a convolution block, the larger its penalty coefficient. In other words, a weighted pruning strategy is set according to the computation intensity of each convolution layer. By optimizing convolution layers at different positions separately, the computation time is significantly reduced, and both the computation speed at the edge and the accuracy of model prediction are improved.
The second aspect of the present invention provides a deep neural network pruning device 100, applied to an edge GPU. The implementation principle and manner of the deep neural network pruning device 100 are consistent with those of the deep neural network pruning method described above and will not be repeated here.
As shown in fig. 5, the deep neural network pruning device 100 includes:
a network model construction module 10 for constructing a full convolution depth neural network model in a manner of stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
a calculation intensity calculation module 20 for calculating the calculation intensity of each convolution block;
the sparse training module 30 is used for designing a loss function and performing sparse training on the deep neural network to obtain network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, where Weight[i] = alpha × sqrt(Intensity[i] / Intensity_base), Weight[i] is the penalty coefficient of the i-th convolution block, i = 1, 2, 3, 4, …, N, N is the number of convolution blocks, alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
the network model pruning module 40 is configured to extract the gamma value of each convolution block and perform the pruning operation according to the following formula: Mask[i] = |gamma[i]| > 0.0001; where Mask = 1 means the channel is kept and Mask = 0 means the channel is deleted;
the model integration application module 50 is configured to re-integrate the deep neural network according to the Mask value to obtain a new parameter structure, and apply the new parameter structure to edge GPU calculation.
Further, in one embodiment, as shown in fig. 6, the network model building module 10 includes:
a data set acquisition unit 11 for setting a training data set;
a convolutional layer configuration unit 12, configured to configure a convolutional layer using preset attribute parameters; wherein the attribute parameters comprise convolution window, convolution span, input channel number and output channel number;
a convolution block generation unit 13, configured to combine the convolution layer, the batch normalization layer and the activation layer into a convolution block;
a network model construction unit 14 for stacking a plurality of convolution blocks to construct a full convolution deep neural network model.
A further aspect of the present invention is to provide a terminal (not shown in the figures), the terminal comprising a memory, a processor and a deep neural network pruning program stored in the memory and executable on the processor, the deep neural network pruning program when executed by the processor implementing the steps of the deep neural network pruning method according to any of the above embodiments.
The present invention also provides a computer readable storage medium (not shown in the figure), where the computer readable storage medium stores a deep neural network pruning program, which when executed by a processor, implements the steps of the deep neural network pruning method according to any one of the above embodiments.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not detailed or illustrated in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system or apparatus/terminal device and method may be implemented in other manners. For example, the system or apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The present invention is not limited to the details and embodiments described herein, and thus additional advantages and modifications may readily be made by those skilled in the art, without departing from the spirit and scope of the general concepts defined in the claims and the equivalents thereof, and the invention is not limited to the specific details, representative apparatus and illustrative examples shown and described herein.

Claims (10)

1. The deep neural network pruning method is applied to edge-side GPU calculation to solve some graphics processing problems with high real-time requirements, and is characterized by comprising the following steps of:
constructing a full convolution depth neural network model in a mode of stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
calculating the calculation intensity of each convolution block;
designing a loss function, and performing sparse training on the deep neural network to obtain network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, wherein Weight[i] = alpha × sqrt(Intensity[i] / Intensity_base), Weight[i] is a penalty coefficient of the i-th convolution block, i = 1, 2, 3, 4, …, N, N is the number of convolution blocks, alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last layer convolution block, Loss_ce is a cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
extracting the gamma value of each convolution block, and pruning according to the following formula: Mask[i] = |gamma[i]| > 0.0001; wherein Mask = 1 represents a reserved channel and Mask = 0 represents a deleted channel;
and re-integrating the deep neural network according to the Mask value to obtain a new parameter structure, and applying the new parameter structure to the edge GPU calculation.
2. The deep neural network pruning method according to claim 1, wherein the step of building the full convolution deep neural network in such a manner that a plurality of convolution blocks are stacked includes:
setting a training data set;
configuring a convolution layer by adopting preset attribute parameters; wherein the attribute parameters comprise convolution window, convolution span, input channel number and output channel number;
combining the convolution layer, the batch normalization layer and the activation layer into a convolution block;
and stacking a plurality of convolution blocks to construct a full-convolution deep neural network model.
3. The deep neural network pruning method of claim 1, wherein the calculating the calculation intensity of each convolution block includes:
calculating the computation intensity of the convolution layer according to a calculation formula to obtain the computation intensity of a corresponding convolution block; the calculation formula is: I = Flops / Mems = H × W × K^2 × Cin / (4 × (H × W + K^2 × Cin)); wherein Flops is the computation amount of the convolution layer, calculated by the formula Flops = H × W × K^2 × Cin × Cout; Mems is the memory access amount of the convolution layer, calculated by the formula Mems = 4 × (H × W × Cout + K^2 × Cin × Cout); H and W are the height and width of the feature map output by the convolution layer, K is the convolution window size of the convolution layer, Cin is the number of input channels of the convolution layer, and Cout is the number of output channels of the convolution layer.
4. The deep neural network pruning method as set forth in claim 1, wherein the step of sparsely training the deep neural network to obtain the network model parameters includes:
acquiring a sparse parameter set;
defining a cross entropy loss function; first defining the cross entropy CE(Y1, Y2) = -[Y1 × log(Y2) + (1 - Y1) × log(1 - Y2)], and then substituting Y_true and Y_pred for Y1 and Y2 respectively to obtain the cross entropy loss function Loss_ce = CE(Y_true, Y_pred); wherein Y_true is the true data class label and Y_pred is the class predicted by the deep neural network;
constructing a final loss function using the L1 loss function formula; first defining the L1 loss function Loss_L1 = Weight × Σ|gamma|, wherein Weight is the gamma parameter penalty coefficient; the final loss function is defined as Loss = Loss_ce + Loss_L1;
and training the deep neural network model according to the final loss function, and storing network model parameters.
5. The deep neural network pruning method of claim 4, wherein the sparse parameter set is defined by a batch normalization layer; the output of channel i of the batch normalization layer of each convolution block is calculated from the following equation: Yi = gamma_i × Xi_avg + beta_i; wherein Yi is the output of channel i of the batch normalization layer, Xi_avg is the normalized value of the input of channel i, and gamma_i and beta_i are both batch normalization parameters of channel i.
6. The deep neural network pruning method of claim 4, wherein the training the deep neural network model according to the final loss function and saving training model parameters comprises:
training a network by adopting a public data set CIFAR10, returning the full convolution neural network in a counter-propagation mode according to a final loss function, and updating network parameters; the training mode adopts a random gradient descent method and a momentum method, the learning momentum parameter is set to be 0.9, the learning rate is polynomial descent, training is terminated after the preset times of training, and the network model parameter is saved.
7. A deep neural network pruning device for processing graphics processing problems with high real-time requirements, comprising:
the network model building module is used for building a full convolution depth neural network model in a mode of stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
the calculation intensity calculation module is used for calculating the calculation intensity of each convolution block;
the sparse training module is used for designing a loss function and performing sparse training on the deep neural network to obtain network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, wherein Weight[i] = alpha × sqrt(Intensity[i] / Intensity_base), Weight[i] is a penalty coefficient of the i-th convolution block, i = 1, 2, 3, 4, …, N, N is the number of convolution blocks, alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last layer convolution block, Loss_ce is a cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
the network model pruning module is used for extracting the gamma value of each convolution block and performing the pruning operation according to the following formula: Mask[i] = |gamma[i]| > 0.0001; wherein Mask = 1 represents a reserved channel and Mask = 0 represents a deleted channel;
and the model integration application module is used for re-integrating the deep neural network according to the Mask value to obtain a new parameter structure and applying the new parameter structure to edge GPU calculation.
8. The deep neural network pruning device of claim 7, wherein the network model building module comprises:
a data set acquisition unit for setting a training data set;
the convolution layer configuration unit is used for configuring the convolution layer by adopting preset attribute parameters; wherein the attribute parameters comprise convolution window, convolution span, input channel number and output channel number;
the convolution block generation unit is used for combining the convolution layer, the batch normalization layer and the activation layer into a convolution block;
and the network model building unit is used for stacking a plurality of convolution blocks to build a full-convolution deep neural network model.
9. A terminal comprising a memory, a processor, and a deep neural network pruning program stored in the memory and executable on the processor, which when executed by the processor, implements the respective steps of the deep neural network pruning method of any one of claims 1-6.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a deep neural network pruning program, which when executed by a processor, implements the respective steps of the deep neural network pruning method according to any one of claims 1-6.
CN202011529589.6A 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium Active CN112529165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011529589.6A CN112529165B (en) 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011529589.6A CN112529165B (en) 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN112529165A CN112529165A (en) 2021-03-19
CN112529165B (en) 2024-02-02

Family

ID=75002419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011529589.6A Active CN112529165B (en) 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112529165B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657896A (en) * 2021-08-20 2021-11-16 成都链安科技有限公司 Block chain transaction topological graph analysis method and device based on graph neural network
WO2023035221A1 (en) * 2021-09-10 2023-03-16 Intel Corporation Sample-adaptive cross-layer norm calibration and relay neural network
CN114758191A (en) * 2022-04-15 2022-07-15 浪潮(北京)电子信息产业有限公司 Image identification method and device, electronic equipment and storage medium
CN114780910B (en) * 2022-06-16 2022-09-06 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111199282A (en) * 2019-12-31 2020-05-26 的卢技术有限公司 Pruning method and device for convolutional neural network model
CN111461322A (en) * 2020-03-13 2020-07-28 中国科学院计算技术研究所 Deep neural network model compression method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11321609B2 (en) * 2016-10-19 2022-05-03 Samsung Electronics Co., Ltd Method and apparatus for neural network quantization

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN111199282A (en) * 2019-12-31 2020-05-26 的卢技术有限公司 Pruning method and device for convolutional neural network model
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111461322A (en) * 2020-03-13 2020-07-28 中国科学院计算技术研究所 Deep neural network model compression method

Also Published As

Publication number Publication date
CN112529165A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN112529165B (en) Deep neural network pruning method, device, terminal and storage medium
WO2020221200A1 (en) Neural network construction method, image processing method and devices
CN108280514B (en) FPGA-based sparse neural network acceleration system and design method
CN108108809B (en) Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
CN112163601B (en) Image classification method, system, computer device and storage medium
CN110309847B (en) Model compression method and device
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
EP4209902A1 (en) Memory allocation method, related device, and computer readable storage medium
CN108510058B (en) Weight storage method in neural network and processor based on method
CN111783937A (en) Neural network construction method and system
CN112529146B (en) Neural network model training method and device
CN107944545A (en) Computational methods and computing device applied to neutral net
CN113449859A (en) Data processing method and device
Qian et al. A probabilistic approach to neural network pruning
DE102022128165A1 (en) DATA PATH CIRCUIT DESIGN USING REINFORCEMENT LEARNING
CN111882053B (en) Neural network model compression method based on splicing convolution
CN115828831A (en) Multi-core chip operator placement strategy generation method based on deep reinforcement learning
WO2021238568A1 (en) Parameter update method and apparatus, and storage medium
CN114186671A (en) Large-batch decentralized distributed image classifier training method and system
CN113850365A (en) Method, device, equipment and storage medium for compressing and transplanting convolutional neural network
CN116579468A (en) Typhoon generation prediction method, device, equipment and medium based on cloud system memory
CN116306808A (en) Convolutional neural network compression method and device combining dynamic pruning and conditional convolution
CN116106909A (en) Radar echo extrapolation method, system and storage medium
CN112532251A (en) Data processing method and device
WO2023122896A1 (en) Data processing method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant