CN112529165A - Deep neural network pruning method, device, terminal and storage medium - Google Patents

Deep neural network pruning method, device, terminal and storage medium Download PDF

Info

Publication number
CN112529165A
Authority
CN
China
Prior art keywords
neural network
deep neural
convolution
layer
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011529589.6A
Other languages
Chinese (zh)
Other versions
CN112529165B (en)
Inventor
秦豪 (Qin Hao)
赵明 (Zhao Ming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yogo Robot Co Ltd
Original Assignee
Shanghai Yogo Robot Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yogo Robot Co Ltd filed Critical Shanghai Yogo Robot Co Ltd
Priority to CN202011529589.6A priority Critical patent/CN112529165B/en
Publication of CN112529165A publication Critical patent/CN112529165A/en
Application granted granted Critical
Publication of CN112529165B publication Critical patent/CN112529165B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep neural network pruning method, which comprises the following steps: constructing a fully convolutional deep neural network model by stacking a plurality of convolution blocks; calculating the computation intensity of each convolution block; designing a loss function and carrying out sparse training on the deep neural network to obtain the network model parameters; extracting the gamma value of each convolution block and pruning; and re-integrating the deep neural network according to the Mask values to obtain a new parameter structure, which is applied to GPU computation at the edge. In the deep neural network pruning method provided by the invention, the computation intensity is calculated separately for each convolutional layer, and a pruning penalty coefficient function is designed from the perspective of computation intensity: the pruning penalty coefficient of each convolution block is positively correlated with its computation intensity, so the greater the computation intensity, the greater the penalty coefficient. Computation time is thereby significantly reduced, and both the computation speed at the edge and the accuracy of model prediction are improved.

Description

Deep neural network pruning method, device, terminal and storage medium
[ technical field ]
The invention relates to the technical field of model pruning, and in particular to a deep neural network pruning method, device, terminal and storage medium.
[ background of the invention ]
With the development of artificial intelligence technology and the popularization of deep neural networks, the computational demands on devices have increased dramatically. For example, more and more edge devices such as robots carry or integrate a GPU. Such edge devices generally have limited computing power and bandwidth, and in some scenarios with strict real-time requirements the GPU cannot complete the computation of a deep neural network in time, so the deep neural network needs to be optimized. Neural network pruning is a common technique in the field of artificial intelligence; among its variants, channel pruning (channel prune) is the most widely applied. It imposes uniform constraint conditions on certain layers in a deep neural network in order to slim the network.
However, in a deep neural network, convolutional layers at different positions require different computation amounts and memory access amounts, so uniform processing cannot slim the network efficiently. In particular, on resource-limited edge GPU devices, a deep neural network trimmed by a generic channel pruning technique is often poorly adapted to the GPU, so its computation time is not noticeably reduced.
In view of the foregoing, it is desirable to provide a deep neural network pruning method, apparatus, terminal and storage medium to overcome the above-mentioned drawbacks.
[ summary of the invention ]
The invention aims to provide a deep neural network pruning method, device, terminal and storage medium, so as to solve the problem that the network cannot be effectively slimmed when different convolutional layers are processed uniformly in existing channel pruning techniques.
In order to achieve the above object, a first aspect of the present invention provides a deep neural network pruning method applied to edge-side GPU computation, including the following steps:
constructing a fully convolutional deep neural network model by stacking a plurality of convolution blocks; wherein each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
calculating the computation intensity of each convolution block;
designing a loss function, and carrying out sparse training on the deep neural network to obtain the network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, with Weight[i] = Alpha × sqrt(Intensity[i]/Intensity_base), where Weight[i] is the penalty coefficient of the i-th convolution block (i = 1, 2, 3, …, N, N being the number of convolution blocks), Alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
extracting the gamma value of each convolution block, and pruning according to the following formula: Mask[i] = (|gamma[i]| > 0.0001); where Mask = 1 denotes a retained channel and Mask = 0 denotes a deleted channel;
and re-integrating the deep neural network according to the Mask values to obtain a new parameter structure, and applying the new parameter structure to GPU computation at the edge.
In a preferred embodiment, the step of constructing a fully convolutional deep neural network by stacking a plurality of convolution blocks includes:
setting a training data set;
configuring the convolutional layer with preset attribute parameters; the attribute parameters comprise the convolution window, convolution stride, number of input channels and number of output channels;
combining the convolutional layer, the batch normalization layer, and the activation layer into a convolutional block;
and stacking a plurality of the convolution blocks to construct a fully convolutional deep neural network model.
In a preferred embodiment, the step of calculating the computation intensity of each convolution block comprises:
calculating the computation intensity of the convolution layer according to a calculation formula to obtain the computation intensity of the corresponding convolution block; the calculation formula is: I = Flops/Mems = (H × W × K² × Cin)/(4 × (H × W + K² × Cin)); where Flops is the computation amount of the convolution layer, given by Flops = H × W × K² × Cin × Cout; Mems is the memory access amount of the convolution layer, given by Mems = 4 × (H × W × Cout + K² × Cin × Cout); H and W are the height and width of the feature map output by the convolution layer, K is the convolution window size of the convolution layer, Cin is the number of input channels of the convolution layer, and Cout is the number of output channels of the convolution layer.
In a preferred embodiment, the step of designing a loss function and performing sparse training on the deep neural network to obtain the network model parameters includes:
acquiring a sparse parameter set;
defining a cross entropy loss function; firstly, the cross entropy is defined as CE(Y1, Y2) = -[Y1 × log(Y2) + (1 - Y1) × log(1 - Y2)], and Y_true and Y_pred are then substituted for Y1 and Y2 respectively to obtain the cross entropy loss function Loss_ce = CE(Y_true, Y_pred); where Y_true is the real data category label, and Y_pred is the category predicted by the deep neural network;
constructing the final loss function with an L1 loss term; firstly, the L1 loss function is defined as Loss_L1 = Weight × ||gamma||, where Weight is the gamma parameter penalty coefficient; the final loss function is then defined as Loss = Loss_ce + Loss_L1;
and training the deep neural network model according to the final loss function, and storing network model parameters.
In a preferred embodiment, the sparse parameter set is defined by the batch normalization layer; the output of channel i of the batch normalization layer of each convolution block is calculated from the following equation: Yi = Gamma_i × Xi_avg + Beta_i; where Yi is the output of channel i of the batch normalization layer, Xi_avg is the normalized value of the input of channel i, and Gamma_i and Beta_i are the batch normalization parameters of channel i.
in a preferred embodiment, the step of training the deep neural network model according to the final loss function and saving the parameters of the trained model comprises:
training the network with the public data set CIFAR10, back-propagating the final loss function through the fully convolutional neural network, and updating the network parameters; the training uses stochastic gradient descent with momentum, the learning momentum parameter is set to 0.9, the learning rate follows a slow polynomial decay, training stops after a preset number of iterations, and the network model parameters are saved.
The second aspect of the present invention provides a deep neural network pruning device, including:
the network model building module is used for constructing a fully convolutional deep neural network model by stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
the computation intensity calculation module is used for calculating the computation intensity of each convolution block;
the sparse training module is used for designing a loss function and carrying out sparse training on the deep neural network to obtain the network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, with Weight[i] = Alpha × sqrt(Intensity[i]/Intensity_base), where Weight[i] is the penalty coefficient of the i-th convolution block (i = 1, 2, 3, …, N, N being the number of convolution blocks), Alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
the network model pruning module is used for extracting the gamma value of each convolution block and performing the pruning operation according to the following formula: Mask[i] = (|gamma[i]| > 0.0001); where Mask = 1 denotes a retained channel and Mask = 0 denotes a deleted channel;
and the model integration application module is used for re-integrating the deep neural network according to the Mask values to obtain a new parameter structure and applying the new parameter structure to GPU computation at the edge.
In a preferred embodiment, the network model building module comprises:
a data set acquisition unit for setting a training data set;
the convolution layer configuration unit is used for configuring the convolution layer with preset attribute parameters; the attribute parameters comprise the convolution window, convolution stride, number of input channels and number of output channels;
a convolution block generation unit for combining the convolution layer, the batch normalization layer, and the activation layer into a convolution block;
and the network model building unit is used for stacking a plurality of the convolution blocks to construct a fully convolutional deep neural network model.
A third aspect of the present invention provides a terminal, which includes a memory, a processor, and a deep neural network pruning program stored in the memory and operable on the processor, and when executed by the processor, the deep neural network pruning program implements the steps of the deep neural network pruning method according to any one of the above embodiments.
A fourth aspect of the present invention provides a computer-readable storage medium, which stores a deep neural network pruning program, and when the deep neural network pruning program is executed by a processor, the deep neural network pruning program implements the steps of the deep neural network pruning method according to any one of the above embodiments.
In the deep neural network pruning method provided by the invention, the computation intensity of each convolutional layer is calculated separately, and a pruning penalty coefficient function is designed from the perspective of computation intensity: the pruning penalty coefficient of each convolution block is positively correlated with its computation intensity, and the greater the computation intensity, the greater the penalty coefficient; that is, a weighted pruning strategy is set according to the computation intensity of each convolutional layer. By optimizing convolutional layers at different positions separately, computation time is significantly reduced, and both the computation speed at the edge and the accuracy of model prediction are improved.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a deep neural network pruning method provided by the present invention;
FIG. 2 is a flowchart illustrating the sub-steps of step S101 in the deep neural network pruning method shown in FIG. 1;
FIG. 3 is a schematic diagram of a deep neural network model in an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the sub-steps of step S103 in the deep neural network pruning method shown in FIG. 1;
FIG. 5 is a block diagram of the deep neural network pruning device;
FIG. 6 is a block diagram of the network model building module in the deep neural network pruning device shown in FIG. 5.
[ detailed description ]
In order to make the objects, technical solutions and advantageous effects of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the detailed description. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be noted that channel pruning (channel prune) directly prunes channels, removing redundant ones; this is equivalent to slimming the network structure while keeping the shapes of the convolutional layers regular.
Convolution: in the field of computer vision, convolution kernels and filters are usually small matrices, such as 3 × 3 or 5 × 5, while digital images are comparatively large 2-dimensional (or multidimensional) matrices (tensors); convolutional neural networks learn features (patterns) from simple to complex, layer by layer.
In an embodiment of the present invention, a first aspect provides a deep neural network pruning method applied to edge-side GPU (Graphics Processing Unit) computation, so that edge devices such as robots can handle graphics processing problems with strict real-time requirements through a mounted or integrated micro GPU.
As shown in fig. 1, the deep neural network pruning method includes the following steps S101 to S105.
In step S101, a fully convolutional deep neural network model is constructed by stacking a plurality of convolution blocks; wherein each convolution block comprises a convolution layer, a batch normalization layer and an activation layer.
In this step, a fully convolutional deep neural network model is first constructed. Specifically, as shown in FIG. 2, step S101 includes steps S1011-S1014.
In step S1011, a training data set is set. In this example, the public data set CIFAR-10 is used: a labeled data set of 60000 color images of size 32 × 32, divided into 10 classes of 6000 images each.
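For reference, preparing this data set in PyTorch might look as follows (a minimal sketch; the normalization statistics and batch size are commonly used CIFAR-10 values, not taken from the patent):

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Minimal sketch of loading the CIFAR-10 training set (60000 color images of
# size 32x32 in 10 classes); the normalization statistics are common values.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
```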
In step S1012, configuring the convolutional layer using preset attribute parameters; the attribute parameters comprise convolution windows, convolution spans, input channel numbers and output channel numbers.
Specifically, the convolutional layer (Convolution Layer) is a basic building unit of a visual deep neural network, with attributes such as the convolution window (Kernel, abbreviated K), the convolution stride (Stride, abbreviated S) and the numbers of input/output channels (Cin, Cout); it is usually combined with a batch normalization layer (Batch Normalization) and an activation layer (ReLU) to form a convolution block. That is, each convolutional layer is configured according to preset attribute parameters, denoted Conv(Cin, Cout, K, S). As shown in fig. 3, the numbers in parentheses for each convolution block correspond to the attribute parameters of its convolutional layer.
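As an illustration, such a convolution block can be sketched in PyTorch as follows (a minimal sketch: the class name ConvBlock, the bias-free convolution and the padding choice are assumptions for illustration, not requirements of the patent):

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution block = convolution layer + batch normalization layer + activation layer."""
    def __init__(self, cin, cout, k, s):
        super().__init__()
        # Conv(Cin, Cout, K, S); padding k // 2 keeps the spatial size when s == 1 (assumption)
        self.conv = nn.Conv2d(cin, cout, kernel_size=k, stride=s,
                              padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(cout)   # holds the per-channel Gamma/Beta used later for pruning
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```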
In step S1013, the convolutional layer, the batch normalization layer and the activation layer are combined into a convolution block. Further, several convolution blocks at nearby positions may be combined into a convolution block group.
In step S1014, a plurality of convolution blocks are stacked to construct a fully convolutional deep neural network model. FIG. 3 shows one example as defined in this embodiment.
It should be noted that, the method of combining the convolution layer, the batch normalization layer and the activation layer into a convolution block and stacking a plurality of convolution blocks can refer to the prior art, and the invention is not limited thereto.
In step S102, the computation intensity of each convolution block is calculated.
It should be noted that when a deep neural network performs inference, it mainly occupies the arithmetic logic units and the memory bandwidth of the edge computing device. The number of arithmetic logic units directly affects the efficiency of the deep neural network's massively parallel convolution operations, while the memory bandwidth affects how fast the deep neural network's parameters can be copied. Both aspects must therefore be considered when designing a deep neural network architecture; otherwise, even a network with a modest computation amount may not reach the expected running speed.
The computation intensity of the convolution layer is calculated according to a formula to obtain the computation intensity of the corresponding convolution block. Specifically, the computation intensity (Intensity, abbreviated I) is: I = Flops/Mems = (H × W × K² × Cin)/(4 × (H × W + K² × Cin)); where Flops is the computation amount of the convolution layer, given by Flops = H × W × K² × Cin × Cout; Mems is the memory access amount of the convolution layer, given by Mems = 4 × (H × W × Cout + K² × Cin × Cout); H and W are the height and width of the feature map output by the convolution layer, K is the convolution window size of the convolution layer, Cin is the number of input channels of the convolution layer, and Cout is the number of output channels of the convolution layer.
It should be noted that the computation amount Flops is the number of floating point operations performed in one complete forward pass for a single input sample (for example, one image), i.e. the time complexity of the model; the memory access amount Mems is the total amount of memory exchanged during one forward pass for a single sample, i.e. the space complexity of the model. Dividing the computation amount by the memory access amount gives the computation intensity of the convolutional layer, which indicates how many floating point operations are performed per byte of memory exchanged during the computation of that layer; the greater the computation intensity, the higher the memory use efficiency.
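Put as code, the computation intensity of a convolution layer could be evaluated as follows (a sketch assuming 4-byte floating point values, as in the formula above):

```python
def conv_intensity(h, w, k, cin, cout):
    """Computation intensity I = Flops / Mems of one convolution layer.

    Flops = H*W*K^2*Cin*Cout (floating point operations in one forward pass);
    Mems  = 4*(H*W*Cout + K^2*Cin*Cout) (bytes exchanged, at 4 bytes per value).
    """
    flops = h * w * k ** 2 * cin * cout
    mems = 4 * (h * w * cout + k ** 2 * cin * cout)
    return flops / mems  # FLOPs per byte of memory traffic

# Example: a 3x3 convolution with 64 input channels producing a 32x32 feature
# map has intensity 32*32*9*64 / (4*(32*32 + 9*64)) ≈ 92 FLOPs per byte,
# regardless of the number of output channels (Cout cancels out).
```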
In step S103, a loss function is designed and sparse training is performed on the deep neural network to obtain the network model parameters; the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, with Weight[i] = Alpha × sqrt(Intensity[i]/Intensity_base), where Weight[i] is the penalty coefficient of the i-th convolution block (i = 1, 2, 3, …, N, N being the number of convolution blocks), Alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block.
Specifically, as shown in fig. 4, step S103 includes the following steps S1031 to S1034.
In step S1031, a sparse parameter set is acquired. The sparse parameter set is defined by the batch normalization layer (Batch Normalization Layer); the output of channel i of the batch normalization layer of each convolution block is calculated from the following equation: Yi = Gamma_i × Xi_avg + Beta_i; where Yi is the output of channel i of the batch normalization layer, Xi_avg is the normalized value of the input of channel i, and Gamma_i and Beta_i are the batch normalization parameters of channel i.
From this formula it follows that, in the batch normalization layer, the parameter Gamma contributes heavily to a channel's output value; specifically, the closer the Gamma value is to zero, the smaller the influence of that channel on the prediction result of the deep neural network. From this point of view, channel pruning (channel prune) suppresses the Gamma values of some channels in the deep neural network by adding L1-norm sparsity training on this parameter. Specifically, this can be expressed by the formulas in the following steps S1032 and S1033.
In step S1032, a cross entropy loss function (cross entropy) is defined. Firstly, the cross entropy is defined as CE(Y1, Y2) = -[Y1 × log(Y2) + (1 - Y1) × log(1 - Y2)], and Y_true and Y_pred are then substituted for Y1 and Y2 respectively to obtain the cross entropy loss function Loss_ce = CE(Y_true, Y_pred); where Y_true is the real data category label, and Y_pred is the category predicted by the deep neural network.
in step S1033, a final loss function is constructed using the L1 loss function formula. Firstly, defining a L1 Loss function Loss _ L1 ═ Weight | | |, gamma |, wherein Weight is a gamma parameter penalty coefficient and is usually 0.001; the final Loss function is defined as Loss _ ce + Loss _ L1. And substituting the expression of the L1 Loss function into the final Loss function and performing weighting calculation, so that the final Loss function is transformed into Loss _ ce + sigma Weight [ i ] | | gamma [ i ] |. Wherein Weight [ i ] ═ alpha × sqrt (Intensity [ i ]/Intensity _ base), Weight [ i ] is a penalty coefficient of the ith convolution block (i ═ 1,2,3,4, …, N is the number of convolution blocks); alpha is a penalty coefficient constant, and is taken as 0.001; intensity [ i ] is the calculated Intensity of the ith volume block, Intensity _ base is the calculated Intensity of the last layer of volume block, Loss _ ce is the cross entropy Loss function, and gamma [ i ] is the gamma value of the ith volume block.
Therefore, in the final loss function the gamma parameter penalty coefficient is no longer a uniform fixed coefficient; it is determined by the computation intensity of each convolution block.
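A minimal sketch of this intensity-weighted L1 penalty, assuming the convolution blocks are implemented as in the ConvBlock sketch above (so the BN gamma of block i is blocks[i].bn.weight):

```python
import math

def sparsity_loss(blocks, intensities, alpha=0.001):
    """Sum_i Weight[i] * ||gamma[i]||_1, with
    Weight[i] = alpha * sqrt(Intensity[i] / Intensity_base)."""
    intensity_base = intensities[-1]  # computation intensity of the last convolution block
    loss = 0.0
    for block, intensity in zip(blocks, intensities):
        weight = alpha * math.sqrt(intensity / intensity_base)
        loss = loss + weight * block.bn.weight.abs().sum()  # bn.weight is gamma
    return loss
```

The final loss is then the cross entropy plus this penalty, e.g. F.cross_entropy(y_pred, y_true) + sparsity_loss(blocks, intensities).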
In step S1034, the deep neural network model is trained according to the final loss function, and the network model parameters are saved.
In this step, the network is trained with the public data set CIFAR10, and according to the final loss function the fully convolutional neural network is updated by back propagation; the training uses stochastic gradient descent (SGD, Stochastic Gradient Descent) with momentum, the learning momentum parameter is set to 0.9, the learning rate follows a slow polynomial decay, and training stops and the network model parameters are saved after a preset number of training iterations (for example, 100).
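A training-loop sketch under these settings (SGD with momentum 0.9, polynomial learning-rate decay), reusing the sparsity_loss sketch above; the base learning rate of 0.1 and the decay exponent of 0.9 are illustrative assumptions not specified in the patent:

```python
import torch
import torch.nn.functional as F

def sparse_train(model, blocks, intensities, train_loader, epochs=100, base_lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    total_steps = epochs * len(train_loader)
    step = 0
    for epoch in range(epochs):
        for x, y_true in train_loader:
            lr = base_lr * (1.0 - step / total_steps) ** 0.9  # polynomial slow decay
            for group in opt.param_groups:
                group["lr"] = lr
            y_pred = model(x)
            loss = F.cross_entropy(y_pred, y_true) + sparsity_loss(blocks, intensities)
            opt.zero_grad()
            loss.backward()  # back-propagate through the fully convolutional network
            opt.step()
            step += 1
    torch.save(model.state_dict(), "sparse_model.pt")  # save the network model parameters
```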
In step S104, the gamma value in the batch normalization layer of each convolution block is extracted, and pruning is performed according to the following formula: Mask[i] = (|gamma[i]| > 0.0001); where Mask = 1 denotes a retained channel and Mask = 0 denotes a deleted channel. A mask shields certain areas so that they do not participate in processing or in the calculation of processing parameters. In this way the Gamma values of each convolution layer are screened for effective values, the Gamma values of some channels of the deep neural network are suppressed, and computational efficiency is improved. Meanwhile, retaining only the Gamma values with higher weights improves prediction accuracy.
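As a sketch (again assuming the ConvBlock layout above), the per-channel masks can be computed directly from the BN gamma values:

```python
def channel_masks(blocks, threshold=1e-4):
    """Mask[i] = |gamma[i]| > 0.0001; True (1) retains a channel, False (0) deletes it."""
    return [block.bn.weight.detach().abs() > threshold for block in blocks]
```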
In step S105, the deep neural network is re-integrated according to the Mask values to obtain a new parameter structure, and the new parameter structure is applied to edge-side GPU computation.
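Re-integration means physically rebuilding each convolution block with only the retained channels and copying across the surviving weights. The sketch below handles the simple case of a plain stack of blocks, where the output mask of one block becomes the input mask of the next; convolution block groups with branches (as in fig. 3) would additionally need masks aligned across branches. It reuses the ConvBlock and channel_masks sketches above:

```python
import torch
import torch.nn as nn

def reintegrate(blocks, masks):
    """Rebuild a plain stack of convolution blocks, keeping only masked-in channels."""
    new_blocks = []
    in_mask = torch.ones(blocks[0].conv.in_channels, dtype=torch.bool)  # keep all inputs
    for block, out_mask in zip(blocks, masks):
        conv, bn = block.conv, block.bn
        new = ConvBlock(int(in_mask.sum()), int(out_mask.sum()),
                        conv.kernel_size[0], conv.stride[0])
        # Conv weight has shape (Cout, Cin, K, K): select surviving rows and columns.
        new.conv.weight.data = conv.weight.data[out_mask][:, in_mask].clone()
        new.bn.weight.data = bn.weight.data[out_mask].clone()  # gamma
        new.bn.bias.data = bn.bias.data[out_mask].clone()      # beta
        new.bn.running_mean = bn.running_mean[out_mask].clone()
        new.bn.running_var = bn.running_var[out_mask].clone()
        new_blocks.append(new)
        in_mask = out_mask  # outputs of this block feed the next block
    return nn.Sequential(*new_blocks)
```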
Table 1 compares the parameters of the original deep neural network, the deep neural network after a generic pruning operation, and the deep neural network of the embodiment of the present invention, computed on the edge GPU:
[Table 1: comparison of the three networks; reproduced as an image in the original publication]
The table shows that computation time is significantly reduced regardless of which pruning strategy is adopted; however, while the generic pruning strategy removes a large number of parameters, it also incurs a small loss of accuracy. The pruning strategy provided by the invention retains about 3 times as many parameters as the generic strategy, yet its edge-side computation speed and model prediction accuracy are clearly superior to those of the generic strategy.
In summary, in the deep neural network pruning method provided by the present invention, the computation intensity of each convolutional layer is calculated separately, and a pruning penalty coefficient function is designed from the perspective of computation intensity: the pruning penalty coefficient of each convolution block is positively correlated with its computation intensity, and the greater the computation intensity, the greater the penalty coefficient; that is, a weighted pruning strategy is set according to the computation intensity of each convolutional layer. By optimizing convolutional layers at different positions separately, computation time is significantly reduced, and both the computation speed at the edge and the accuracy of model prediction are improved.
The second aspect of the present invention provides a deep neural network pruning device 100, which is applied to an edge-side GPU. It should be noted that the implementation principle and the implementation manner of the deep neural network pruning device 100 are consistent with the deep neural network pruning method, and therefore, the detailed description is omitted below.
As shown in fig. 5, the deep neural network pruning device 100 includes:
a network model building module 10, configured to construct a fully convolutional deep neural network model by stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
a computation intensity calculation module 20, configured to calculate the computation intensity of each convolution block;
a sparse training module 30, configured to design a loss function and perform sparse training on the deep neural network to obtain the network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, with Weight[i] = Alpha × sqrt(Intensity[i]/Intensity_base), where Weight[i] is the penalty coefficient of the i-th convolution block (i = 1, 2, 3, …, N, N being the number of convolution blocks), Alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
a network model pruning module 40, configured to extract the gamma value of each convolution block and perform pruning according to the following formula: Mask[i] = (|gamma[i]| > 0.0001); where Mask = 1 denotes a retained channel and Mask = 0 denotes a deleted channel;
and a model integration application module 50, configured to re-integrate the deep neural network according to the Mask values to obtain a new parameter structure and apply the new parameter structure to edge-side GPU computation.
Further, in one embodiment, as shown in fig. 6, the network model building module 10 includes:
a data set acquisition unit 11 for setting a training data set;
a convolutional layer configuration unit 12, configured to configure the convolutional layer with preset attribute parameters; the attribute parameters comprise the convolution window, convolution stride, number of input channels and number of output channels;
a convolution block generation unit 13 for combining the convolution layer, the batch normalization layer, and the activation layer into a convolution block;
and a network model building unit 14, configured to stack a plurality of convolution blocks to construct a fully convolutional deep neural network model.
In a further aspect, the present invention provides a terminal (not shown in the drawings), where the terminal includes a memory, a processor, and a deep neural network pruning program stored in the memory and operable on the processor, and when executed by the processor, the deep neural network pruning program implements the steps of the deep neural network pruning method according to any one of the foregoing embodiments.
The present invention further provides a computer readable storage medium (not shown in the drawings), which stores a deep neural network pruning program, and when the deep neural network pruning program is executed by a processor, the deep neural network pruning program implements the steps of the deep neural network pruning method according to any one of the above embodiments.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system or apparatus/terminal device and method can be implemented in other ways. For example, the above-described system or apparatus/terminal device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The invention is not limited solely to that described in the specification and embodiments, and additional advantages and modifications will readily occur to those skilled in the art, so that the invention is not limited to the specific details, representative apparatus, and illustrative examples shown and described herein, without departing from the spirit and scope of the general concept as defined by the appended claims and their equivalents.

Claims (10)

1. A deep neural network pruning method applied to GPU computation at the edge, characterized by comprising the following steps:
constructing a fully convolutional deep neural network model by stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
calculating the computation intensity of each convolution block;
designing a loss function, and carrying out sparse training on the deep neural network to obtain the network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, with Weight[i] = Alpha × sqrt(Intensity[i]/Intensity_base), where Weight[i] is the penalty coefficient of the i-th convolution block (i = 1, 2, 3, …, N, N being the number of convolution blocks), Alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
extracting the gamma value of each convolution block, and pruning according to the following formula: Mask[i] = (|gamma[i]| > 0.0001); where Mask = 1 denotes a retained channel and Mask = 0 denotes a deleted channel;
and re-integrating the deep neural network according to the Mask values to obtain a new parameter structure, and applying the new parameter structure to GPU computation at the edge.
2. The deep neural network pruning method of claim 1, wherein the step of constructing a fully convolutional deep neural network by stacking a plurality of convolution blocks comprises:
setting a training data set;
configuring the convolutional layer with preset attribute parameters; the attribute parameters comprise the convolution window, convolution stride, number of input channels and number of output channels;
combining the convolutional layer, the batch normalization layer and the activation layer into a convolution block;
and stacking a plurality of the convolution blocks to construct a fully convolutional deep neural network model.
3. The deep neural network pruning method of claim 1, wherein the step of calculating the computation intensity of each convolution block comprises:
calculating the computation intensity of the convolution layer according to a calculation formula to obtain the computation intensity of the corresponding convolution block; the calculation formula is: I = Flops/Mems = (H × W × K² × Cin)/(4 × (H × W + K² × Cin)); where Flops is the computation amount of the convolution layer, given by Flops = H × W × K² × Cin × Cout; Mems is the memory access amount of the convolution layer, given by Mems = 4 × (H × W × Cout + K² × Cin × Cout); H and W are the height and width of the feature map output by the convolution layer, K is the convolution window size of the convolution layer, Cin is the number of input channels of the convolution layer, and Cout is the number of output channels of the convolution layer.
4. The deep neural network pruning method according to claim 1, wherein the step of designing the loss function, performing sparse training on the deep neural network, and obtaining network model parameters comprises:
acquiring a sparse parameter set;
defining a cross entropy loss function; firstly, the cross entropy is defined as CE(Y1, Y2) = -[Y1 × log(Y2) + (1 - Y1) × log(1 - Y2)], and Y_true and Y_pred are then substituted for Y1 and Y2 respectively to obtain the cross entropy loss function Loss_ce = CE(Y_true, Y_pred); where Y_true is the real data category label, and Y_pred is the category predicted by the deep neural network;
constructing the final loss function with an L1 loss term; firstly, the L1 loss function is defined as Loss_L1 = Weight × ||gamma||, where Weight is the gamma parameter penalty coefficient; the final loss function is then defined as Loss = Loss_ce + Loss_L1;
and training the deep neural network model according to the final loss function, and storing network model parameters.
5. The deep neural network pruning method of claim 4, wherein the sparse parameter set is defined by the batch normalization layer; the output of channel i of the batch normalization layer of each convolution block is calculated from the following equation: Yi = Gamma_i × Xi_avg + Beta_i; where Yi is the output of channel i of the batch normalization layer, Xi_avg is the normalized value of the input of channel i, and Gamma_i and Beta_i are the batch normalization parameters of channel i.
6. The deep neural network pruning method of claim 4, wherein the step of training the deep neural network model according to the final loss function and saving the parameters of the trained model comprises:
training the network with the public data set CIFAR10, back-propagating the final loss function through the fully convolutional neural network, and updating the network parameters; the training uses stochastic gradient descent with momentum, the learning momentum parameter is set to 0.9, the learning rate follows a slow polynomial decay, training stops after a preset number of iterations, and the network model parameters are saved.
7. A deep neural network pruning device, comprising:
the network model building module is used for constructing a fully convolutional deep neural network model by stacking a plurality of convolution blocks; each convolution block comprises a convolution layer, a batch normalization layer and an activation layer;
the computation intensity calculation module is used for calculating the computation intensity of each convolution block;
the sparse training module is used for designing a loss function and carrying out sparse training on the deep neural network to obtain the network model parameters; wherein the loss function is defined as: Loss = Loss_ce + Σ Weight[i] × |gamma[i]|, with Weight[i] = Alpha × sqrt(Intensity[i]/Intensity_base), where Weight[i] is the penalty coefficient of the i-th convolution block (i = 1, 2, 3, …, N, N being the number of convolution blocks), Alpha is a penalty coefficient constant, Intensity[i] is the computation intensity of the i-th convolution block, Intensity_base is the computation intensity of the last convolution block, Loss_ce is the cross entropy loss function, and gamma[i] is the gamma value of the i-th convolution block;
the network model pruning module is used for extracting the gamma value of each convolution block and performing the pruning operation according to the following formula: Mask[i] = (|gamma[i]| > 0.0001); where Mask = 1 denotes a retained channel and Mask = 0 denotes a deleted channel;
and the model integration application module is used for re-integrating the deep neural network according to the Mask values to obtain a new parameter structure and applying the new parameter structure to GPU computation at the edge.
8. The deep neural network pruning device of claim 7, wherein the network model building module comprises:
a data set acquisition unit for setting a training data set;
the convolution layer configuration unit is used for configuring the convolution layer with preset attribute parameters; the attribute parameters comprise the convolution window, convolution stride, number of input channels and number of output channels;
a convolution block generation unit for combining the convolution layer, the batch normalization layer and the activation layer into a convolution block;
and the network model building unit is used for stacking a plurality of the convolution blocks to construct a fully convolutional deep neural network model.
9. A terminal, characterized in that the terminal comprises a memory, a processor and a deep neural network pruning program stored in the memory and executable on the processor, the deep neural network pruning program, when executed by the processor, implementing the steps of the deep neural network pruning method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a deep neural network pruning program, which when executed by a processor, implements the steps of the deep neural network pruning method according to any one of claims 1 to 6.
CN202011529589.6A 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium Active CN112529165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011529589.6A CN112529165B (en) 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011529589.6A CN112529165B (en) 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN112529165A true CN112529165A (en) 2021-03-19
CN112529165B CN112529165B (en) 2024-02-02

Family

ID=75002419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011529589.6A Active CN112529165B (en) 2020-12-22 2020-12-22 Deep neural network pruning method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112529165B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657896A (en) * 2021-08-20 2021-11-16 成都链安科技有限公司 Block chain transaction topological graph analysis method and device based on graph neural network
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
WO2023035221A1 (en) * 2021-09-10 2023-03-16 Intel Corporation Sample-adaptive cross-layer norm calibration and relay neural network
WO2023197460A1 (en) * 2022-04-15 2023-10-19 浪潮(北京)电子信息产业有限公司 Image recognition method and apparatus, electronic device, and storage medium
CN117173446A (en) * 2023-06-26 2023-12-05 北京百度网讯科技有限公司 Image classification and training method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107925A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111199282A (en) * 2019-12-31 2020-05-26 的卢技术有限公司 Pruning method and device for convolutional neural network model
CN111461322A (en) * 2020-03-13 2020-07-28 中国科学院计算技术研究所 Deep neural network model compression method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107925A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN111199282A (en) * 2019-12-31 2020-05-26 的卢技术有限公司 Pruning method and device for convolutional neural network model
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111461322A (en) * 2020-03-13 2020-07-28 中国科学院计算技术研究所 Deep neural network model compression method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657896A (en) * 2021-08-20 2021-11-16 成都链安科技有限公司 Block chain transaction topological graph analysis method and device based on graph neural network
WO2023035221A1 (en) * 2021-09-10 2023-03-16 Intel Corporation Sample-adaptive cross-layer norm calibration and relay neural network
WO2023197460A1 (en) * 2022-04-15 2023-10-19 浪潮(北京)电子信息产业有限公司 Image recognition method and apparatus, electronic device, and storage medium
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
CN117173446A (en) * 2023-06-26 2023-12-05 北京百度网讯科技有限公司 Image classification and training method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112529165B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN112529165A (en) Deep neural network pruning method, device, terminal and storage medium
Guo et al. Software-hardware codesign for efficient neural network acceleration
CN108108809B (en) Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
CN107844828B (en) Convolution calculation method in neural network and electronic device
CN108280514B (en) FPGA-based sparse neural network acceleration system and design method
CN108108811B (en) Convolution calculation method in neural network and electronic device
US8131659B2 (en) Field-programmable gate array based accelerator system
EP3627397A1 (en) Processing method and apparatus
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN112163601B (en) Image classification method, system, computer device and storage medium
WO2022067508A1 (en) Neural network accelerator, and acceleration method and device
CN112633490B (en) Data processing device, method and related product for executing neural network model
CN107944545A (en) Computational methods and computing device applied to neutral net
Shahshahani et al. Memory optimization techniques for fpga based cnn implementations
CN113554084A (en) Vehicle re-identification model compression method and system based on pruning and light-weight convolution
CN116075821A (en) Form convolution and acceleration
CN111882053B (en) Neural network model compression method based on splicing convolution
CN113850365A (en) Method, device, equipment and storage medium for compressing and transplanting convolutional neural network
Zhou et al. Sagitta: An energy-efficient sparse 3D-CNN accelerator for real-time 3D understanding
CN115130672B (en) Software and hardware collaborative optimization convolutional neural network calculation method and device
CN111914867A (en) Convolutional neural network IP core design based on FPGA
WO2022227024A1 (en) Operational method and apparatus for neural network model and training method and apparatus for neural network model
WO2021120036A1 (en) Data processing apparatus and data processing method
EP4170547A1 (en) Method for extracting data features, and related apparatus
WO2022266888A1 (en) Congestion prediction model training method, image processing method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant