CN114742221A - Deep neural network model pruning method, system, equipment and medium - Google Patents

Deep neural network model pruning method, system, equipment and medium Download PDF

Info

Publication number
CN114742221A
Authority
CN
China
Prior art keywords
model
network model
pruning
convolution
pruned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210314690.2A
Other languages
Chinese (zh)
Inventor
马钟
樊一哲
毛远宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Microelectronics Technology Institute filed Critical Xian Microelectronics Technology Institute
Priority to CN202210314690.2A
Publication of CN114742221A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks


Abstract

The invention discloses a deep neural network model pruning method, system, equipment and medium. The method comprises the following steps: performing sparse training on the model to be pruned to obtain a sparse model, where the model to be pruned is a deep neural network model containing depth separable convolutions, and a depth separable convolution comprises a depth-wise convolution and a point-wise convolution; pruning the convolutional-layer channels of the sparse model based on an importance evaluation of the absolute weight value of each channel in the point-wise convolution, to obtain a pruned network model; and performing fine-tuning training on the weights of the pruned network model and outputting the fine-tuned network model as the pruned deep neural network model. The invention sparsifies the point-wise convolution weights, and can effectively reduce the computation and parameter count of the model while preserving network accuracy.

Description

Deep neural network model pruning method, system, equipment and medium
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a deep neural network model pruning method, system, equipment and medium.
Background
At present, artificial intelligence technology built around the convolutional neural network has made a series of breakthroughs and is gradually being applied to weaponry and various spacecraft, enabling applications such as satellite-based on-orbit target detection, precise strikes through intelligent target recognition in missiles, autonomous obstacle avoidance, and mission planning. However, the deep neural network models these applications depend on usually have many parameters and a large computational cost. In most practical application scenarios, the computing unit running the neural network model in an embedded AI device is limited in size and power consumption, so the model either cannot be deployed at all because of its excessive parameters, or its forward inference takes too long after deployment to meet the real-time requirements of intelligent applications. Therefore, how to compress deep neural networks with pruning techniques, reducing the parameter count of the network model and the computation of forward inference, has become a research hotspot in machine learning.
Replacing traditional convolution with depth separable convolution greatly reduces the computation of a neural network model, so depth separable convolution is widely used on embedded low-power platforms. However, most existing model pruning methods target traditional convolution, and no mature pruning scheme for depth separable convolution exists at home or abroad. If model pruning could be applied to depth separable convolution, the computation could be reduced even further below that of traditional convolution, which has significant practical value.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a deep neural network model pruning method, system, equipment and medium, to solve the problem that most existing model pruning methods target traditional convolution and no deep network model pruning method exists for depth separable convolution.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention provides a deep neural network model pruning method, which comprises the following steps:
carrying out sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
based on the importance evaluation result of the weight absolute value of each channel in point-wise convolution, pruning the convolution layer channel of the sparse model to obtain a pruned network model;
and carrying out fine tuning training on the weight of the trimmed network model, and outputting the fine tuned network model to obtain the trimmed deep neural network model.
Further, a process of performing sparse training on the model to be pruned to obtain a sparse model includes the following steps:
introducing an L1 regularization term into the loss function of the model to be pruned to obtain a new loss function;
and creating a training data set, and performing optimization training on the model to be pruned by using the training data set until a new loss function is converged to obtain the sparse model.
Further, the new loss function is:

J(θ; X, y) = L_emp(θ; X, y) + λΩ(θ)

Ω(θ) = Σ_{i=1}^{C} ‖ω_i‖₁

where J(θ; X, y) is the new loss function; L_emp(θ; X, y) is the original loss function of the model to be pruned; λ is the optimal penalty factor; Ω(θ) is the L1 regularization term; θ denotes the model parameters to be learned; X is the training data set; y are the labels; i is the index of a convolutional layer in the model to be pruned; C is the total number of convolutional layers in the model to be pruned; and ω_i denotes all parameters of the i-th convolutional layer.
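The penalty above is simple enough to sketch directly. The NumPy snippet below (the function names `l1_regularizer` and `sparse_loss` are illustrative, not from the patent) computes Ω(θ) over a list of layer weights and the new loss J:

```python
import numpy as np

def l1_regularizer(conv_weights):
    """Omega(theta): sum of L1 norms of all convolutional-layer weights.

    conv_weights: list of weight arrays, one per convolutional layer
    (the omega_i of the patent's notation).
    """
    return sum(np.abs(w).sum() for w in conv_weights)

def sparse_loss(original_loss, conv_weights, lam=1e-5):
    """New loss J = L_emp + lambda * Omega(theta); lam defaults to the
    empirical value around 1e-5 mentioned later in the description."""
    return original_loss + lam * l1_regularizer(conv_weights)
```

During sparse training this J would simply replace the original loss in the optimizer step, driving unimportant weights toward zero.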
Further, pruning the convolutional-layer channels of the sparse model based on the importance evaluation result of the absolute weight value of each channel in the point-wise convolution, to obtain the pruned network model, is specifically as follows:
determining the absolute weight value of each point-wise convolution channel in the sparse model, and sorting the point-wise convolution channels in descending order of absolute weight value to obtain the importance ranking of the point-wise convolution channels;
and pruning the point-wise convolution channels in the sparse model according to the importance ranking of the point-wise convolution channels and a preset channel pruning threshold, to obtain the pruned network model.
Further, the preset channel pruning threshold is 5% of the number of point-wise convolution channels in the sparse model before pruning.
Further, after the fine-tuning training of the weights of the pruned network model, the method also comprises a cyclic pruning-fine-tuning step;
the cyclic pruning-fine-tuning step is specifically as follows:
determining the importance ranking of the remaining point-wise convolution channels in the fine-tuned network model; pruning the point-wise convolution channels of the fine-tuned network model according to the importance ranking of the remaining point-wise convolution channels and the preset channel pruning threshold, performing fine-tuning training on the weights of the pruned model, and outputting the fine-tuned network model;
the loop termination condition of the cyclic pruning-fine-tuning step is: the pruning proportion of the fine-tuned network model reaches a preset pruning proportion, where the preset pruning proportion is determined by the hyper-parameter α.
Further, the fine-tuning training of the weights of the pruned network model specifically comprises:
training the pruned network model and its weights with an initial learning rate, where the initial learning rate is 0.1.
The invention also provides a deep neural network model pruning system, which comprises the following components:
the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model;
and the fine tuning module is used for performing fine tuning training on the weight of the trimmed network model and outputting the fine tuned network model to obtain the trimmed deep neural network model.
The invention also provides a deep neural network model pruning device, which comprises:
a memory for storing a computer program;
and the processor is used for realizing the steps of the deep neural network model pruning method when the computer program is executed.
The present invention further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and wherein the computer program, when executed by a processor, implements the steps of the deep neural network model pruning method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a deep neural network model pruning method, wherein a deep separable convolution consists of depth-wise convolution and point-wise convolution; and aiming at the characteristic of deep separable convolution, carrying out channel-based pruning on point-wise convolution with large calculation amount. By evaluating the importance of the channels, unimportant channels are deleted, and thus the associated filters and feature maps are deleted. In order to ensure that the important characteristic diagram is reserved and the unimportant characteristic diagram is deleted, the sparsification of point-wise convolution weight is realized in the pruning process, the network precision is ensured, and meanwhile, the calculated amount and the parameter amount of the model can be effectively reduced.
Drawings
FIG. 1 is a flow chart of a deep neural network model pruning method according to the present invention;
FIG. 2 is a schematic diagram illustrating a process of pruning an input feature map according to the present invention;
fig. 3 is a flowchart of a pruning method for the MobileNetv2 model in the embodiment.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects of the present invention more apparent, the following embodiments further describe the present invention in detail. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, the invention provides a deep neural network model pruning method, which comprises the following steps:
step 1, performing sparse training on a model to be pruned to obtain a sparse model. Wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution includes a depth-wise convolution and a point-wise convolution.
The sparse training process specifically comprises the following steps:
step 11, introducing L into a loss function of the model to be pruned1A new loss function is obtained by the regular term; wherein the new loss function is:
J(θ;X,y)=Lemp(θ;X,y)+λΩ(θ)
Figure BDA0003568517540000051
Figure BDA0003568517540000052
wherein J (θ; X, y) is a new loss function; l isemp(theta; X, y) is the original loss function of the model to be pruned; lambda is an optimal penalty factor; omega (theta) is L1A regularization term; theta is a model parameter needing to be learned; x is a training data set; y is a label; i is the sequence number of the convolutional layer in the model to be pruned; c is the total number of the convolutional layers in the model to be pruned; omegaiThe parameters of the ith convolution layer in the model to be pruned are all parameters; n is the number of training samples, and M is the number of training sample categories; p is a radical ofjmA predicted probability for an observation sample j belonging to category m; y isjmIs a symbolic function, yjmIs 0 or 1; wherein y is the true class of observation sample j is equal to mjmGet 1, otherwise, yjmTake 0.
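As a minimal illustration of L_emp (assuming, as the symbol list indicates, a mean categorical cross-entropy; the function name is hypothetical):

```python
import numpy as np

def cross_entropy(probs, labels):
    """L_emp: mean categorical cross-entropy over N training samples.

    probs:  (N, M) array of predicted probabilities p_jm
    labels: (N, M) one-hot array of indicators y_jm
    """
    n = probs.shape[0]
    # sum of y_jm * log(p_jm) over classes and samples, averaged and negated
    return -(labels * np.log(probs)).sum() / n
```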
And step 12, creating a training data set, and performing optimization training on the model to be pruned by using the training data set until a new loss function is converged to obtain the sparse model.
Step 2, pruning the convolutional-layer channels of the sparse model based on the importance evaluation result of the absolute weight value of each channel in the point-wise convolution, to obtain the pruned network model. The pruning process is specifically as follows:
Step 21, determining the absolute weight value of each point-wise convolution channel in the sparse model, and sorting the point-wise convolution channels in descending order of absolute weight value to obtain the importance ranking of the point-wise convolution channels;
Step 22, pruning the point-wise convolution channels in the sparse model according to the importance ranking of the point-wise convolution channels and a preset channel pruning threshold, to obtain the pruned network model. In the invention, in a single pruning pass, the preset channel pruning threshold is 5% of the number of point-wise convolution channels in the sparse model before pruning.
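Steps 21-22 can be sketched as follows. This is a NumPy sketch under the assumption that a layer's point-wise weights are given as an (out_channels, in_channels) matrix and that an input channel's importance is the sum of the absolute weights that read from it; the function names are illustrative:

```python
import numpy as np

def rank_channels(pw_weights):
    """Return input-channel indices of a point-wise (1x1) convolution,
    ordered from least to most important.

    pw_weights: (out_channels, in_channels) array of 1x1 convolution
    weights; importance of input channel i is sum_j |w[j, i]|.
    """
    importance = np.abs(pw_weights).sum(axis=0)
    return np.argsort(importance)  # ascending: prune from the front

def prune_step(pw_weights, ratio=0.05):
    """Remove the least-important `ratio` of input channels (at least one),
    mirroring the 5% per-pass threshold of the patent."""
    order = rank_channels(pw_weights)
    n_prune = max(1, int(round(pw_weights.shape[1] * ratio)))
    keep = np.sort(order[n_prune:])  # surviving channels, original order
    return pw_weights[:, keep], keep
```

In a real network the `keep` index set would also be used to drop the corresponding input feature maps and the filters of the preceding layer that produce them.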
Step 3, performing fine-tuning training on the weights of the pruned network model to obtain the fine-tuned network model; specifically: training the pruned network model and its weights with the initial learning rate to obtain the fine-tuned network model.
Step 4, repeating the operations of the step 2 and the step 3 on the network model after the fine tuning training until the pruning proportion of the network model after the fine tuning reaches a preset pruning proportion; the preset pruning proportion is determined according to the hyper-parameter alpha.
The specific process is as follows:
determining the importance ranking of the remaining point-wise convolution channels in the fine-tuned network model; pruning the point-wise convolution channels of the fine-tuned network model according to the importance ranking of the remaining point-wise convolution channels and the preset channel pruning threshold; performing fine-tuning training on the weights of the pruned model and outputting the fine-tuned network model. The loop termination condition of the cyclic pruning-fine-tuning step is that the pruning proportion of the fine-tuned network model reaches the preset pruning proportion, which is determined by the hyper-parameter α.
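The cyclic schedule of step 4 (remove about 5% of the original channel count per round until the total proportion α is reached) can be sketched as follows. This illustrative helper only computes the per-round remaining channel counts; the fine-tuning that would run in each round is marked by a comment:

```python
def prune_schedule(n_channels, alpha, step=0.05):
    """Per-round remaining channel counts of the cyclic prune/fine-tune loop.

    Each round removes `step` of the ORIGINAL channel count until the
    total pruned fraction reaches `alpha` (the hyper-parameter).
    """
    per_round = max(1, int(round(n_channels * step)))
    target = int(round(n_channels * alpha))
    counts, remaining, pruned = [], n_channels, 0
    while pruned < target:
        k = min(per_round, target - pruned)  # last round may prune fewer
        remaining -= k
        pruned += k
        # fine-tuning of the pruned model's weights would happen here
        counts.append(remaining)
    return counts
```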
Step 5, evaluating and verifying the accuracy, computation and parameter count of the fine-tuned network model, to determine the validity of the pruning result.
The pruning principle is as follows:
according to the deep neural network model pruning method, in the pruning process, according to the characteristics of the deep separable convolution, the principle that the point-wise convolution layer channel with the larger sum of the absolute values of the convolution weights is combined, and the subsequent activation value generated by linear combination is stronger, so that the importance of the channel is stronger is adopted; high-dimensional information hidden in a depth separable convolution model channel is fully utilized, and pruning operation is performed more pertinently; in the pruning operation process, the accuracy of the network is recovered through multi-round training; the pruning method has no strong binding relation with the specific model structure of the model to be pruned, and can carry out pruning compression processing aiming at any convolutional neural network with application depth separable convolutional blocks.
The depth separable convolution comprises a depth-wise convolution and a point-wise convolution, and the point-wise convolution accounts for most of the computation. The point-wise convolution can be regarded as a linear combination of the different channels of the input feature map; in this linear combination, the importance of an input channel can be estimated by evaluating the weights in the point-wise layer, so that smaller weights and the associated feature maps can be deleted, reducing the parameter count and computation of the network.
In the depth separable convolution, most of the computation is still concentrated in the 1×1 convolution, so the emphasis of pruning is first placed on the 1×1 convolution. Equation (1) describes the convolution of M input feature maps (F_1, F_2, …, F_M) with a 1×1×M filter (k_1, k_2, …, k_M):

F_out = Σ_{i=1}^{M} k_i · F_i        (1)

where F_i is the i-th input feature map of the point-wise convolution (the set of feature maps has channel dimension M); k_i is the i-th weight coefficient of the point-wise convolution kernel, whose weight dimension is likewise 1×1×M; and F_out is the final output feature map. Since the planar scale of the point-wise filter is 1×1, the output of the point-wise convolution can be regarded as a linear combination of the input feature maps.
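Equation (1) is just a weighted sum of channel maps, which the following NumPy sketch makes concrete (function name illustrative):

```python
import numpy as np

def pointwise_conv(feature_maps, kernel):
    """Point-wise (1x1) convolution for one output channel:
    F_out = sum_i k_i * F_i.

    feature_maps: (M, H, W) input maps F_1..F_M
    kernel:       (M,) weights k_1..k_M of one 1x1xM filter
    """
    # contract the channel axis of the kernel against the first axis
    # of the feature maps, leaving an (H, W) output map
    return np.tensordot(kernel, feature_maps, axes=1)
```

A channel whose k_i is near zero contributes almost nothing to F_out, which is exactly why it can be pruned.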
Since the point-wise convolution can be expressed as a linear combination of feature maps, a feature map with a small weight in the linear combination contributes little to the result; if the weight corresponding to one of the 1×1×M convolution coefficients is small, the importance of that channel can be considered weak. In the pruning process, such unimportant feature maps are deleted together with their weights.
As shown in fig. 2, the weights applied to the light-colored input feature maps are small; in the pruning process, the light-colored feature-map channels and the weights of the corresponding convolution kernels are deleted, reducing the computation and the corresponding parameter count. In the point-wise convolution pruning process, each layer's point-wise convolution is pruned separately, with the L1 norm used as the evaluation criterion of channel importance to rank the channels of the 1×1 filters. In a single pruning pass, channels amounting to about 5% of the original channel count of each depth separable convolutional layer are removed in ascending order of importance, and fine-tuning training is then performed on the network weights. The hyper-parameter α defines the total pruning proportion, and the pruning-fine-tuning process is repeated until the set value of α is reached; α is an empirical value obtained through experiments, chosen to reduce computation and parameters as far as possible while guaranteeing accuracy.
If all weights in the 1×1 convolution were equally important, pruning would have a relatively large impact on network performance. We therefore want the weights of unimportant channels in the linear combination to stay as small as possible, close to zero, while the truly valuable channels keep their weights. To achieve this, a regularization method is used: a regularization function is added to the objective function as a penalty term, limiting the complexity of the model and improving generalization by preventing overfitting.
During training, the invention uses the L1 regularization method to sparsify the weights: a subset of the weights is selected and made prominent, while the values of the other weights are driven close to zero. The overall loss function (also called the objective function) after adding the L1 regularization penalty is:

J(θ; X, y) = L_emp(θ; X, y) + λΩ(θ)

L_emp(θ; X, y) = −(1/N) Σ_{j=1}^{N} Σ_{m=1}^{M} y_jm · log(p_jm)

where J(θ; X, y) is the new loss function; L_emp(θ; X, y) is the original loss function of the model to be pruned; λ is the optimal penalty factor; Ω(θ) is the L1 regularization term; θ denotes the model parameters to be learned; X is the training data set; y are the labels; N is the number of training samples; M is the number of training-sample classes; p_jm is the predicted probability that observation sample j belongs to class m; and y_jm is an indicator (0 or 1) that equals 1 when the true class of observation sample j is m, and 0 otherwise.

λ is a hyper-parameter used to adjust the parameter-norm penalty term, also called the optimal penalty factor; λ = 0 means no regularization, and a larger λ corresponds to a stronger regularization penalty. When model training uses L1 regularization for sparsification, the empirical value of λ is generally around 10⁻⁵. The L1 regularization of the parameters is:

Ω(θ) = Σ_{i=1}^{C} ‖ω_i‖₁

where i is the index of a convolutional layer in the model to be pruned; C is the total number of convolutional layers in the model to be pruned; and ω_i denotes all parameters of the i-th convolutional layer.
The deep neural network model pruning method comprises the following steps. Step A: acquire the target images and the corresponding original model structure for the specific task, and convert the training and testing image samples into the input format of the model to be trained. Step B: the model pre-training module performs sparse training with the L1 regularization term to obtain the original sparse model, i.e. the baseline model, and measures the baseline accuracy on the test samples of the data set. Step C: the model pruning module deletes 5% of the baseline channel count of each point-wise layer, using the importance evaluation method based on the point-wise absolute weight values. Step D: the fine-tuning module performs fine-tuning training, with a smaller initial learning rate, on the pruned model output in step C and its attached weights; after fine-tuning finishes, jump back to step C. The termination condition of the C-D (pruning-fine-tuning) loop is governed by the hyper-parameter α. Step E: after the above steps are completed, the model pre-training module verifies the final model in terms of accuracy and computation/parameter count, to confirm the effectiveness of the pruning work.
The invention also provides a deep neural network model pruning system which comprises a sparsification training module, a pruning module, a fine adjustment module, a cyclic pruning-fine adjustment module and an evaluation verification module; the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model; the fine tuning module is used for carrying out fine tuning training on the weight of the trimmed network model to obtain a fine tuning trained network model; the method specifically comprises the following steps: training the trimmed network model and the model weight by adopting an initial learning rate to obtain a network model after fine tuning training; the circulating pruning-fine tuning module is used for repeating the pruning operation in the pruning module and the fine tuning operation in the fine tuning module on the network model after the fine tuning training until the pruning proportion of the network model after the fine tuning reaches the preset pruning proportion; and the evaluation and verification module is used for evaluating the precision, the calculated amount and the parameter amount of the network model after fine tuning for evaluation and verification and determining the validity of the pruning result of the network model after fine tuning.
In the invention, the sparse training module runs on a software and hardware platform with a general training framework and obtains the original sparse network model through sparse training; it also provides computation/parameter-count statistics and test-set accuracy statistics. The model pruning module performs model pruning according to the designed importance evaluation method based on the point-wise absolute weight values. The fine-tuning module is similar to the model pre-training module, with adjustments to the learning rate and network training structure, and is used to train the simplified network structure obtained after model pruning for a certain number of epochs.
The invention also provides a deep neural network model pruning device, which comprises: a memory for storing a computer program; and the processor is used for realizing the steps of the deep neural network model pruning method when the computer program is executed.
When the processor executes the computer program, the steps of the deep neural network model pruning method are implemented, for example: carrying out sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model; the fine tuning module is used for carrying out fine tuning training on the weight of the trimmed network model to obtain a fine tuning trained network model; the method specifically comprises the following steps: training the trimmed network model and the model weight by adopting an initial learning rate to obtain a network model after fine tuning training; the circulating pruning-fine tuning module is used for repeating pruning operation and fine tuning operation on the network model after fine tuning training until the pruning proportion of the network model after fine tuning reaches a preset pruning proportion; and the evaluation and verification module is used for evaluating the precision, the calculated amount and the parameter amount of the network model after fine tuning for evaluation and verification and determining the validity of the pruning result of the network model after fine tuning.
Alternatively, the processor implements the functions of the modules in the system when executing the computer program, for example: the sparse training module is used for performing sparse training on the pruning model to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model; the fine tuning module is used for carrying out fine tuning training on the weight of the trimmed network model to obtain a fine tuning trained network model; the method specifically comprises the following steps: training the trimmed network model and the model weight by adopting an initial learning rate to obtain a network model after fine tuning training; the circulating pruning-fine tuning module is used for repeating the pruning operation in the pruning module and the fine tuning operation in the fine tuning module on the network model after the fine tuning training until the pruning proportion of the network model after the fine tuning reaches the preset pruning proportion; and the evaluation and verification module is used for evaluating the precision, the calculated amount and the parameter amount of the network model after fine tuning for evaluation and verification and determining the validity of the pruning result of the network model after fine tuning.
Illustratively, the computer program may be partitioned into one or more modules/units, which are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing preset functions, and the instruction segments are used to describe the execution process of the computer program in the deep neural network model pruning device. For example, the computer program may be partitioned into a sparse training module, a pruning module, a fine tuning module, a cyclic pruning-fine tuning module, and an evaluation and verification module, whose specific functions are as follows: the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model, wherein the model to be pruned is a deep neural network model with depth separable convolution, and the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolutional-layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution, to obtain a pruned network model; the fine tuning module is used for performing fine tuning training on the weights of the pruned network model to obtain a fine-tuned network model, specifically by training the pruned network model and its weights with an initial learning rate; the cyclic pruning-fine tuning module is used for repeating the pruning operation of the pruning module and the fine tuning operation of the fine tuning module on the fine-tuned network model until the pruning proportion of the fine-tuned network model reaches the preset pruning proportion; and the evaluation and verification module is used for evaluating the accuracy, computation amount and parameter amount of the fine-tuned network model for evaluation and verification, to determine the validity of the pruning result of the fine-tuned network model.
The deep neural network model pruning device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device, and may include, but is not limited to, a processor and a memory. It will be understood by those skilled in the art that the above is only an example of the deep neural network model pruning device and does not constitute a limitation on it; the device may include more or fewer components than those listed, combine some components, or use different components; for example, the deep neural network model pruning device may further include input-output devices, network access devices, a bus, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the deep neural network model pruning device and connects the various parts of the whole device through various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the deep neural network model pruning device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory.
The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application program required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory card, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The invention also provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of a deep neural network model pruning method.
If the modules/units integrated in the deep neural network model pruning system are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
Based on such understanding, all or part of the processes of the deep neural network model pruning method of the present invention may also be completed by instructing relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the deep neural network model pruning method. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, a preset intermediate form, etc.
The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.
It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunications signals.
Examples
In this embodiment, the deep neural network model pruning method is described in detail by taking the pruning process of MobileNetv2 as an example.
As shown in fig. 3, the present embodiment provides a deep neural network model pruning method, including the following steps:
Step 1: sparse training of the MobileNetv2 model
Using the training framework encapsulated by the PyTorch deep neural network library, the target detection model MobileNetv2 is trained sparsely in an end-to-end manner on an Nvidia RTX 2070 GPU with 8 GB of video memory.
The optimal penalty factor λ of the regularization loss function for sparse training is 1e-5; stochastic gradient descent (SGD) is used as the optimizer in the back-propagation process, with the weight decay set to 5e-4 and the momentum to 0.9.
At the start of training, the baseline model weights are initialized randomly; the input images are unified into square images with a side length of 32 pixels, and the batch size is set to 128. A total of 200 epochs are trained in the pre-training stage: the learning rate is 0.1 for the first 60 epochs, 0.02 for epochs 61 to 120, 4e-3 for epochs 121 to 160, and 8e-4 for epochs 161 to 200.
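The sparse-training setup above can be sketched in PyTorch as follows. The tiny two-layer stand-in network and the single training step are illustrative assumptions (the embodiment trains MobileNetv2 for 200 epochs on CIFAR-100); only the optimizer settings, L1 penalty and learning-rate schedule come from the values given in the text.

```python
import torch
import torch.nn as nn

LAMBDA = 1e-5  # penalty factor of the L1 regularization term

# Tiny stand-in for MobileNetv2: a depth-wise 3x3 followed by a point-wise 1x1
model = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8),  # depth-wise
    nn.Conv2d(8, 16, kernel_size=1),                      # point-wise
)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# 0.1 for epochs 1-60, 0.02 for 61-120, 4e-3 for 121-160, 8e-4 for 161-200:
# i.e. a x0.2 decay at epochs 60, 120 and 160
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

def l1_penalty(net):
    # The regularization term: sum of absolute convolution weights
    return sum(m.weight.abs().sum()
               for m in net.modules() if isinstance(m, nn.Conv2d))

# One training step on a random 32x32 batch (batch size is 128 in the text)
x, target = torch.randn(4, 8, 32, 32), torch.randn(4, 16, 32, 32)
loss = criterion(model(x), target) + LAMBDA * l1_penalty(model)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # called once per epoch
```

The L1 penalty drives many point-wise weights toward zero, which is what makes the absolute-value importance ranking of step 2 meaningful.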
Step 2: channel pruning for point-wise convolution
In this embodiment, the pruning-ratio hyper-parameter α is set to 0.6. Following the point-wise convolution computation given in formula (1), the weights corresponding to all channels in each 1 × 1 × m convolutional layer are sorted by absolute value; in the pruning process, the feature map corresponding to the channel with the smallest absolute weight value is deleted together with that weight. Channel deletion is performed several times, stopping for a layer once channels corresponding to around 5% of the channel count of the baseline model have been deleted.
F_out = Σ_{i=1}^{m} k_i · F_i        (1)

wherein F_i is the ith value of the input feature map of the point-wise convolution operation, and the dimension of this set of feature maps is 1 × 1 × m; k_i is the ith weight coefficient in the convolution kernel of the point-wise convolution operation, and the weight dimension of this set of convolution kernels is also 1 × 1 × m; F_out is the final output feature map.
The rule for selecting the channel to be deleted is:

min{ ||k_i||_1 }, 1 ≤ i ≤ m.
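A minimal sketch of this channel-importance evaluation for a single point-wise convolution layer is shown below. The layer sizes are illustrative assumptions; the ranking by absolute weight value and the roughly-5%-of-baseline deletion step follow the text.

```python
import torch
import torch.nn as nn

# Illustrative point-wise (1x1) layer: m = 32 input channels, 64 filters
pointwise = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1)

# Weight shape is (64, 32, 1, 1); the importance of input channel i is the
# L1 norm of all weights k_i multiplying feature map F_i in formula (1)
importance = pointwise.weight.detach().abs().sum(dim=(0, 2, 3))  # shape (32,)

# Delete roughly 5% of the baseline channel count per pruning pass
n_prune = max(1, round(0.05 * pointwise.in_channels))  # here: 2 channels
order = torch.argsort(importance)          # ascending: least important first
keep_idx = order[n_prune:].sort().values   # surviving channels, in order

# Rebuild the layer with the surviving channels and their weights
pruned = nn.Conv2d(len(keep_idx), pointwise.out_channels, kernel_size=1)
pruned.weight.data = pointwise.weight.data[:, keep_idx].clone()
pruned.bias.data = pointwise.bias.data.clone()
```

The feature map fed to the rebuilt layer must likewise keep only the channels in `keep_idx`, which is why the adjacent layers are modified in step 3.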
Step 3: fine tuning process
After the channel pruning for point-wise convolution in step 2 is completed, the network layers connected before and after each point-wise convolution layer are modified; specifically, these adjacent layers include, but are not limited to, the immediately neighboring batch normalization layer, activation layer and depth-wise convolution layer. This keeps the channel counts of the feature maps and weights consistent throughout forward propagation, so that the feature maps can be inferred correctly. After the entire neural network structure has been trimmed, fine tuning is performed with the same training framework and hyper-parameter settings as above, for a total of 200 epochs.
The processes of steps 2 and 3 are executed cyclically until the preset pruning proportion given by the hyper-parameter α = 0.6 is reached, which takes 12 cycles in total.
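The alternation of steps 2 and 3 can be expressed as a simple loop. Here `prune_step` and `finetune` are hypothetical placeholders for the operations of steps 2 and 3; the bookkeeping only illustrates why 12 cycles of roughly 5% each reach the α = 0.6 target.

```python
ALPHA = 0.6       # preset pruning proportion (hyper-parameter alpha)
STEP_PERCENT = 5  # ~5% of the baseline channels removed per cycle

def cyclic_prune_finetune(model, prune_step, finetune):
    """Repeat step 2 (prune) and step 3 (fine-tune) until the cumulative
    pruning proportion reaches ALPHA."""
    pruned_percent, cycles = 0, 0
    while pruned_percent < ALPHA * 100:
        model = prune_step(model)   # step 2: delete ~5% of channels
        model = finetune(model)     # step 3: 200-epoch fine tuning
        pruned_percent += STEP_PERCENT
        cycles += 1
    return model, cycles

# With identity placeholders, the loop runs 12 times, as in the embodiment
_, n_cycles = cyclic_prune_finetune(object(), lambda m: m, lambda m: m)
print(n_cycles)  # 12
```

Pruning a small fraction per cycle and fine-tuning in between gives the remaining weights a chance to recover accuracy before the next deletion, which is the rationale for the cyclic schedule.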
Step 4: accuracy test and scale statistics of the pruned model
The algorithm obtained by the method is verified on the CIFAR-100 dataset, with the parameter compression rate, the computation compression rate and the average accuracy used as evaluation indices.
Verification result:
For the CIFAR-100 dataset, the MobileNetv2 network is simplified using this embodiment. The parameter count is reduced from 2.36938 M in the original network to 0.46082 M, a compression rate of 19.45%; the computation amount is reduced from 0.06775925 GFLOPs in the original network to 0.013291428 GFLOPs, a compression rate of 19.62%. The TOP-1 accuracy of the network changes from 68.42% to 67.41%, and the TOP-5 accuracy from 90.97% to 90.54%. In other words, while maintaining the accuracy level of the network model, the channel-importance-based deep neural network model pruning algorithm reduces the parameters and computation to less than one fifth of their former values. This demonstrates the performance of this embodiment on the MobileNetv2 baseline model structure and the CIFAR-100 dataset.
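The two compression rates quoted above can be checked directly from the raw figures; the values below are copied from the verification result, and the percentages express the pruned size as a fraction of the original.

```python
# Parameter counts in millions and computation in GFLOPs, from the text
params_base, params_pruned = 2.36938, 0.46082
flops_base, flops_pruned = 0.06775925, 0.013291428

param_rate = params_pruned / params_base * 100   # pruned / original, in %
flop_rate = flops_pruned / flops_base * 100

print(f"parameter compression rate: {param_rate:.2f}%")  # 19.45%
print(f"FLOPs compression rate: {flop_rate:.2f}%")       # 19.62%
```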
The invention relates to a deep neural network model pruning method, system, equipment and medium. The depth separable convolution consists of depth-wise convolution and point-wise convolution; exploiting this characteristic, channel-based pruning is applied to the point-wise convolution, which accounts for the bulk of the computation. By evaluating the importance of the channels, unimportant channels are deleted, and the associated filters and feature maps are deleted with them. To ensure that important feature maps are retained and unimportant ones deleted, the point-wise convolution weights are sparsified during the pruning process; this preserves network accuracy while effectively reducing the computation and parameter amounts of the model.
The above-described embodiment is only one of the embodiments that can implement the technical solution of the present invention; the scope of the present invention is not limited to this embodiment, but includes any variations, substitutions and other embodiments that can be readily conceived by those skilled in the art within the technical scope disclosed by the present invention.

Claims (10)

1. A deep neural network model pruning method is characterized by comprising the following steps:
carrying out sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
based on the importance evaluation result of the weight absolute value of each channel in point-wise convolution, pruning the convolution layer channel of the sparse model to obtain a pruned network model;
and carrying out fine tuning training on the weight of the trimmed network model, and outputting the fine tuned network model to obtain the trimmed deep neural network model.
2. The deep neural network model pruning method according to claim 1, wherein the process of performing sparse training on the model to be pruned to obtain a sparse model is as follows:
introducing an L1 regularization term into the loss function of the model to be pruned to obtain a new loss function;
and creating a training data set, and performing optimization training on the model to be pruned by using the training data set until a new loss function is converged to obtain the sparse model.
3. The deep neural network model pruning method of claim 2, wherein the new loss function is:
J(θ; X, y) = L_emp(θ; X, y) + λΩ(θ)

Ω(θ) = Σ_{i=1}^{C} ||ω_i||_1

wherein J(θ; X, y) is the new loss function; L_emp(θ; X, y) is the original loss function of the model to be pruned; λ is the optimal penalty factor; Ω(θ) is the L1 regularization term; θ denotes the model parameters to be learned; X is the training dataset; y is the label; i is the sequence number of a convolutional layer in the model to be pruned; C is the total number of convolutional layers in the model to be pruned; and ω_i denotes the overall parameters of the ith convolutional layer in the model to be pruned.
4. The deep neural network model pruning method according to claim 1, wherein the process of pruning the convolutional layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution, to obtain the pruned network model, is specifically as follows:
determining the weight absolute value of each point-wise convolution channel in the sparse model, and sorting the point-wise convolution channels in the sparse model in descending order of weight absolute value to obtain the importance ranking of the point-wise convolution channels;
and pruning the point-wise convolution channels in the sparse model according to the importance ranking of the point-wise convolution channels and a preset channel pruning threshold, to obtain the pruned network model.
5. The deep neural network model pruning method according to claim 4, wherein the preset channel pruning threshold is 5% of the number of point-wise convolution channels in the sparse model before pruning.
6. The deep neural network model pruning method according to claim 1, characterized by further comprising a cyclic pruning-fine tuning step after fine tuning training of the weights of the pruned network model;
The cyclic pruning-fine tuning step specifically comprises the following steps:
determining the importance ranking of the remaining point-wise convolution channels in the network model after fine tuning training; pruning the point-wise convolution channels in the network model after fine tuning training according to the importance ranking of the remaining point-wise convolution channels and a preset channel pruning threshold; performing fine tuning training on the weights of the pruned model; and outputting the fine-tuned network model;
the cycle end condition of the cyclic pruning-fine tuning step is as follows: the pruning proportion of the fine-tuned network model reaches a preset pruning proportion, the preset pruning proportion being determined according to the hyper-parameter α.
7. The deep neural network model pruning method according to claim 1, wherein the fine tuning training process for the weights of the pruned network model is as follows:
training the trimmed network model and model weight by adopting an initial learning rate; wherein the initial learning rate is 0.1.
8. A deep neural network model pruning system, comprising:
the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model;
and the fine tuning module is used for performing fine tuning training on the weight of the trimmed network model and outputting the fine tuned network model to obtain the trimmed deep neural network model.
9. A deep neural network model pruning device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the deep neural network model pruning method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the deep neural network model pruning method according to any one of claims 1 to 7.
CN202210314690.2A 2022-03-28 2022-03-28 Deep neural network model pruning method, system, equipment and medium Pending CN114742221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210314690.2A CN114742221A (en) 2022-03-28 2022-03-28 Deep neural network model pruning method, system, equipment and medium


Publications (1)

Publication Number Publication Date
CN114742221A true CN114742221A (en) 2022-07-12

Family

ID=82277739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210314690.2A Pending CN114742221A (en) 2022-03-28 2022-03-28 Deep neural network model pruning method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN114742221A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468101A (en) * 2023-03-21 2023-07-21 美的集团(上海)有限公司 Model pruning method, device, electronic equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination