CN114742221A - Deep neural network model pruning method, system, equipment and medium - Google Patents

Deep neural network model pruning method, system, equipment and medium Download PDF

Info

Publication number
CN114742221A
Authority
CN
China
Prior art keywords
model
network model
pruning
convolution
pruned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210314690.2A
Other languages
Chinese (zh)
Inventor
马钟
樊一哲
毛远宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Microelectronics Technology Institute filed Critical Xian Microelectronics Technology Institute
Priority to CN202210314690.2A
Publication of CN114742221A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks


Abstract

The invention discloses a deep neural network model pruning method, system, equipment and medium. The method comprises the following steps: performing sparse training on the model to be pruned to obtain a sparse model, where the model to be pruned is a deep neural network model containing depth separable convolutions, and a depth separable convolution comprises a depth-wise convolution and a point-wise convolution; pruning the convolutional-layer channels of the sparse model based on an importance evaluation of the absolute weight value of each channel in the point-wise convolution, to obtain a pruned network model; and performing fine-tuning training on the weights of the pruned network model and outputting the fine-tuned network model as the pruned deep neural network model. The invention sparsifies the point-wise convolution weights, and can effectively reduce the computation and parameter count of the model while preserving network accuracy.

Description

Deep neural network model pruning method, system, equipment and medium
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a deep neural network model pruning method, system, equipment and medium.
Background
At present, artificial intelligence technology built around the convolutional neural network has made a series of breakthroughs and is gradually being applied to weaponry and various spacecraft, enabling applications such as satellite-based on-orbit target detection, precise strikes through intelligent target recognition in missiles, autonomous obstacle avoidance, and mission planning. However, the deep neural network models these applications depend on usually have many parameters and a large computational cost. In most practical application scenarios, the computing unit running the neural network model in an embedded AI device is limited in size and power consumption, so the model either cannot be deployed at all because of its excessive parameters, or its forward inference takes too long after deployment to meet the real-time requirements of intelligent applications. Therefore, how to compress deep neural networks with pruning techniques, reducing the parameter count of the network model and the computation of forward inference, has become a research hotspot in machine learning.
Replacing traditional convolution with depth separable convolution greatly reduces the computation of a neural network model, so depth separable convolution is widely used on embedded low-power platforms. However, most existing model pruning methods target traditional convolution, and no mature pruning scheme for depth separable convolution exists at home or abroad. If model pruning could be applied to depth separable convolution, the computation could be reduced even further below that of traditional convolution, which has significant practical value.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a deep neural network model pruning method, system, equipment and medium, to solve the problem that most existing model pruning methods target traditional convolution and no deep network model pruning method exists for depth separable convolution.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention provides a deep neural network model pruning method, which comprises the following steps:
carrying out sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
based on the importance evaluation result of the weight absolute value of each channel in point-wise convolution, pruning the convolution layer channel of the sparse model to obtain a pruned network model;
and carrying out fine tuning training on the weight of the trimmed network model, and outputting the fine tuned network model to obtain the trimmed deep neural network model.
Further, a process of performing sparse training on the model to be pruned to obtain a sparse model includes the following steps:
introducing an L1 regularization term into the loss function of the model to be pruned to obtain a new loss function;
and creating a training data set, and performing optimization training on the model to be pruned by using the training data set until a new loss function is converged to obtain the sparse model.
Further, the new loss function is:

J(θ; X, y) = L_emp(θ; X, y) + λΩ(θ)

Ω(θ) = Σ_{i=1}^{C} ‖ω_i‖₁

where J(θ; X, y) is the new loss function; L_emp(θ; X, y) is the original loss function of the model to be pruned; λ is the optimal penalty factor; Ω(θ) is the L1 regularization term; θ denotes the model parameters to be learned; X is the training data set; y are the labels; i is the index of a convolutional layer in the model to be pruned; C is the total number of convolutional layers in the model to be pruned; and ω_i denotes all parameters of the i-th convolutional layer.
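The penalty above is simple enough to sketch directly. The NumPy snippet below (the function names `l1_regularizer` and `sparse_loss` are illustrative, not from the patent) computes Ω(θ) over a list of layer weights and the new loss J:

```python
import numpy as np

def l1_regularizer(conv_weights):
    """Omega(theta): sum of L1 norms of all convolutional-layer weights.

    conv_weights: list of weight arrays, one per convolutional layer
    (the omega_i of the patent's notation).
    """
    return sum(np.abs(w).sum() for w in conv_weights)

def sparse_loss(original_loss, conv_weights, lam=1e-5):
    """New loss J = L_emp + lambda * Omega(theta); lam defaults to the
    empirical value around 1e-5 mentioned later in the description."""
    return original_loss + lam * l1_regularizer(conv_weights)
```

During sparse training this J would simply replace the original loss in the optimizer step, driving unimportant weights toward zero.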
Further, pruning the convolutional-layer channels of the sparse model based on the importance evaluation result of the absolute weight value of each channel in the point-wise convolution, to obtain the pruned network model, is specifically as follows:
determining the absolute weight value of each point-wise convolution channel in the sparse model, and sorting the point-wise convolution channels in descending order of absolute weight value to obtain the importance ranking of the point-wise convolution channels;
and pruning the point-wise convolution channels in the sparse model according to the importance ranking of the point-wise convolution channels and a preset channel pruning threshold, to obtain the pruned network model.
Further, the preset channel pruning threshold is 5% of the number of point-wise convolution channels in the sparse model before pruning.
Further, after the fine-tuning training of the weights of the pruned network model, the method also comprises a cyclic pruning-fine-tuning step;
the cyclic pruning-fine-tuning step is specifically as follows:
determining the importance ranking of the remaining point-wise convolution channels in the fine-tuned network model; pruning the point-wise convolution channels of the fine-tuned network model according to the importance ranking of the remaining point-wise convolution channels and the preset channel pruning threshold, performing fine-tuning training on the weights of the pruned model, and outputting the fine-tuned network model;
the loop termination condition of the cyclic pruning-fine-tuning step is: the pruning proportion of the fine-tuned network model reaches a preset pruning proportion, where the preset pruning proportion is determined by the hyper-parameter α.
Further, the fine-tuning training of the weights of the pruned network model specifically comprises:
training the pruned network model and its weights with an initial learning rate, where the initial learning rate is 0.1.
The invention also provides a deep neural network model pruning system, which comprises the following components:
the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model;
and the fine tuning module is used for performing fine tuning training on the weight of the trimmed network model and outputting the fine tuned network model to obtain the trimmed deep neural network model.
The invention also provides a deep neural network model pruning device, which comprises:
a memory for storing a computer program;
and the processor is used for realizing the steps of the deep neural network model pruning method when the computer program is executed.
The present invention further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and wherein the computer program, when executed by a processor, implements the steps of the deep neural network model pruning method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a deep neural network model pruning method, wherein a deep separable convolution consists of depth-wise convolution and point-wise convolution; and aiming at the characteristic of deep separable convolution, carrying out channel-based pruning on point-wise convolution with large calculation amount. By evaluating the importance of the channels, unimportant channels are deleted, and thus the associated filters and feature maps are deleted. In order to ensure that the important characteristic diagram is reserved and the unimportant characteristic diagram is deleted, the sparsification of point-wise convolution weight is realized in the pruning process, the network precision is ensured, and meanwhile, the calculated amount and the parameter amount of the model can be effectively reduced.
Drawings
FIG. 1 is a flow chart of a deep neural network model pruning method according to the present invention;
FIG. 2 is a schematic diagram illustrating a process of pruning an input feature map according to the present invention;
fig. 3 is a flowchart of a pruning method for the MobileNetv2 model in the embodiment.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects of the present invention more apparent, the following embodiments further describe the present invention in detail. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, the invention provides a deep neural network model pruning method, which comprises the following steps:
step 1, performing sparse training on a model to be pruned to obtain a sparse model. Wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution includes a depth-wise convolution and a point-wise convolution.
The sparse training process specifically comprises the following steps:
step 11, introducing L into a loss function of the model to be pruned1A new loss function is obtained by the regular term; wherein the new loss function is:
J(θ;X,y)=Lemp(θ;X,y)+λΩ(θ)
Figure BDA0003568517540000051
Figure BDA0003568517540000052
wherein J (θ; X, y) is a new loss function; l isemp(theta; X, y) is the original loss function of the model to be pruned; lambda is an optimal penalty factor; omega (theta) is L1A regularization term; theta is a model parameter needing to be learned; x is a training data set; y is a label; i is the sequence number of the convolutional layer in the model to be pruned; c is the total number of the convolutional layers in the model to be pruned; omegaiThe parameters of the ith convolution layer in the model to be pruned are all parameters; n is the number of training samples, and M is the number of training sample categories; p is a radical ofjmA predicted probability for an observation sample j belonging to category m; y isjmIs a symbolic function, yjmIs 0 or 1; wherein y is the true class of observation sample j is equal to mjmGet 1, otherwise, yjmTake 0.
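As a minimal illustration of L_emp (assuming, as the symbol list indicates, a mean categorical cross-entropy; the function name is hypothetical):

```python
import numpy as np

def cross_entropy(probs, labels):
    """L_emp: mean categorical cross-entropy over N training samples.

    probs:  (N, M) array of predicted probabilities p_jm
    labels: (N, M) one-hot array of indicators y_jm
    """
    n = probs.shape[0]
    # sum of y_jm * log(p_jm) over classes and samples, averaged and negated
    return -(labels * np.log(probs)).sum() / n
```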
And step 12, creating a training data set, and performing optimization training on the model to be pruned by using the training data set until a new loss function is converged to obtain the sparse model.
Step 2, pruning the convolutional-layer channels of the sparse model based on the importance evaluation result of the absolute weight value of each channel in the point-wise convolution, to obtain the pruned network model. The pruning process is specifically as follows:
Step 21, determining the absolute weight value of each point-wise convolution channel in the sparse model, and sorting the point-wise convolution channels in descending order of absolute weight value to obtain the importance ranking of the point-wise convolution channels;
Step 22, pruning the point-wise convolution channels in the sparse model according to the importance ranking of the point-wise convolution channels and a preset channel pruning threshold, to obtain the pruned network model. In the invention, in a single pruning pass, the preset channel pruning threshold is 5% of the number of point-wise convolution channels in the sparse model before pruning.
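Steps 21-22 can be sketched as follows. This is a NumPy sketch under the assumption that a layer's point-wise weights are given as an (out_channels, in_channels) matrix and that an input channel's importance is the sum of the absolute weights that read from it; the function names are illustrative:

```python
import numpy as np

def rank_channels(pw_weights):
    """Return input-channel indices of a point-wise (1x1) convolution,
    ordered from least to most important.

    pw_weights: (out_channels, in_channels) array of 1x1 convolution
    weights; importance of input channel i is sum_j |w[j, i]|.
    """
    importance = np.abs(pw_weights).sum(axis=0)
    return np.argsort(importance)  # ascending: prune from the front

def prune_step(pw_weights, ratio=0.05):
    """Remove the least-important `ratio` of input channels (at least one),
    mirroring the 5% per-pass threshold of the patent."""
    order = rank_channels(pw_weights)
    n_prune = max(1, int(round(pw_weights.shape[1] * ratio)))
    keep = np.sort(order[n_prune:])  # surviving channels, original order
    return pw_weights[:, keep], keep
```

In a real network the `keep` index set would also be used to drop the corresponding input feature maps and the filters of the preceding layer that produce them.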
Step 3, performing fine-tuning training on the weights of the pruned network model to obtain the fine-tuned network model; specifically: training the pruned network model and its weights with the initial learning rate to obtain the fine-tuned network model.
Step 4, repeating the operations of the step 2 and the step 3 on the network model after the fine tuning training until the pruning proportion of the network model after the fine tuning reaches a preset pruning proportion; the preset pruning proportion is determined according to the hyper-parameter alpha.
The specific process is as follows:
determining the importance ranking of the remaining point-wise convolution channels in the fine-tuned network model; pruning the point-wise convolution channels of the fine-tuned network model according to the importance ranking of the remaining point-wise convolution channels and the preset channel pruning threshold; performing fine-tuning training on the weights of the pruned model and outputting the fine-tuned network model. The loop termination condition of the cyclic pruning-fine-tuning step is that the pruning proportion of the fine-tuned network model reaches the preset pruning proportion, which is determined by the hyper-parameter α.
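The cyclic schedule of step 4 (remove about 5% of the original channel count per round until the total proportion α is reached) can be sketched as follows. This illustrative helper only computes the per-round remaining channel counts; the fine-tuning that would run in each round is marked by a comment:

```python
def prune_schedule(n_channels, alpha, step=0.05):
    """Per-round remaining channel counts of the cyclic prune/fine-tune loop.

    Each round removes `step` of the ORIGINAL channel count until the
    total pruned fraction reaches `alpha` (the hyper-parameter).
    """
    per_round = max(1, int(round(n_channels * step)))
    target = int(round(n_channels * alpha))
    counts, remaining, pruned = [], n_channels, 0
    while pruned < target:
        k = min(per_round, target - pruned)  # last round may prune fewer
        remaining -= k
        pruned += k
        # fine-tuning of the pruned model's weights would happen here
        counts.append(remaining)
    return counts
```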
Step 5, evaluating and verifying the accuracy, computation and parameter count of the fine-tuned network model, to determine the validity of the pruning result.
The pruning principle is as follows:
according to the deep neural network model pruning method, in the pruning process, according to the characteristics of the deep separable convolution, the principle that the point-wise convolution layer channel with the larger sum of the absolute values of the convolution weights is combined, and the subsequent activation value generated by linear combination is stronger, so that the importance of the channel is stronger is adopted; high-dimensional information hidden in a depth separable convolution model channel is fully utilized, and pruning operation is performed more pertinently; in the pruning operation process, the accuracy of the network is recovered through multi-round training; the pruning method has no strong binding relation with the specific model structure of the model to be pruned, and can carry out pruning compression processing aiming at any convolutional neural network with application depth separable convolutional blocks.
The depth separable convolution comprises a depth-wise convolution and a point-wise convolution, and the point-wise convolution accounts for most of the computation. The point-wise convolution can be regarded as a linear combination of the different channels of the input feature map; in this linear combination, the importance of an input channel can be estimated by evaluating the weights in the point-wise layer, so that smaller weights and the associated feature maps can be deleted, reducing the parameter count and computation of the network.
In the depth separable convolution, most of the computation is still concentrated in the 1×1 convolution, so the emphasis of pruning is first placed on the 1×1 convolution. Equation (1) describes the convolution of M input feature maps (F_1, F_2, …, F_M) with a 1×1×M filter (k_1, k_2, …, k_M):

F_out = Σ_{i=1}^{M} k_i · F_i        (1)

where F_i is the i-th input feature map of the point-wise convolution (the set of feature maps has channel dimension M); k_i is the i-th weight coefficient of the point-wise convolution kernel, whose weight dimension is likewise 1×1×M; and F_out is the final output feature map. Since the planar scale of the point-wise filter is 1×1, the output of the point-wise convolution can be regarded as a linear combination of the input feature maps.
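Equation (1) is just a weighted sum of channel maps, which the following NumPy sketch makes concrete (function name illustrative):

```python
import numpy as np

def pointwise_conv(feature_maps, kernel):
    """Point-wise (1x1) convolution for one output channel:
    F_out = sum_i k_i * F_i.

    feature_maps: (M, H, W) input maps F_1..F_M
    kernel:       (M,) weights k_1..k_M of one 1x1xM filter
    """
    # contract the channel axis of the kernel against the first axis
    # of the feature maps, leaving an (H, W) output map
    return np.tensordot(kernel, feature_maps, axes=1)
```

A channel whose k_i is near zero contributes almost nothing to F_out, which is exactly why it can be pruned.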
Since the point-wise convolution can be expressed as a linear combination of feature maps, a feature map with a small weight in the linear combination contributes little to the result; if the weight corresponding to one of the 1×1×M convolution coefficients is small, the importance of that channel can be considered weak. In the pruning process, such unimportant feature maps are deleted together with their weights.
As shown in fig. 2, the weights applied to the light-colored input feature maps are small; in the pruning process, the light-colored feature-map channels and the weights of the corresponding convolution kernels are deleted, reducing the computation and the corresponding parameter count. In the point-wise convolution pruning process, each layer's point-wise convolution is pruned separately, with the L1 norm used as the evaluation criterion of channel importance to rank the channels of the 1×1 filters. In a single pruning pass, channels amounting to about 5% of the original channel count of each depth separable convolutional layer are removed in ascending order of importance, and fine-tuning training is then performed on the network weights. The hyper-parameter α defines the total pruning proportion, and the pruning-fine-tuning process is repeated until the set value of α is reached; α is an empirical value obtained through experiments, chosen to reduce computation and parameters as far as possible while guaranteeing accuracy.
If all weights in the 1×1 convolution were equally important, pruning would have a relatively large impact on network performance. We therefore want the weights of unimportant channels in the linear combination to stay as small as possible, close to zero, while the truly valuable channels keep their weights. To achieve this, a regularization method is used: a regularization function is added to the objective function as a penalty term, limiting the complexity of the model and improving generalization by preventing overfitting.
During training, the invention uses the L1 regularization method to sparsify the weights: a subset of the weights is selected and made prominent, while the values of the other weights are driven close to zero. The overall loss function (also called the objective function) after adding the L1 regularization penalty is:

J(θ; X, y) = L_emp(θ; X, y) + λΩ(θ)

L_emp(θ; X, y) = −(1/N) Σ_{j=1}^{N} Σ_{m=1}^{M} y_jm · log(p_jm)

where J(θ; X, y) is the new loss function; L_emp(θ; X, y) is the original loss function of the model to be pruned; λ is the optimal penalty factor; Ω(θ) is the L1 regularization term; θ denotes the model parameters to be learned; X is the training data set; y are the labels; N is the number of training samples; M is the number of training-sample classes; p_jm is the predicted probability that observation sample j belongs to class m; and y_jm is an indicator (0 or 1) that equals 1 when the true class of observation sample j is m, and 0 otherwise.

λ is a hyper-parameter used to adjust the parameter-norm penalty term, also called the optimal penalty factor; λ = 0 means no regularization, and a larger λ corresponds to a stronger regularization penalty. When model training uses L1 regularization for sparsification, the empirical value of λ is generally around 10⁻⁵. The L1 regularization of the parameters is:

Ω(θ) = Σ_{i=1}^{C} ‖ω_i‖₁

where i is the index of a convolutional layer in the model to be pruned; C is the total number of convolutional layers in the model to be pruned; and ω_i denotes all parameters of the i-th convolutional layer.
The deep neural network model pruning method comprises the following steps. Step A: acquire the target images and the corresponding original model structure for the specific task, and convert the training and testing image samples into the input format of the model to be trained. Step B: the model pre-training module performs sparse training with the L1 regularization term to obtain the original sparse model, i.e. the baseline model, and measures the baseline accuracy on the test samples of the data set. Step C: the model pruning module deletes 5% of the baseline channel count of each point-wise layer, using the importance evaluation method based on the point-wise absolute weight values. Step D: the fine-tuning module performs fine-tuning training, with a smaller initial learning rate, on the pruned model output in step C and its attached weights; after fine-tuning finishes, jump back to step C. The termination condition of the C-D (pruning-fine-tuning) loop is governed by the hyper-parameter α. Step E: after the above steps are completed, the model pre-training module verifies the final model in terms of accuracy and computation/parameter count, to confirm the effectiveness of the pruning work.
The invention also provides a deep neural network model pruning system which comprises a sparsification training module, a pruning module, a fine adjustment module, a cyclic pruning-fine adjustment module and an evaluation verification module; the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model; the fine tuning module is used for carrying out fine tuning training on the weight of the trimmed network model to obtain a fine tuning trained network model; the method specifically comprises the following steps: training the trimmed network model and the model weight by adopting an initial learning rate to obtain a network model after fine tuning training; the circulating pruning-fine tuning module is used for repeating the pruning operation in the pruning module and the fine tuning operation in the fine tuning module on the network model after the fine tuning training until the pruning proportion of the network model after the fine tuning reaches the preset pruning proportion; and the evaluation and verification module is used for evaluating the precision, the calculated amount and the parameter amount of the network model after fine tuning for evaluation and verification and determining the validity of the pruning result of the network model after fine tuning.
In the invention, the sparse training module runs on a software and hardware platform with a general training framework and obtains the original sparse network model through sparse training; it also provides computation/parameter-count statistics and test-set accuracy statistics. The model pruning module performs model pruning according to the designed importance evaluation method based on the point-wise absolute weight values. The fine-tuning module is similar to the model pre-training module, with adjustments to the learning rate and network training structure, and is used to train the simplified network structure obtained after model pruning for a certain number of epochs.
The invention also provides a deep neural network model pruning device, which comprises: a memory for storing a computer program; and the processor is used for realizing the steps of the deep neural network model pruning method when the computer program is executed.
When the processor executes the computer program, the steps of the deep neural network model pruning method are implemented, for example: carrying out sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model; the fine tuning module is used for carrying out fine tuning training on the weight of the trimmed network model to obtain a fine tuning trained network model; the method specifically comprises the following steps: training the trimmed network model and the model weight by adopting an initial learning rate to obtain a network model after fine tuning training; the circulating pruning-fine tuning module is used for repeating pruning operation and fine tuning operation on the network model after fine tuning training until the pruning proportion of the network model after fine tuning reaches a preset pruning proportion; and the evaluation and verification module is used for evaluating the precision, the calculated amount and the parameter amount of the network model after fine tuning for evaluation and verification and determining the validity of the pruning result of the network model after fine tuning.
Alternatively, the processor implements the functions of the modules in the system when executing the computer program, for example: the sparse training module is used for performing sparse training on the pruning model to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model; the fine tuning module is used for carrying out fine tuning training on the weight of the trimmed network model to obtain a fine tuning trained network model; the method specifically comprises the following steps: training the trimmed network model and the model weight by adopting an initial learning rate to obtain a network model after fine tuning training; the circulating pruning-fine tuning module is used for repeating the pruning operation in the pruning module and the fine tuning operation in the fine tuning module on the network model after the fine tuning training until the pruning proportion of the network model after the fine tuning reaches the preset pruning proportion; and the evaluation and verification module is used for evaluating the precision, the calculated amount and the parameter amount of the network model after fine tuning for evaluation and verification and determining the validity of the pruning result of the network model after fine tuning.
Illustratively, the computer program may be partitioned into one or more modules/units, which are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing preset functions, and the instruction segments are used to describe the execution process of the computer program in the deep neural network model pruning device. For example, the computer program may be partitioned into a sparse training module, a pruning module, a fine tuning module, a cyclic pruning-fine tuning module, and an evaluation and verification module, whose specific functions are as follows: the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model, wherein the model to be pruned is a deep neural network model with depth separable convolution, and the depth separable convolution comprises depth-wise convolution and point-wise convolution; the pruning module is used for pruning the convolutional-layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution, to obtain a pruned network model; the fine tuning module is used for performing fine tuning training on the weights of the pruned network model to obtain a fine-tuned network model, specifically by training the pruned network model and its weights with an initial learning rate; the cyclic pruning-fine tuning module is used for repeating the pruning operation of the pruning module and the fine tuning operation of the fine tuning module on the fine-tuned network model until the pruning proportion of the fine-tuned network model reaches the preset pruning proportion; and the evaluation and verification module is used for evaluating the accuracy, computation amount and parameter amount of the fine-tuned network model for evaluation and verification, to determine the validity of the pruning result of the fine-tuned network model.
The deep neural network model pruning device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device, and may include, but is not limited to, a processor and a memory. It will be understood by those skilled in the art that the above is only an example of the deep neural network model pruning device and does not constitute a limitation on it; the device may include more or fewer components than those listed, combine some components, or use different components; for example, the deep neural network model pruning device may further include input-output devices, network access devices, a bus, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the deep neural network model pruning device and connects the various parts of the whole device through various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the deep neural network model pruning device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory.
The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application program required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory card, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The invention also provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of a deep neural network model pruning method.
If the modules/units integrated in the deep neural network model pruning system are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
Based on such understanding, all or part of the processes of the deep neural network model pruning method of the present invention may also be completed by instructing relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the deep neural network model pruning method. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, a preset intermediate form, etc.
The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.
It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunications signals.
Examples
In this embodiment, the deep neural network model pruning method is described in detail by taking the pruning process of MobileNetv2 as an example.
As shown in fig. 3, the present embodiment provides a deep neural network model pruning method, including the following steps:
Step 1: sparse training of the MobileNetv2 model
Using the training framework encapsulated by the PyTorch deep neural network library, the target detection model MobileNetv2 is trained sparsely in an end-to-end manner on an Nvidia RTX 2070 GPU with 8 GB of video memory.
The optimal penalty factor λ of the regularization loss function for sparse training is 1e-5; stochastic gradient descent (SGD) is used as the optimizer in the back-propagation process, with the weight decay set to 5e-4 and the momentum to 0.9.
At the start of training, the baseline model weights are initialized randomly; the input images are unified into square images with a side length of 32 pixels, and the batch size is set to 128. A total of 200 epochs are trained in the pre-training stage: the learning rate is 0.1 for the first 60 epochs, 0.02 for epochs 61 to 120, 4e-3 for epochs 121 to 160, and 8e-4 for epochs 161 to 200.
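The sparse-training setup above can be sketched in PyTorch as follows. The tiny two-layer stand-in network and the single training step are illustrative assumptions (the embodiment trains MobileNetv2 for 200 epochs on CIFAR-100); only the optimizer settings, L1 penalty and learning-rate schedule come from the values given in the text.

```python
import torch
import torch.nn as nn

LAMBDA = 1e-5  # penalty factor of the L1 regularization term

# Tiny stand-in for MobileNetv2: a depth-wise 3x3 followed by a point-wise 1x1
model = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8),  # depth-wise
    nn.Conv2d(8, 16, kernel_size=1),                      # point-wise
)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# 0.1 for epochs 1-60, 0.02 for 61-120, 4e-3 for 121-160, 8e-4 for 161-200:
# i.e. a x0.2 decay at epochs 60, 120 and 160
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

def l1_penalty(net):
    # The regularization term: sum of absolute convolution weights
    return sum(m.weight.abs().sum()
               for m in net.modules() if isinstance(m, nn.Conv2d))

# One training step on a random 32x32 batch (batch size is 128 in the text)
x, target = torch.randn(4, 8, 32, 32), torch.randn(4, 16, 32, 32)
loss = criterion(model(x), target) + LAMBDA * l1_penalty(model)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # called once per epoch
```

The L1 penalty drives many point-wise weights toward zero, which is what makes the absolute-value importance ranking of step 2 meaningful.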
Step 2: channel pruning for point-wise convolution
In this embodiment, the pruning-ratio hyper-parameter α is set to 0.6. Following the point-wise convolution computation given in formula (1), the weights corresponding to all channels in each 1 × 1 × m convolutional layer are sorted by absolute value; in the pruning process, the feature map corresponding to the channel with the smallest absolute weight value is deleted together with that weight. Channel deletion is performed several times, stopping for a layer once channels corresponding to around 5% of the channel count of the baseline model have been deleted.
F_out = Σ_{i=1}^{m} k_i · F_i        (1)

wherein F_i is the ith value of the input feature map of the point-wise convolution operation, and the dimension of this set of feature maps is 1 × 1 × m; k_i is the ith weight coefficient in the convolution kernel of the point-wise convolution operation, and the weight dimension of this set of convolution kernels is also 1 × 1 × m; F_out is the final output feature map.
The rule for selecting the channel to be deleted is:

min{ ||k_i||_1 }, 1 ≤ i ≤ m.
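A minimal sketch of this channel-importance evaluation for a single point-wise convolution layer is shown below. The layer sizes are illustrative assumptions; the ranking by absolute weight value and the roughly-5%-of-baseline deletion step follow the text.

```python
import torch
import torch.nn as nn

# Illustrative point-wise (1x1) layer: m = 32 input channels, 64 filters
pointwise = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1)

# Weight shape is (64, 32, 1, 1); the importance of input channel i is the
# L1 norm of all weights k_i multiplying feature map F_i in formula (1)
importance = pointwise.weight.detach().abs().sum(dim=(0, 2, 3))  # shape (32,)

# Delete roughly 5% of the baseline channel count per pruning pass
n_prune = max(1, round(0.05 * pointwise.in_channels))  # here: 2 channels
order = torch.argsort(importance)          # ascending: least important first
keep_idx = order[n_prune:].sort().values   # surviving channels, in order

# Rebuild the layer with the surviving channels and their weights
pruned = nn.Conv2d(len(keep_idx), pointwise.out_channels, kernel_size=1)
pruned.weight.data = pointwise.weight.data[:, keep_idx].clone()
pruned.bias.data = pointwise.bias.data.clone()
```

The feature map fed to the rebuilt layer must likewise keep only the channels in `keep_idx`, which is why the adjacent layers are modified in step 3.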
Step 3: fine tuning process
After the channel pruning for point-wise convolution in step 2 is completed, the network layers connected before and after each point-wise convolution layer are modified; specifically, these adjacent layers include, but are not limited to, the immediately neighboring batch normalization layer, activation layer and depth-wise convolution layer. This keeps the channel counts of the feature maps and weights consistent throughout forward propagation, so that the feature maps can be inferred correctly. After the entire neural network structure has been trimmed, fine tuning is performed with the same training framework and hyper-parameter settings as above, for a total of 200 epochs.
The processes of steps 2 and 3 are executed cyclically until the preset pruning proportion given by the hyper-parameter α = 0.6 is reached, which takes 12 cycles in total.
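The alternation of steps 2 and 3 can be expressed as a simple loop. Here `prune_step` and `finetune` are hypothetical placeholders for the operations of steps 2 and 3; the bookkeeping only illustrates why 12 cycles of roughly 5% each reach the α = 0.6 target.

```python
ALPHA = 0.6       # preset pruning proportion (hyper-parameter alpha)
STEP_PERCENT = 5  # ~5% of the baseline channels removed per cycle

def cyclic_prune_finetune(model, prune_step, finetune):
    """Repeat step 2 (prune) and step 3 (fine-tune) until the cumulative
    pruning proportion reaches ALPHA."""
    pruned_percent, cycles = 0, 0
    while pruned_percent < ALPHA * 100:
        model = prune_step(model)   # step 2: delete ~5% of channels
        model = finetune(model)     # step 3: 200-epoch fine tuning
        pruned_percent += STEP_PERCENT
        cycles += 1
    return model, cycles

# With identity placeholders, the loop runs 12 times, as in the embodiment
_, n_cycles = cyclic_prune_finetune(object(), lambda m: m, lambda m: m)
print(n_cycles)  # 12
```

Pruning a small fraction per cycle and fine-tuning in between gives the remaining weights a chance to recover accuracy before the next deletion, which is the rationale for the cyclic schedule.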
Step 4: accuracy test and scale statistics of the pruned model
The algorithm obtained by the method is verified on the CIFAR-100 dataset, with the parameter compression rate, the computation compression rate and the average accuracy used as evaluation indices.
Verification result:
For the CIFAR-100 dataset, the MobileNetv2 network is simplified using this embodiment. The parameter count is reduced from 2.36938 M in the original network to 0.46082 M, a compression rate of 19.45%; the computation amount is reduced from 0.06775925 GFLOPs in the original network to 0.013291428 GFLOPs, a compression rate of 19.62%. The TOP-1 accuracy of the network changes from 68.42% to 67.41%, and the TOP-5 accuracy from 90.97% to 90.54%. In other words, while maintaining the accuracy level of the network model, the channel-importance-based deep neural network model pruning algorithm reduces the parameters and computation to less than one fifth of their former values. This demonstrates the performance of this embodiment on the MobileNetv2 baseline model structure and the CIFAR-100 dataset.
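The two compression rates quoted above can be checked directly from the raw figures; the values below are copied from the verification result, and the percentages express the pruned size as a fraction of the original.

```python
# Parameter counts in millions and computation in GFLOPs, from the text
params_base, params_pruned = 2.36938, 0.46082
flops_base, flops_pruned = 0.06775925, 0.013291428

param_rate = params_pruned / params_base * 100   # pruned / original, in %
flop_rate = flops_pruned / flops_base * 100

print(f"parameter compression rate: {param_rate:.2f}%")  # 19.45%
print(f"FLOPs compression rate: {flop_rate:.2f}%")       # 19.62%
```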
The invention relates to a deep neural network model pruning method, system, equipment and medium. The depth separable convolution consists of depth-wise convolution and point-wise convolution; exploiting this characteristic, channel-based pruning is applied to the point-wise convolution, which accounts for the bulk of the computation. By evaluating the importance of the channels, unimportant channels are deleted, and the associated filters and feature maps are deleted with them. To ensure that important feature maps are retained and unimportant ones deleted, the point-wise convolution weights are sparsified during the pruning process; this preserves network accuracy while effectively reducing the computation and parameter amounts of the model.
The above-described embodiment is only one of the embodiments that can implement the technical solution of the present invention; the scope of the present invention is not limited to this embodiment, but includes any variations, substitutions and other embodiments that can be readily conceived by those skilled in the art within the technical scope disclosed by the present invention.

Claims (10)

1. A deep neural network model pruning method is characterized by comprising the following steps:
carrying out sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
based on the importance evaluation result of the weight absolute value of each channel in point-wise convolution, pruning the convolution layer channel of the sparse model to obtain a pruned network model;
and carrying out fine tuning training on the weight of the trimmed network model, and outputting the fine tuned network model to obtain the trimmed deep neural network model.
2. The deep neural network model pruning method according to claim 1, wherein the process of performing sparse training on the model to be pruned to obtain a sparse model is as follows:
introducing an L1 regularization term into the loss function of the model to be pruned to obtain a new loss function;
and creating a training data set, and performing optimization training on the model to be pruned by using the training data set until a new loss function is converged to obtain the sparse model.
3. The deep neural network model pruning method of claim 2, wherein the new loss function is:
J(θ; X, y) = L_emp(θ; X, y) + λΩ(θ)

Ω(θ) = Σ_{i=1}^{C} ||ω_i||_1

wherein J(θ; X, y) is the new loss function; L_emp(θ; X, y) is the original loss function of the model to be pruned; λ is the optimal penalty factor; Ω(θ) is the L1 regularization term; θ denotes the model parameters to be learned; X is the training dataset; y is the label; i is the sequence number of a convolutional layer in the model to be pruned; C is the total number of convolutional layers in the model to be pruned; and ω_i denotes the overall parameters of the ith convolutional layer in the model to be pruned.
4. The deep neural network model pruning method according to claim 1, wherein the process of pruning the convolutional layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution, to obtain the pruned network model, is specifically as follows:
determining the weight absolute value of each point-wise convolution channel in the sparse model, and sorting the point-wise convolution channels in the sparse model in descending order of weight absolute value to obtain the importance ranking of the point-wise convolution channels;
and pruning the point-wise convolution channels in the sparse model according to the importance ranking of the point-wise convolution channels and a preset channel pruning threshold, to obtain the pruned network model.
5. The deep neural network model pruning method according to claim 4, wherein the preset channel pruning threshold is 5% of the number of point-wise convolution channels in the sparse model before pruning.
6. The deep neural network model pruning method according to claim 1, characterized by further comprising a cyclic pruning-fine tuning step after fine tuning training of the weights of the pruned network model;
The cyclic pruning-fine tuning step specifically comprises the following steps:
determining the importance ranking of the remaining point-wise convolution channels in the network model after fine tuning training; pruning the point-wise convolution channels in the network model after fine tuning training according to the importance ranking of the remaining point-wise convolution channels and a preset channel pruning threshold; performing fine tuning training on the weights of the pruned model; and outputting the fine-tuned network model;
the cycle end condition of the cyclic pruning-fine tuning step is as follows: the pruning proportion of the fine-tuned network model reaches a preset pruning proportion, the preset pruning proportion being determined according to the hyper-parameter α.
7. The deep neural network model pruning method according to claim 1, wherein the fine tuning training process for the weights of the pruned network model is as follows:
training the trimmed network model and model weight by adopting an initial learning rate; wherein the initial learning rate is 0.1.
8. A deep neural network model pruning system, comprising:
the sparse training module is used for performing sparse training on the model to be pruned to obtain a sparse model; wherein, the model to be pruned is a deep neural network model with depth separable convolution; the depth separable convolution comprises depth-wise convolution and point-wise convolution;
the pruning module is used for pruning the convolution layer channels of the sparse model based on the importance evaluation result of the weight absolute value of each channel in the point-wise convolution to obtain a pruned network model;
and the fine tuning module is used for performing fine tuning training on the weight of the trimmed network model and outputting the fine tuned network model to obtain the trimmed deep neural network model.
9. A deep neural network model pruning device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the deep neural network model pruning method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the deep neural network model pruning method according to any one of claims 1 to 7.
CN202210314690.2A 2022-03-28 2022-03-28 Deep neural network model pruning method, system, equipment and medium Pending CN114742221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210314690.2A CN114742221A (en) 2022-03-28 2022-03-28 Deep neural network model pruning method, system, equipment and medium


Publications (1)

Publication Number Publication Date
CN114742221A true CN114742221A (en) 2022-07-12

Family

ID=82277739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210314690.2A Pending CN114742221A (en) 2022-03-28 2022-03-28 Deep neural network model pruning method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN114742221A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468101A (en) * 2023-03-21 2023-07-21 美的集团(上海)有限公司 Model pruning method, device, electronic equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination