CN115829022A - CNN network pruning rate automatic search method and system based on reinforcement learning - Google Patents

CNN network pruning rate automatic search method and system based on reinforcement learning

Info

Publication number
CN115829022A
CN115829022A
Authority
CN
China
Prior art keywords
network
layer
pruned
pruning
target network
Prior art date
Legal status
Pending
Application number
CN202211436627.2A
Other languages
Chinese (zh)
Inventor
汪杨鑫
刘剑毅
肖化超
霍嘉欣
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202211436627.2A priority Critical patent/CN115829022A/en
Publication of CN115829022A publication Critical patent/CN115829022A/en
Pending legal-status Critical Current


Abstract

The invention discloses a reinforcement learning-based method and system for automatically searching the per-layer pruning rates of a CNN, belonging to the field of artificial intelligence. Given a retention rate for the overall FLOPs of the network, a deep reinforcement learning TD3 agent learns the optimal pruning rate of each layer of the convolutional neural network: the features of each layer of the network to be pruned serve as the state, the layer retention rate serves as the action, and a layer-level reward is computed within a twin-network framework, where a feature-map comparison method improves the efficiency of the layer-level reward computation. Combined with a network-level reward, the agent searches out the optimal pruning rate of each layer; channel pruning is then applied to the pre-trained network to be pruned, and a high-accuracy compressed network is finally obtained through fine-tuning. The method and system can discover the latent sparsity sensitivity of each layer in both highly redundant and compact networks, thereby achieving a better trade-off between post-pruning network accuracy and sparsity.

Description

CNN network pruning rate automatic search method and system based on reinforcement learning
Technical Field
The invention belongs to the field of artificial intelligence, and relates to a CNN network pruning rate automatic search method and system based on reinforcement learning.
Background
Deep learning is one of the core frontier technologies of artificial intelligence, and its applications have penetrated almost every field of production and daily life. The convolutional neural network is a key deep learning technique; however, its huge number of model parameters and heavy computation are often constrained by hardware computing resources, so model compression is required for practical deployment. Pruning is one of the main means of compressing deep network models: it exploits the inherent sparsity of a model to remove unimportant parameters without significantly degrading accuracy, thereby reducing model size. According to the granularity of the pruned objects, current pruning algorithms fall into two categories: unstructured pruning, which removes individual weights or neurons, and structured pruning, which removes entire channels or convolution kernels. Structured pruning can directly use general convolution and matrix-multiplication operators and needs no specially customized hardware to achieve inference acceleration or runtime memory savings, making it more convenient in practical applications.
Early structured pruning techniques focused on finding more accurate and efficient channel-importance metrics, realizing model compression through importance ranking under a preset pruning rate. However, the redundancy and sensitivity of the layers in a CNN differ greatly. Most pruning algorithms determine the pruning rate/channel retention rate of each layer empirically, and some derive per-layer pruning rates from a global pruning rate by normalizing the importance of each layer. Such algorithms have difficulty uncovering the actual redundancy of each layer of the network, which limits the accuracy of the pruned network.
In recent years, some methods have focused on automatically searching the pruning rate of each network layer, using network architecture search, heuristic search, bisection, statistical modeling, reinforcement learning, and so on. Deep reinforcement learning can search for the optimal policy in a continuous action space through interaction between an agent and the environment and is well suited to this problem; representative algorithms such as AMC clearly outperform traditional methods that set per-layer pruning rates by experience. However, current algorithms still take the global accuracy of the network as the main design basis of the reward function, do not fully consider the influence of each layer's redundancy on accuracy, and therefore cannot achieve a good trade-off between model compression ratio and network accuracy.
Disclosure of Invention
The invention aims to solve the problem that existing algorithms for automatically searching per-layer pruning rates do not fully consider the influence of each network layer's redundancy on accuracy and therefore cannot achieve a good trade-off between model compression ratio and network accuracy, and provides a reinforcement learning-based CNN network pruning rate automatic search method and system.
To achieve this purpose, the invention adopts the following technical scheme:
the invention provides a CNN network pruning rate automatic search method based on reinforcement learning, which comprises the following steps:
constructing a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
Preferably, the method for constructing the Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned is as follows:
taking the pre-trained target network M to be pruned as the environment, and constructing a state vector abstracted from the features of each layer of the target network M to be pruned as the reinforcement learning agent's observation of the environment;
defining the channel retention rate of each layer of a target network M to be pruned as the action executed by the reinforcement learning agent to form a continuous action space;
constructing a reward function which gives consideration to the overall precision performance of the target network M to be pruned and the precision performance of each layer;
copying a target network M to be pruned as a reference network B; the target network M to be pruned and the reference network B form a twin network (M, B), and the reference network B is used for calculating the reward function.
Preferably, the reward function R(i,t) is calculated as follows:
R(i,t) = λ·r_L(i,t) + (1-λ)·R_N(i) - α·||r_L(i)||_2
wherein r_L(i,t) is the layer-level reward obtained after the action a_t is executed on the t-th layer of the target network M to be pruned in the i-th episode; R_N(i) is the network-level reward calculated from the accuracy measured after one round of layer-by-layer pruning of the target network M to be pruned; λ is a hyperparameter controlling the respective weights of the layer-level reward and the network-level reward; ||r_L(i)||_2 is a penalty term that suppresses the later-layer disadvantage brought by the layer-level reward; and α is a hyperparameter controlling the weight of the penalty term.
Preferably, the layer-level reward function r_L(i,t) is calculated from the feature-map distance D(x), a smaller distance corresponding to a larger reward, wherein D(x) denotes, for the same input image x, the Euclidean distance between the (t+1)-th layer output feature map of the temporary network M_t and the (t+1)-th layer output feature map of the reference network B;
the Euclidean distance D(x) is calculated as follows:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2
wherein F_B^{t+1}(x; W_B) denotes the output feature map of the (t+1)-th layer of the reference network B for the input image x, W_B denotes the parameter tensor of the reference network B, F_{M_t}^{t+1}(x; W_{M_t}) denotes the output feature map of the (t+1)-th layer of the temporary network M_t for the input image x, and W_{M_t} denotes the parameter tensor of M_t.
Preferably, the output feature map F_B^{t+1}(x; W_B) of the (t+1)-th layer of the reference network B for the input image x is calculated as follows:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))
wherein f_{t→t+1} denotes the sub-network of the target network containing all computing units between layers t and t+1;
the output feature map F_{M_t}^{t+1}(x; W_{M_t}) of the (t+1)-th layer of the temporary network M_t for the input image x is calculated from the t-th layer output feature map as follows:
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map.
Preferably, the column vector r_L(i) formed by the layer-level rewards in the i-th episode is defined as follows:
r_L(i) = [r_L(i,1), r_L(i,2), …, r_L(i,T)]′
wherein the L_2 norm of r_L(i) is used to balance the layer-level reward values of the layers.
Preferably, the deep reinforcement learning algorithm is the twin delayed deep deterministic policy gradient method TD3.
The invention provides a CNN network pruning rate automatic search system based on reinforcement learning, which comprises:
the model building module is used for building a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
the optimal strategy searching module is used for searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and the CNN network pruning module is used for performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
A computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the reinforcement learning-based CNN network pruning rate automatic search method.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of a reinforcement learning-based CNN network pruning rate automatic search method.
Compared with the prior art, the invention has the following beneficial effects:
aiming at the model compression task of a target network to be pruned, the CNN network pruning rate automatic search method based on reinforcement learning provided by the invention has the advantages that compared with the traditional method for artificially determining the pruning rate of each layer according to experience, the layer pruning rate automatic search technology based on deep reinforcement learning has higher automation degree of network pruning, can dig out the potential inherent sparsity characteristic of each layer of the network, and can better balance the effects of model precision and model compression ratio on the premise of determining the overall pruning rate. The deep reinforcement learning agent with more excellent performance is fully utilized, and the problems of over-estimation of the compression ratio and unstable parameter updating existing in the methods are solved.
Further, to address the loss of network accuracy caused by the coarse-grained "same reward for multiple steps" scheme in prior similar techniques, the designed fine-grained layer-level reward function integrates network-level feedback with the layer-level feedback obtained after each layer's pruning action, yielding a true per-step reward. The agent therefore receives timely feedback from the environment throughout the pruning-strategy search, which better matches the nature of reinforcement learning and helps the agent learn a better pruning strategy under the same conditions.
Furthermore, because each layer's pruning action has an independent reward value, the agent can fully exploit the hidden sparsity sensitivity of each layer of the network and search for a better pruning-rate strategy.
The invention provides a CNN network pruning rate automatic search system based on reinforcement learning, which is divided into a model construction module, an optimal strategy search module and a CNN network pruning module. By adopting a modular design, the modules are independent of one another, which facilitates unified management of the modules.
Drawings
In order to more clearly explain the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flow chart of the reinforcement learning-based CNN network pruning rate automatic search method of the present invention.
Fig. 2 is a general frame diagram of the network pruning rate reinforcement learning strategy search method according to the present invention.
FIG. 3 is a schematic diagram of the operation of the method of the present invention on a time axis.
FIG. 4 is a schematic diagram of the calculation of a hierarchical reward function using a twin network according to the present invention.
FIG. 5 is a visual diagram of sparsity of each layer of the VGG-16 network obtained in a verification experiment.
FIG. 6 shows a comparison of the effectiveness of agents TD3 and DDPG according to the present invention in a validation experiment.
Fig. 7 is a diagram of the reinforcement learning-based CNN network pruning rate automatic search system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that if the terms "upper", "lower", "horizontal", "inner", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings or the orientation or positional relationship which is usually arranged when the product of the present invention is used, the description is merely for convenience and simplicity, and the indication or suggestion that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, cannot be understood as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed," "mounted," "connected," and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood according to specific situations by those of ordinary skill in the art.
The invention is described in further detail below with reference to the accompanying drawings:
example one
The invention provides a CNN network pruning rate automatic search method based on reinforcement learning, which comprises the following steps as shown in figure 1:
s1, constructing a Markov decision process model according to the pruning rate search problem of each layer of a target network to be pruned;
the method for constructing the Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned is as follows:
taking the pre-trained target network M to be pruned as the environment, and constructing a state vector abstracted from the features of each layer of the target network M to be pruned as the reinforcement learning agent's observation of the environment;
defining the channel retention rate of each layer of the target network M to be pruned as the action executed by the reinforcement learning agent to form a continuous action space;
constructing a reward function which gives consideration to the overall precision performance of the target network M to be pruned and the precision performance of each layer;
copying a target network M to be pruned as a reference network B; the target network M to be pruned and the reference network B form a twin network (M, B), and the reference network B is used for calculating the reward function.
The reward function R(i,t) is calculated as follows:
R(i,t) = λ·r_L(i,t) + (1-λ)·R_N(i) - α·||r_L(i)||_2
wherein r_L(i,t) is the layer-level reward obtained after the action a_t is executed on the t-th layer of the target network M to be pruned in the i-th episode; R_N(i) is the network-level reward calculated from the accuracy measured after one round of layer-by-layer pruning of the target network M to be pruned; λ is a hyperparameter controlling the respective weights of the layer-level reward and the network-level reward; ||r_L(i)||_2 is a penalty term that suppresses the later-layer disadvantage brought by the layer-level reward; and α is a hyperparameter controlling the weight of the penalty term.
The layer-level reward function r_L(i,t) is calculated from the feature-map distance D(x), a smaller distance corresponding to a larger reward, wherein D(x) denotes, for the same input image x, the Euclidean distance between the (t+1)-th layer output feature map of the temporary network M_t and the (t+1)-th layer output feature map of the reference network B.
The Euclidean distance D(x) is calculated as follows:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2
wherein F_B^{t+1}(x; W_B) denotes the output feature map of the (t+1)-th layer of the reference network B for the input image x, W_B denotes the parameter tensor of the reference network B, F_{M_t}^{t+1}(x; W_{M_t}) denotes the output feature map of the (t+1)-th layer of the temporary network M_t for the input image x, and W_{M_t} denotes the parameter tensor of M_t.
The output feature map F_B^{t+1}(x; W_B) of the (t+1)-th layer of the reference network B for the input image x is calculated as follows:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))
wherein f_{t→t+1} denotes the sub-network of the target network containing all computing units between layers t and t+1.
The output feature map F_{M_t}^{t+1}(x; W_{M_t}) of the (t+1)-th layer of the temporary network M_t for the input image x is calculated from the t-th layer output feature map as follows:
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map.
The column vector r_L(i) formed by the layer-level rewards in the i-th episode is defined as follows:
r_L(i) = [r_L(i,1), r_L(i,2), …, r_L(i,T)]′
wherein the L_2 norm of r_L(i) is used to balance the layer-level reward values of the layers.
S2, searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
The deep reinforcement learning algorithm is the twin delayed deep deterministic policy gradient method TD3.
And S3, according to the optimal pruning rate of each layer of the target network to be pruned, performing structured pruning on the pre-trained target network to be pruned, and realizing model compression of the target network to be pruned.
The specific process of constructing the twin network architecture (M, B) is as follows:
wherein M is the target network to be pruned, the temporary network obtained by pruning only the t-th layer of M is denoted M_t, and the sub-network of M containing all computing units between layers t and t+1 is denoted f_{t→t+1}.
All pruning actions during the per-layer pruning-rate search are pseudo-pruning: Mask_t is a temporary mask matrix generated from the action a_t and the channel-importance ranking result, and is used to evaluate the impact of the pruning action a_t on network accuracy. Only after the reinforcement learning algorithm has found the optimal pruning rate of each layer is the network substantively pruned.
The original unpruned network M is copied to obtain the reference network B, which is kept unchanged during execution of the pruning-rate search algorithm and serves as the reference for measuring how much the pruning behavior of each layer of M affects network accuracy.
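Purely as an illustrative sketch (not part of the claimed method), the pseudo-pruning mask described above could be generated as follows in PyTorch, assuming the layer is a standard Conv2d; the function names make_pseudo_prune_mask and apply_mask and their interfaces are assumptions introduced here for illustration only:

import torch
import torch.nn as nn

def make_pseudo_prune_mask(conv: nn.Conv2d, a_t: float) -> torch.Tensor:
    """Build a 0/1 mask over the output channels of `conv`.

    a_t is the channel retention rate in (0, 1); the top channels ranked by
    the L1 norm of their convolution kernels are kept.
    """
    n = conv.out_channels
    keep = max(1, int(round(a_t * n)))                    # number of channels to keep
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 norm per output channel
    kept_idx = torch.topk(importance, keep).indices
    mask = torch.zeros(n, device=conv.weight.device)
    mask[kept_idx] = 1.0
    return mask                                           # 1 = keep, 0 = pseudo-pruned

def apply_mask(feature_map: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Pseudo-pruning: the mask multiplies the layer's output feature map
    # channel-wise (the ⊙ operation); no parameters are actually removed.
    return feature_map * mask.view(1, -1, 1, 1)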
FIG. 2 shows the overall architecture of the proposed deep reinforcement learning-based convolutional neural network pruning rate automatic search method and system. The method comprises three main parts: a twin network architecture consisting of the reference network B and the target network M to be pruned, an agent based on the TD3 deep reinforcement learning algorithm, and a reward function providing reinforcement learning feedback. The T-layer target network M is taken as the reinforcement learning environment, the channel retention rate of each layer is taken as the reinforcement learning action a_t, and the TD3 algorithm is adopted to automatically search the optimal pruning rate of each layer of the network.
One complete round of searching the target network M sequentially from layer 1 to layer T is defined as one episode of reinforcement learning agent training. Within an episode, for each layer t of the target network M, the agent first observes the environment to obtain a state vector s_t; the Actor network of the TD3 algorithm then computes the action a_t of the current layer, and the layer-level reward r_L(i,t) of the current layer is computed by comparing the reference network B with the target network M, as indicated by the thick solid arrows in FIG. 2. The procedure then moves to the next layer and repeats. After the T layers have been searched layer by layer, the pruning-rate search and pseudo-pruning of the whole network are complete, yielding the pseudo-pruned target network M(i). The image classification accuracy of M(i) is computed on a validation image set and used as the network-level reward R_N(i), as indicated by the dashed arrows in FIG. 2. The layer-level rewards r_L(i,t) and the network-level reward R_N(i) are combined by weighting to obtain the total reward R(i,t) of each layer's action, which is stored in the experience replay pool together with the state vector and the action; the TD3 agent networks are updated by sampling from the replay pool, yielding the pruning-rate policy π_i(s_t) updated in this episode.
FIG. 3 illustrates the operation of the invention along the time axis. Each episode receives one network-level reward R_N(i), and each layer's pruning-rate action obtains an independent layer-level reward r_L(i,t); these are combined into a total reward R(i,t), which is stored in the experience replay pool together with the corresponding action a_t and state vector s_t, and the agent networks are updated once after each episode. After the agent has been trained over sufficiently many episodes, the final pruning policy π(s_t) is obtained once the policy no longer changes significantly.
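The episode structure described above can be sketched as follows; this is an assumed organization, and the interfaces of env, agent and replay_buffer (get_state, pseudo_prune, network_reward, act, add, update) are placeholders introduced for illustration, not components defined by the patent:

import torch

def run_episode(env, agent, replay_buffer, lam=0.2, alpha=0.2):
    """One episode: sweep layers 1..T of the target network once."""
    states, actions, layer_rewards = [], [], []
    for t in range(env.num_layers):
        s_t = env.get_state(t)                  # state vector of layer t
        a_t = agent.act(s_t)                    # channel retention rate in (0, 1)
        r_L = env.pseudo_prune(t, a_t)          # layer-level reward r_L(i, t)
        states.append(s_t); actions.append(a_t); layer_rewards.append(r_L)

    R_N = env.network_reward()                  # validation accuracy of pseudo-pruned net
    r_vec = torch.tensor(layer_rewards)
    penalty = alpha * torch.linalg.norm(r_vec)  # α · ||r_L(i)||_2

    for t in range(env.num_layers):
        # total reward of layer t according to eq. (5)
        R = lam * layer_rewards[t] + (1.0 - lam) * R_N - penalty.item()
        s_next = states[t + 1] if t + 1 < env.num_layers else states[t]
        done = (t == env.num_layers - 1)
        replay_buffer.add(states[t], actions[t], R, s_next, done)

    agent.update(replay_buffer)                 # one TD3 update per episode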
The state vector s_t of layer t of the target network M observed by the agent in FIG. 2 has the concrete form:
s_t = (t, n, c, h, w, stride, k, FLOPs[t], reduced, rest, a_{t-1})
wherein t ∈ {1, …, T} is the layer index of the T-layer CNN, n and c are the numbers of output and input channels of the t-th layer respectively, h and w are the height and width of the layer's input feature map, stride is the stride of the layer's convolution kernel, k is the side length of the convolution kernel, FLOPs[t] is the floating-point computation of the t-th layer, reduced is the floating-point computation removed by all pseudo-pruning operations before layer t, rest is the remaining floating-point computation of the subsequent layers, and a_{t-1} is the agent action executed at layer t-1, i.e., the channel retention rate of the previous layer.
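For illustration, a state vector of this form could be assembled from a PyTorch convolution layer as in the following sketch; the helper name build_state is an assumption, and any normalization of the entries (not specified here) is omitted:

import torch
import torch.nn as nn

def build_state(t, conv: nn.Conv2d, in_hw, flops, reduced, rest, a_prev):
    """Assemble the 11-dimensional state vector s_t described above."""
    h, w = in_hw                               # input feature-map height and width
    return torch.tensor([
        float(t),                              # layer index
        float(conv.out_channels),              # n
        float(conv.in_channels),               # c
        float(h), float(w),                    # input feature-map size
        float(conv.stride[0]),                 # stride
        float(conv.kernel_size[0]),            # kernel side length k
        float(flops),                          # FLOPs[t]
        float(reduced),                        # FLOPs removed by earlier pseudo-pruning
        float(rest),                           # FLOPs remaining in later layers
        float(a_prev),                         # previous layer's action a_{t-1}
    ])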
The action a_t in FIG. 2 is the channel retention rate of the t-th layer of the target network M, with a_t ∈ (0, 1), i.e., the action takes real values between 0 and 1. The Actor network of the TD3 agent computes a_t from the observed s_t; before the actual pruning is executed, the action is calibrated to ensure that it satisfies the preset upper and lower bounds on each layer's pruning rate and the preset constraint on the overall FLOPs of the network. After the final pruning policy π(s_t) has been obtained, the lower-ranked channels are removed according to a channel-importance criterion (e.g., the L_1 norm of the convolution kernels), completing the network pruning.
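The calibration step can be sketched as follows; the budget rule used here is only one plausible scheme and an assumption, not the patent's exact calibration formula:

def calibrate_action(a_t, a_min, a_max, flops_t, reduced, rest, total_flops, target_ratio):
    """Calibrate the raw actor output a_t so that the per-layer bounds and the
    preset overall FLOPs retention target remain satisfiable.

    Assumed rule: the FLOPs this layer may keep equals the overall budget,
    minus the FLOPs already kept by earlier layers, minus the FLOPs the later
    layers will need even at their minimum retention rate a_min.
    """
    a_t = min(max(a_t, a_min), a_max)                        # per-layer bounds

    budget = target_ratio * total_flops                      # FLOPs allowed to remain overall
    earlier_original = total_flops - flops_t - rest          # original FLOPs of layers before t
    kept_earlier = earlier_original - reduced                # FLOPs actually kept so far
    max_keep_here = budget - kept_earlier - a_min * rest     # room left for this layer

    if flops_t > 0:
        a_t = min(a_t, max(a_min, max_keep_here / flops_t))  # shrink action if over budget
    return a_t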
Considering that both the state space and the action space of this problem are continuous, the agent module in FIG. 2 adopts the TD3 deep reinforcement learning algorithm.
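The TD3 agent itself is standard. Purely as an illustration, its actor and twin-critic networks could be defined as below; the hidden sizes are assumptions, the 11-dimensional state follows the state vector above, and the delayed policy updates, target networks and target-policy smoothing of TD3 are omitted for brevity:

import torch
import torch.nn as nn

STATE_DIM = 11   # dimension of s_t above

class Actor(nn.Module):
    """Deterministic policy: maps s_t to a retention rate in (0, 1)."""
    def __init__(self, state_dim=STATE_DIM, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # a_t in (0, 1)
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Twin Q-networks of TD3; the smaller of the two Q-values is used as the
    target to suppress over-estimation."""
    def __init__(self, state_dim=STATE_DIM, hidden=64):
        super().__init__()
        self.q1 = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.q2 = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        sa = torch.cat([s, a], dim=-1)
        return self.q1(sa), self.q2(sa)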
The specific implementation of the layer-level rewards represented by the thick solid arrows in FIG. 2 is as follows.
The output feature map of the (t+1)-th layer of the reference network B can be expressed in terms of its t-th layer output feature map. The t-th layer output feature map of the reference network B for the input image x is defined as F_B^{t}(x; W_B), where W_B denotes the parameter tensor of the reference network. The sub-network of the target network containing all computing units between layers t and t+1 is defined as f_{t→t+1}. A target network on which no pruning is performed has the same (t+1)-th layer output feature map as the reference network:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))    (1)
principle of calculating the hierarchy reward by using twin network (M, B) comparison As shown in FIG. 4, a pruning action a is performed on the t-th layer of the target network t Then, the number of output characteristic image channels changes, and the output characteristic image channels further pass through
Figure BDA0003947031970000115
T +1 layer output characteristic image obtained after operation
Figure BDA0003947031970000116
Feature map of output from reference network
Figure BDA0003947031970000117
Compared with the same channel number, the distance between the two groups of feature maps can be compared.
In the policy search stage, pseudo-pruning is adopted: according to the action a_t and the channel-importance ranking, a mask matrix Mask_t is generated and applied to the convolution kernels, to measure the influence of pruning the current layer's channels on network accuracy. For the same input image x, the t-th layer output feature map of the reference network is used directly when computing the layer-level reward of the pruned t-th layer of the target network, so that, analogously to (1), F_{M_t}^{t+1}(x; W_{M_t}) can be calculated directly as
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))    (2)
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map. For an input image x, the (t+1)-th layer output feature map obtained after pruning the t-th layer of the target network M is compared with the (t+1)-th layer output feature map of the reference network B by their Euclidean distance:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2    (3)
The layer-level reward r_L(i,t) obtained after executing the single-step action a_t at the t-th layer of the network is then derived from D(x), a smaller feature-map distance yielding a larger reward.
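As an illustrative sketch of equations (2) and (3) and of the layer-level reward, the following PyTorch function compares the twin networks' feature maps for one mini-batch; the negative mean distance returned here is an assumed concrete form of the reward, since the text above only specifies that r_L(i,t) is derived from D(x):

import torch
import torch.nn as nn

@torch.no_grad()
def layer_reward(f_t_to_t1: nn.Module, feat_B_t: torch.Tensor,
                 feat_B_t1: torch.Tensor, mask_t: torch.Tensor) -> float:
    """Layer-level reward r_L(i, t) for one mini-batch of images.

    feat_B_t  : t-th layer output feature maps of the reference network B
    feat_B_t1 : (t+1)-th layer output feature maps of B (eq. (1))
    f_t_to_t1 : the sub-network f_{t→t+1} between layers t and t+1
    mask_t    : 0/1 channel mask of the pseudo-pruned layer t
    """
    # eq. (2): (t+1)-th layer feature maps of the pseudo-pruned network M_t
    feat_Mt_t1 = f_t_to_t1(feat_B_t * mask_t.view(1, -1, 1, 1))
    # eq. (3): Euclidean distance D(x) for each image in the batch
    diff = (feat_B_t1 - feat_Mt_t1).flatten(start_dim=1)
    dist = torch.linalg.norm(diff, dim=1)
    return (-dist.mean()).item()      # assumed form: larger reward for smaller distance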
in fig. 2, the hierarchy awards and the network-level awards are merged, and the proposed overall award function is:
R(i,t)=λ·r L (i,t)+(1-λ)·R N (i)-α·||r L (i)|| 2 (5)
wherein r is L (i)=[r L (i,1),r L (i,2),…,r L (i,T)]Is a vector formed by awards of each layer level in the ith epamode, | | · | | luminous flux 2 Represents L 2 Norm, which is used for relieving the problem of later disadvantage caused by hierarchy reward, namely the network is forced to select high pruning ratio at the later layer; λ is a hyper-parameter controlling two bonus proportions; alpha is a hyper-parameter that controls the weight of the penalty term. According to the cross validation test, choosing λ =0.2, α =0.2 will result in better performance.
After search learning over a number of episodes, the invention obtains the optimal per-layer pruning strategy under the preset global pruning rate. According to this strategy and the importance ranking of each layer's channels (using an L_1-norm-based importance measure), substantive channel pruning is performed on each layer of the target network and the resulting network structure is saved; the whole network pruning process is completed after fine-tuning.
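As an illustrative sketch of the substantive channel pruning step, the following PyTorch function removes the lower-ranked output channels of one convolution layer (and the matching input channels of the next layer) according to the searched retention rate; it assumes a plain chain of Conv2d layers without batch normalization or skip connections, and the function name prune_conv_pair is an assumption:

import torch
import torch.nn as nn

@torch.no_grad()
def prune_conv_pair(conv: nn.Conv2d, next_conv: nn.Conv2d, keep_rate: float):
    """Substantively prune the output channels of `conv` and the matching
    input channels of `next_conv` using the searched retention rate."""
    n = conv.out_channels
    keep = max(1, int(round(keep_rate * n)))
    importance = conv.weight.abs().sum(dim=(1, 2, 3))           # L1 norm per output channel
    idx = torch.topk(importance, keep).indices.sort().values    # channels to keep, in order

    new_conv = nn.Conv2d(conv.in_channels, keep, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.copy_(conv.weight[idx])
    if conv.bias is not None:
        new_conv.bias.copy_(conv.bias[idx])

    new_next = nn.Conv2d(keep, next_conv.out_channels, next_conv.kernel_size,
                         stride=next_conv.stride, padding=next_conv.padding,
                         bias=next_conv.bias is not None)
    new_next.weight.copy_(next_conv.weight[:, idx])
    if next_conv.bias is not None:
        new_next.bias.copy_(next_conv.bias)

    return new_conv, new_next   # the pruned network is then fine-tuned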
Verification test
The invention is compared with a number of existing network pruning algorithms, including L1-norm, Network Slimming, SFP, CACP, AMC, LPF, HRank, GAL, Variational CNN Pruning, Hinge, FPGM and AOFP. A natural scene image classification dataset, CIFAR-10, and a remote sensing image target classification dataset, UC-Merced Land-Use, are used in the verification experiments; the target networks to be pruned are the convolutional neural network VGG-16 and the lightweight network MobileNet-V1.
TABLE 1
Table 1 compares the performance of the invention with traditional pruning algorithms for the VGG-16 network on the CIFAR-10 dataset. Values in bold are the best results at the corresponding retention rate. Under the four preset global compression-rate settings of {30%, 50%, 60%, 70%}, the pruned network obtained by the invention has the best Top-1 classification accuracy; when the compression ratio is above 50%, the accuracy of the pruned network exceeds that of the reference network (the original unpruned network), indicating that the invention can effectively alleviate the over-fitting problem in highly redundant networks.
TABLE 2
Table 2 compares the invention with AMC, another network pruning algorithm that also has automatic pruning-rate search capability. Pruning is performed on the UC-Merced Land-Use dataset for the lightweight network MobileNet-V1, with all images resized to 224×224; values in bold are the best results at the corresponding global retention rate. Under the three FLOPs retention rates of {10%, 30%, 70%}, the network accuracy of the invention is superior to that of the AMC algorithm, demonstrating the invention's application potential for compressed deployment on hardware-resource-constrained devices.
FIGS. 5 and 6 show validation-experiment results comparing the performance of VGG-16 network pruning on the CIFAR-10 dataset.
Fig. 5 shows the per-layer pruning rates automatically searched by the invention for different global retention rates. The channel retention rates of the layers near the front of the network are higher, those of the layers near the back are lower, and the curves show roughly the same trend for all overall retention rates. This result reveals the inherent differences in sparsity and redundancy among the layers of the CNN, agrees with previous studies of the layer-wise sparsity sensitivity of VGG-16, and verifies the correctness of the invention.
FIG. 6 shows the effect of different deep reinforcement learning agents on the pruning-rate search results. Under all preset global pruning rates, the TD3 agent achieves higher post-pruning network accuracy than the DDPG agent, verifying the advantage of adopting the TD3 agent in the invention.
Example two
A system implementing the reinforcement learning-based CNN network pruning rate automatic search method, as shown in FIG. 7, comprises:
the model building module is used for building a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
the optimal strategy searching module is used for searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and the CNN network pruning module is used for performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
EXAMPLE III
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above network pruning-rate automatic search method that obtains layer-level rewards using a twin network. The memory may comprise a volatile memory such as a high-speed random access memory, and may further comprise a non-volatile memory such as at least one disk memory. The processor, the network interface, and the memory are connected to one another through an internal bus, which may be an industry standard architecture bus, a peripheral component interconnect bus, an extended industry standard architecture bus, or the like; the bus may be divided into an address bus, a data bus, a control bus, and so on. The memory is used for storing programs; specifically, the programs may include program code comprising computer operation instructions. The memory may include both volatile and non-volatile storage and provides instructions and data to the processor.
Example four
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above network pruning-rate automatic search method that obtains layer-level rewards using a twin network. The storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory may include random access memory (RAM) and/or cache memory. The non-volatile memory may include read-only memory (ROM), a hard disk, a flash memory, an optical disk, a magnetic disk, and the like.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A CNN network pruning rate automatic search method based on reinforcement learning is characterized by comprising the following steps:
constructing a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
2. The CNN network pruning rate automatic search method based on reinforcement learning of claim 1, wherein the method for constructing the Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned is as follows:
taking the pre-trained target network M to be pruned as the environment, and constructing a state vector abstracted from the features of each layer of the target network M to be pruned as the reinforcement learning agent's observation of the environment;
defining the channel retention rate of each layer of the target network M to be pruned as the action executed by the reinforcement learning agent to form a continuous action space;
constructing a reward function which gives consideration to the overall precision performance of the target network M to be pruned and the precision performance of each layer;
copying a target network M to be pruned as a reference network B; the target network M to be pruned and the reference network B form a twin network (M, B), and the reference network B is used for calculating the reward function.
3. The reinforcement learning-based CNN network pruning rate automatic search method according to claim 1, wherein the reward function R(i,t) is calculated as follows:
R(i,t) = λ·r_L(i,t) + (1-λ)·R_N(i) - α·||r_L(i)||_2
wherein r_L(i,t) is the layer-level reward obtained after the action a_t is executed on the t-th layer of the target network M to be pruned in the i-th episode; R_N(i) is the network-level reward calculated from the accuracy measured after one round of layer-by-layer pruning of the target network M to be pruned; λ is a hyperparameter controlling the respective weights of the layer-level reward and the network-level reward; ||r_L(i)||_2 is a penalty term that suppresses the later-layer disadvantage brought by the layer-level reward; and α is a hyperparameter controlling the weight of the penalty term.
4. The CNN network pruning rate automatic search method based on reinforcement learning of claim 3, wherein the layer-level reward function r_L(i,t) is calculated from the feature-map distance D(x), a smaller distance corresponding to a larger reward, wherein D(x) denotes, for the same input image x, the Euclidean distance between the (t+1)-th layer output feature map of the temporary network M_t and the (t+1)-th layer output feature map of the reference network B;
the Euclidean distance D(x) is calculated as follows:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2
wherein F_B^{t+1}(x; W_B) denotes the output feature map of the (t+1)-th layer of the reference network B for the input image x, W_B denotes the parameter tensor of the reference network B, F_{M_t}^{t+1}(x; W_{M_t}) denotes the output feature map of the (t+1)-th layer of the temporary network M_t for the input image x, and W_{M_t} denotes the parameter tensor of M_t.
5. The CNN network pruning rate automatic search method based on reinforcement learning of claim 4, wherein the output feature map F_B^{t+1}(x; W_B) of the (t+1)-th layer of the reference network B for the input image x is calculated as follows:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))
wherein f_{t→t+1} denotes the sub-network of the target network containing all computing units between layers t and t+1;
the output feature map F_{M_t}^{t+1}(x; W_{M_t}) of the (t+1)-th layer of the temporary network M_t for the input image x is calculated from the t-th layer output feature map as follows:
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map.
6. The CNN network pruning rate automatic search method based on reinforcement learning of claim 3, wherein the column vector r_L(i) formed by the layer-level rewards in the i-th episode is defined as follows:
r_L(i) = [r_L(i,1), r_L(i,2), …, r_L(i,T)]′
wherein the L_2 norm of r_L(i) is used to balance the layer-level reward values of the layers.
7. The reinforcement learning-based CNN network pruning rate automatic search method according to claim 1, wherein the deep reinforcement learning algorithm is the twin delayed deep deterministic policy gradient method TD3.
8. The CNN network pruning rate automatic search system based on reinforcement learning according to any one of claims 1 to 7, characterized by comprising:
the model building module is used for building a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
the optimal strategy searching module is used for searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and the CNN network pruning module is used for performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the reinforcement learning-based CNN network pruning rate automatic search method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the steps of the reinforcement learning-based CNN network pruning rate automatic search method according to any one of claims 1 to 7.
CN202211436627.2A 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning Pending CN115829022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211436627.2A CN115829022A (en) 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211436627.2A CN115829022A (en) 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN115829022A true CN115829022A (en) 2023-03-21

Family

ID=85528581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211436627.2A Pending CN115829022A (en) 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115829022A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination