CN115829022A - CNN network pruning rate automatic search method and system based on reinforcement learning - Google Patents

CNN network pruning rate automatic search method and system based on reinforcement learning

Info

Publication number
CN115829022A
CN115829022A
Authority
CN
China
Prior art keywords
network
layer
pruned
pruning
target network
Prior art date
Legal status
Pending
Application number
CN202211436627.2A
Other languages
Chinese (zh)
Inventor
汪杨鑫
刘剑毅
肖化超
霍嘉欣
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202211436627.2A priority Critical patent/CN115829022A/en
Publication of CN115829022A publication Critical patent/CN115829022A/en
Pending legal-status Critical Current


Abstract

The invention discloses a reinforcement learning-based method and system for automatically searching the per-layer pruning rates of a CNN, belonging to the field of artificial intelligence. Given a retention rate for the overall FLOPs of the network, a deep reinforcement learning TD3 agent learns the optimal pruning rate of each layer of the convolutional neural network: the features of each layer of the network to be pruned serve as the state, the layer retention rate serves as the action, and a layer-level reward is computed within a twin-network framework, where a feature-map comparison method improves the efficiency of the layer-level reward computation. Combined with a network-level reward, the agent searches out the optimal pruning rate of each layer; channel pruning is then applied to the pre-trained network to be pruned, and a high-accuracy compressed network is finally obtained through fine-tuning. The method and system can discover the latent sparsity sensitivity of each layer in both highly redundant and compact networks, thereby achieving a better trade-off between post-pruning network accuracy and sparsity.

Description

CNN network pruning rate automatic search method and system based on reinforcement learning
Technical Field
The invention belongs to the field of artificial intelligence, and relates to a CNN network pruning rate automatic search method and system based on reinforcement learning.
Background
Deep learning is one of the core frontier technologies of artificial intelligence, and its applications have penetrated almost every field of production and daily life. The convolutional neural network is a key deep learning technique; however, its huge number of model parameters and heavy computation are often constrained by hardware computing resources, so model compression is required for practical deployment. Pruning is one of the main means of compressing deep network models: it exploits the inherent sparsity of a model to remove unimportant parameters without significantly degrading accuracy, thereby reducing model size. According to the granularity of the pruned objects, current pruning algorithms fall into two categories: unstructured pruning, which removes individual weights or neurons, and structured pruning, which removes entire channels or convolution kernels. Structured pruning can directly use general convolution and matrix-multiplication operators and needs no specially customized hardware to achieve inference acceleration or runtime memory savings, making it more convenient in practical applications.
Early structured pruning techniques focused on finding more accurate and efficient channel-importance metrics, realizing model compression through importance ranking under a preset pruning rate. However, the redundancy and sensitivity of the layers in a CNN differ greatly. Most pruning algorithms determine the pruning rate/channel retention rate of each layer empirically, and some derive per-layer pruning rates from a global pruning rate by normalizing the importance of each layer. Such algorithms have difficulty uncovering the actual redundancy of each layer of the network, which limits the accuracy of the pruned network.
In recent years, some methods have focused on automatically searching the pruning rate of each network layer, using network architecture search, heuristic search, bisection, statistical modeling, reinforcement learning, and so on. Deep reinforcement learning can search for the optimal policy in a continuous action space through interaction between an agent and the environment and is well suited to this problem; representative algorithms such as AMC clearly outperform traditional methods that set per-layer pruning rates by experience. However, current algorithms still take the global accuracy of the network as the main design basis of the reward function, do not fully consider the influence of each layer's redundancy on accuracy, and therefore cannot achieve a good trade-off between model compression ratio and network accuracy.
Disclosure of Invention
The invention aims to solve the problem that existing algorithms for automatically searching per-layer pruning rates do not fully consider the influence of each network layer's redundancy on accuracy and therefore cannot achieve a good trade-off between model compression ratio and network accuracy, and provides a reinforcement learning-based CNN network pruning rate automatic search method and system.
To achieve this purpose, the invention adopts the following technical scheme:
the invention provides a CNN network pruning rate automatic search method based on reinforcement learning, which comprises the following steps:
constructing a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
Preferably, the method for constructing the Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned is as follows:
taking the pre-trained target network M to be pruned as the environment, and constructing a state vector abstracted from the features of each layer of the target network M to be pruned as the reinforcement learning agent's observation of the environment;
defining the channel retention rate of each layer of a target network M to be pruned as the action executed by the reinforcement learning agent to form a continuous action space;
constructing a reward function which gives consideration to the overall precision performance of the target network M to be pruned and the precision performance of each layer;
copying a target network M to be pruned as a reference network B; the target network M to be pruned and the reference network B form a twin network (M, B), and the reference network B is used for calculating the reward function.
Preferably, the reward function R(i,t) is calculated as follows:
R(i,t) = λ·r_L(i,t) + (1-λ)·R_N(i) - α·||r_L(i)||_2
wherein r_L(i,t) is the layer-level reward obtained after the action a_t is executed on the t-th layer of the target network M to be pruned in the i-th episode; R_N(i) is the network-level reward calculated from the accuracy measured after one round of layer-by-layer pruning of the target network M to be pruned; λ is a hyperparameter controlling the respective weights of the layer-level reward and the network-level reward; ||r_L(i)||_2 is a penalty term that suppresses the later-layer disadvantage brought by the layer-level reward; and α is a hyperparameter controlling the weight of the penalty term.
Preferably, the layer-level reward function r_L(i,t) is calculated from the feature-map distance D(x), a smaller distance corresponding to a larger reward, wherein D(x) denotes, for the same input image x, the Euclidean distance between the (t+1)-th layer output feature map of the temporary network M_t and the (t+1)-th layer output feature map of the reference network B;
the Euclidean distance D(x) is calculated as follows:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2
wherein F_B^{t+1}(x; W_B) denotes the output feature map of the (t+1)-th layer of the reference network B for the input image x, W_B denotes the parameter tensor of the reference network B, F_{M_t}^{t+1}(x; W_{M_t}) denotes the output feature map of the (t+1)-th layer of the temporary network M_t for the input image x, and W_{M_t} denotes the parameter tensor of M_t.
Preferably, the output feature map F_B^{t+1}(x; W_B) of the (t+1)-th layer of the reference network B for the input image x is calculated as follows:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))
wherein f_{t→t+1} denotes the sub-network of the target network containing all computing units between layers t and t+1;
the output feature map F_{M_t}^{t+1}(x; W_{M_t}) of the (t+1)-th layer of the temporary network M_t for the input image x is calculated from the t-th layer output feature map as follows:
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map.
Preferably, the column vector r_L(i) formed by the layer-level rewards in the i-th episode is defined as follows:
r_L(i) = [r_L(i,1), r_L(i,2), …, r_L(i,T)]′
wherein the L_2 norm of r_L(i) is used to balance the layer-level reward values of the layers.
Preferably, the deep reinforcement learning algorithm is the twin delayed deep deterministic policy gradient method TD3.
The invention provides a CNN network pruning rate automatic search system based on reinforcement learning, which comprises:
the model building module is used for building a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
the optimal strategy searching module is used for searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and the CNN network pruning module is used for performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
A computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the reinforcement learning-based CNN network pruning rate automatic search method.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of a reinforcement learning-based CNN network pruning rate automatic search method.
Compared with the prior art, the invention has the following beneficial effects:
aiming at the model compression task of a target network to be pruned, the CNN network pruning rate automatic search method based on reinforcement learning provided by the invention has the advantages that compared with the traditional method for artificially determining the pruning rate of each layer according to experience, the layer pruning rate automatic search technology based on deep reinforcement learning has higher automation degree of network pruning, can dig out the potential inherent sparsity characteristic of each layer of the network, and can better balance the effects of model precision and model compression ratio on the premise of determining the overall pruning rate. The deep reinforcement learning agent with more excellent performance is fully utilized, and the problems of over-estimation of the compression ratio and unstable parameter updating existing in the methods are solved.
Further, to address the loss of network accuracy caused by the coarse-grained "same reward for multiple steps" scheme in prior similar techniques, the designed fine-grained layer-level reward function integrates network-level feedback with the layer-level feedback obtained after each layer's pruning action, yielding a true per-step reward. The agent therefore receives timely feedback from the environment throughout the pruning-strategy search, which better matches the nature of reinforcement learning and helps the agent learn a better pruning strategy under the same conditions.
Furthermore, because each layer's pruning action has an independent reward value, the agent can fully exploit the hidden sparsity sensitivity of each layer of the network and search for a better pruning-rate strategy.
The invention provides a CNN network pruning rate automatic search system based on reinforcement learning, which is divided into a model construction module, an optimal strategy search module and a CNN network pruning module. By adopting a modular design, the modules are independent of one another, which facilitates unified management of the modules.
Drawings
In order to more clearly explain the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flow chart of the reinforcement learning-based CNN network pruning rate automatic search method of the present invention.
Fig. 2 is a general frame diagram of the network pruning rate reinforcement learning strategy search method according to the present invention.
FIG. 3 is a schematic diagram of the operation of the method of the present invention on a time axis.
FIG. 4 is a schematic diagram of the calculation of a hierarchical reward function using a twin network according to the present invention.
FIG. 5 is a visual diagram of sparsity of each layer of the VGG-16 network obtained in a verification experiment.
FIG. 6 shows a comparison of the effectiveness of agents TD3 and DDPG according to the present invention in a validation experiment.
Fig. 7 is a diagram of the reinforcement learning-based CNN network pruning rate automatic search system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that if the terms "upper", "lower", "horizontal", "inner", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings or the orientation or positional relationship which is usually arranged when the product of the present invention is used, the description is merely for convenience and simplicity, and the indication or suggestion that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, cannot be understood as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed," "mounted," "connected," and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood according to specific situations by those of ordinary skill in the art.
The invention is described in further detail below with reference to the accompanying drawings:
example one
The invention provides a CNN network pruning rate automatic search method based on reinforcement learning, which comprises the following steps as shown in figure 1:
s1, constructing a Markov decision process model according to the pruning rate search problem of each layer of a target network to be pruned;
the method for constructing the Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned is as follows:
taking the pre-trained target network M to be pruned as the environment, and constructing a state vector abstracted from the features of each layer of the target network M to be pruned as the reinforcement learning agent's observation of the environment;
defining the channel retention rate of each layer of the target network M to be pruned as the action executed by the reinforcement learning agent to form a continuous action space;
constructing a reward function which gives consideration to the overall precision performance of the target network M to be pruned and the precision performance of each layer;
copying a target network M to be pruned as a reference network B; the target network M to be pruned and the reference network B form a twin network (M, B), and the reference network B is used for calculating the reward function.
The reward function R(i,t) is calculated as follows:
R(i,t) = λ·r_L(i,t) + (1-λ)·R_N(i) - α·||r_L(i)||_2
wherein r_L(i,t) is the layer-level reward obtained after the action a_t is executed on the t-th layer of the target network M to be pruned in the i-th episode; R_N(i) is the network-level reward calculated from the accuracy measured after one round of layer-by-layer pruning of the target network M to be pruned; λ is a hyperparameter controlling the respective weights of the layer-level reward and the network-level reward; ||r_L(i)||_2 is a penalty term that suppresses the later-layer disadvantage brought by the layer-level reward; and α is a hyperparameter controlling the weight of the penalty term.
The layer-level reward function r_L(i,t) is calculated from the feature-map distance D(x), a smaller distance corresponding to a larger reward, wherein D(x) denotes, for the same input image x, the Euclidean distance between the (t+1)-th layer output feature map of the temporary network M_t and the (t+1)-th layer output feature map of the reference network B.
The Euclidean distance D(x) is calculated as follows:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2
wherein F_B^{t+1}(x; W_B) denotes the output feature map of the (t+1)-th layer of the reference network B for the input image x, W_B denotes the parameter tensor of the reference network B, F_{M_t}^{t+1}(x; W_{M_t}) denotes the output feature map of the (t+1)-th layer of the temporary network M_t for the input image x, and W_{M_t} denotes the parameter tensor of M_t.
The output feature map F_B^{t+1}(x; W_B) of the (t+1)-th layer of the reference network B for the input image x is calculated as follows:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))
wherein f_{t→t+1} denotes the sub-network of the target network containing all computing units between layers t and t+1.
The output feature map F_{M_t}^{t+1}(x; W_{M_t}) of the (t+1)-th layer of the temporary network M_t for the input image x is calculated from the t-th layer output feature map as follows:
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map.
The column vector r_L(i) formed by the layer-level rewards in the i-th episode is defined as follows:
r_L(i) = [r_L(i,1), r_L(i,2), …, r_L(i,T)]′
wherein the L_2 norm of r_L(i) is used to balance the layer-level reward values of the layers.
S2, searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
The deep reinforcement learning algorithm is the twin delayed deep deterministic policy gradient method TD3.
And S3, according to the optimal pruning rate of each layer of the target network to be pruned, performing structured pruning on the pre-trained target network to be pruned, and realizing model compression of the target network to be pruned.
The specific process of constructing the twin network architecture (M, B) is as follows:
wherein M is the target network to be pruned, the temporary network obtained by pruning only the t-th layer of M is denoted M_t, and the sub-network of M containing all computing units between layers t and t+1 is denoted f_{t→t+1}.
All pruning actions during the per-layer pruning-rate search are pseudo-pruning: Mask_t is a temporary mask matrix generated from the action a_t and the channel-importance ranking result, and is used to evaluate the impact of the pruning action a_t on network accuracy. Only after the reinforcement learning algorithm has found the optimal pruning rate of each layer is the network substantively pruned.
The original unpruned network M is copied to obtain the reference network B, which is kept unchanged during execution of the pruning-rate search algorithm and serves as the reference for measuring how much the pruning behavior of each layer of M affects network accuracy.
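Purely as an illustrative sketch (not part of the claimed method), the pseudo-pruning mask described above could be generated as follows in PyTorch, assuming the layer is a standard Conv2d; the function names make_pseudo_prune_mask and apply_mask and their interfaces are assumptions introduced here for illustration only:

import torch
import torch.nn as nn

def make_pseudo_prune_mask(conv: nn.Conv2d, a_t: float) -> torch.Tensor:
    """Build a 0/1 mask over the output channels of `conv`.

    a_t is the channel retention rate in (0, 1); the top channels ranked by
    the L1 norm of their convolution kernels are kept.
    """
    n = conv.out_channels
    keep = max(1, int(round(a_t * n)))                    # number of channels to keep
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 norm per output channel
    kept_idx = torch.topk(importance, keep).indices
    mask = torch.zeros(n, device=conv.weight.device)
    mask[kept_idx] = 1.0
    return mask                                           # 1 = keep, 0 = pseudo-pruned

def apply_mask(feature_map: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Pseudo-pruning: the mask multiplies the layer's output feature map
    # channel-wise (the ⊙ operation); no parameters are actually removed.
    return feature_map * mask.view(1, -1, 1, 1)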
FIG. 2 shows the overall architecture of the proposed deep reinforcement learning-based convolutional neural network pruning rate automatic search method and system. The method comprises three main parts: a twin network architecture consisting of the reference network B and the target network M to be pruned, an agent based on the TD3 deep reinforcement learning algorithm, and a reward function providing reinforcement learning feedback. The T-layer target network M is taken as the reinforcement learning environment, the channel retention rate of each layer is taken as the reinforcement learning action a_t, and the TD3 algorithm is adopted to automatically search the optimal pruning rate of each layer of the network.
One complete round of searching the target network M sequentially from layer 1 to layer T is defined as one episode of reinforcement learning agent training. Within an episode, for each layer t of the target network M, the agent first observes the environment to obtain a state vector s_t; the Actor network of the TD3 algorithm then computes the action a_t of the current layer, and the layer-level reward r_L(i,t) of the current layer is computed by comparing the reference network B with the target network M, as indicated by the thick solid arrows in FIG. 2. The procedure then moves to the next layer and repeats. After the T layers have been searched layer by layer, the pruning-rate search and pseudo-pruning of the whole network are complete, yielding the pseudo-pruned target network M(i). The image classification accuracy of M(i) is computed on a validation image set and used as the network-level reward R_N(i), as indicated by the dashed arrows in FIG. 2. The layer-level rewards r_L(i,t) and the network-level reward R_N(i) are combined by weighting to obtain the total reward R(i,t) of each layer's action, which is stored in the experience replay pool together with the state vector and the action; the TD3 agent networks are updated by sampling from the replay pool, yielding the pruning-rate policy π_i(s_t) updated in this episode.
FIG. 3 illustrates the operation of the invention along the time axis. Each episode receives one network-level reward R_N(i), and each layer's pruning-rate action obtains an independent layer-level reward r_L(i,t); these are combined into a total reward R(i,t), which is stored in the experience replay pool together with the corresponding action a_t and state vector s_t, and the agent networks are updated once after each episode. After the agent has been trained over sufficiently many episodes, the final pruning policy π(s_t) is obtained once the policy no longer changes significantly.
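The episode structure described above can be sketched as follows; this is an assumed organization, and the interfaces of env, agent and replay_buffer (get_state, pseudo_prune, network_reward, act, add, update) are placeholders introduced for illustration, not components defined by the patent:

import torch

def run_episode(env, agent, replay_buffer, lam=0.2, alpha=0.2):
    """One episode: sweep layers 1..T of the target network once."""
    states, actions, layer_rewards = [], [], []
    for t in range(env.num_layers):
        s_t = env.get_state(t)                  # state vector of layer t
        a_t = agent.act(s_t)                    # channel retention rate in (0, 1)
        r_L = env.pseudo_prune(t, a_t)          # layer-level reward r_L(i, t)
        states.append(s_t); actions.append(a_t); layer_rewards.append(r_L)

    R_N = env.network_reward()                  # validation accuracy of pseudo-pruned net
    r_vec = torch.tensor(layer_rewards)
    penalty = alpha * torch.linalg.norm(r_vec)  # α · ||r_L(i)||_2

    for t in range(env.num_layers):
        # total reward of layer t according to eq. (5)
        R = lam * layer_rewards[t] + (1.0 - lam) * R_N - penalty.item()
        s_next = states[t + 1] if t + 1 < env.num_layers else states[t]
        done = (t == env.num_layers - 1)
        replay_buffer.add(states[t], actions[t], R, s_next, done)

    agent.update(replay_buffer)                 # one TD3 update per episode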
The state vector s_t of layer t of the target network M observed by the agent in FIG. 2 has the concrete form:
s_t = (t, n, c, h, w, stride, k, FLOPs[t], reduced, rest, a_{t-1})
wherein t ∈ {1, …, T} is the layer index of the T-layer CNN, n and c are the numbers of output and input channels of the t-th layer respectively, h and w are the height and width of the layer's input feature map, stride is the stride of the layer's convolution kernel, k is the side length of the convolution kernel, FLOPs[t] is the floating-point computation of the t-th layer, reduced is the floating-point computation removed by all pseudo-pruning operations before layer t, rest is the remaining floating-point computation of the subsequent layers, and a_{t-1} is the agent action executed at layer t-1, i.e., the channel retention rate of the previous layer.
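For illustration, a state vector of this form could be assembled from a PyTorch convolution layer as in the following sketch; the helper name build_state is an assumption, and any normalization of the entries (not specified here) is omitted:

import torch
import torch.nn as nn

def build_state(t, conv: nn.Conv2d, in_hw, flops, reduced, rest, a_prev):
    """Assemble the 11-dimensional state vector s_t described above."""
    h, w = in_hw                               # input feature-map height and width
    return torch.tensor([
        float(t),                              # layer index
        float(conv.out_channels),              # n
        float(conv.in_channels),               # c
        float(h), float(w),                    # input feature-map size
        float(conv.stride[0]),                 # stride
        float(conv.kernel_size[0]),            # kernel side length k
        float(flops),                          # FLOPs[t]
        float(reduced),                        # FLOPs removed by earlier pseudo-pruning
        float(rest),                           # FLOPs remaining in later layers
        float(a_prev),                         # previous layer's action a_{t-1}
    ])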
The action a_t in FIG. 2 is the channel retention rate of the t-th layer of the target network M, with a_t ∈ (0, 1), i.e., the action takes real values between 0 and 1. The Actor network of the TD3 agent computes a_t from the observed s_t; before the actual pruning is executed, the action is calibrated to ensure that it satisfies the preset upper and lower bounds on each layer's pruning rate and the preset constraint on the overall FLOPs of the network. After the final pruning policy π(s_t) has been obtained, the lower-ranked channels are removed according to a channel-importance criterion (e.g., the L_1 norm of the convolution kernels), completing the network pruning.
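The calibration step can be sketched as follows; the budget rule used here is only one plausible scheme and an assumption, not the patent's exact calibration formula:

def calibrate_action(a_t, a_min, a_max, flops_t, reduced, rest, total_flops, target_ratio):
    """Calibrate the raw actor output a_t so that the per-layer bounds and the
    preset overall FLOPs retention target remain satisfiable.

    Assumed rule: the FLOPs this layer may keep equals the overall budget,
    minus the FLOPs already kept by earlier layers, minus the FLOPs the later
    layers will need even at their minimum retention rate a_min.
    """
    a_t = min(max(a_t, a_min), a_max)                        # per-layer bounds

    budget = target_ratio * total_flops                      # FLOPs allowed to remain overall
    earlier_original = total_flops - flops_t - rest          # original FLOPs of layers before t
    kept_earlier = earlier_original - reduced                # FLOPs actually kept so far
    max_keep_here = budget - kept_earlier - a_min * rest     # room left for this layer

    if flops_t > 0:
        a_t = min(a_t, max(a_min, max_keep_here / flops_t))  # shrink action if over budget
    return a_t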
Considering that both the state space and the action space of this problem are continuous, the agent module in FIG. 2 adopts the TD3 deep reinforcement learning algorithm.
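The TD3 agent itself is standard. Purely as an illustration, its actor and twin-critic networks could be defined as below; the hidden sizes are assumptions, the 11-dimensional state follows the state vector above, and the delayed policy updates, target networks and target-policy smoothing of TD3 are omitted for brevity:

import torch
import torch.nn as nn

STATE_DIM = 11   # dimension of s_t above

class Actor(nn.Module):
    """Deterministic policy: maps s_t to a retention rate in (0, 1)."""
    def __init__(self, state_dim=STATE_DIM, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # a_t in (0, 1)
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Twin Q-networks of TD3; the smaller of the two Q-values is used as the
    target to suppress over-estimation."""
    def __init__(self, state_dim=STATE_DIM, hidden=64):
        super().__init__()
        self.q1 = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.q2 = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        sa = torch.cat([s, a], dim=-1)
        return self.q1(sa), self.q2(sa)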
The specific implementation of the layer-level rewards represented by the thick solid arrows in FIG. 2 is as follows.
The output feature map of the (t+1)-th layer of the reference network B can be expressed in terms of its t-th layer output feature map. The t-th layer output feature map of the reference network B for the input image x is defined as F_B^{t}(x; W_B), where W_B denotes the parameter tensor of the reference network. The sub-network of the target network containing all computing units between layers t and t+1 is defined as f_{t→t+1}. A target network on which no pruning is performed has the same (t+1)-th layer output feature map as the reference network:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))    (1)
principle of calculating the hierarchy reward by using twin network (M, B) comparison As shown in FIG. 4, a pruning action a is performed on the t-th layer of the target network t Then, the number of output characteristic image channels changes, and the output characteristic image channels further pass through
Figure BDA0003947031970000115
T +1 layer output characteristic image obtained after operation
Figure BDA0003947031970000116
Feature map of output from reference network
Figure BDA0003947031970000117
Compared with the same channel number, the distance between the two groups of feature maps can be compared.
In the policy search stage, pseudo-pruning is adopted: according to the action a_t and the channel-importance ranking, a mask matrix Mask_t is generated and applied to the convolution kernels, to measure the influence of pruning the current layer's channels on network accuracy. For the same input image x, the t-th layer output feature map of the reference network is used directly when computing the layer-level reward of the pruned t-th layer of the target network, so that, analogously to (1), F_{M_t}^{t+1}(x; W_{M_t}) can be calculated directly as
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))    (2)
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map. For an input image x, the (t+1)-th layer output feature map obtained after pruning the t-th layer of the target network M is compared with the (t+1)-th layer output feature map of the reference network B by their Euclidean distance:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2    (3)
The layer-level reward r_L(i,t) obtained after executing the single-step action a_t at the t-th layer of the network is then derived from D(x), a smaller feature-map distance yielding a larger reward.
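As an illustrative sketch of equations (2) and (3) and of the layer-level reward, the following PyTorch function compares the twin networks' feature maps for one mini-batch; the negative mean distance returned here is an assumed concrete form of the reward, since the text above only specifies that r_L(i,t) is derived from D(x):

import torch
import torch.nn as nn

@torch.no_grad()
def layer_reward(f_t_to_t1: nn.Module, feat_B_t: torch.Tensor,
                 feat_B_t1: torch.Tensor, mask_t: torch.Tensor) -> float:
    """Layer-level reward r_L(i, t) for one mini-batch of images.

    feat_B_t  : t-th layer output feature maps of the reference network B
    feat_B_t1 : (t+1)-th layer output feature maps of B (eq. (1))
    f_t_to_t1 : the sub-network f_{t→t+1} between layers t and t+1
    mask_t    : 0/1 channel mask of the pseudo-pruned layer t
    """
    # eq. (2): (t+1)-th layer feature maps of the pseudo-pruned network M_t
    feat_Mt_t1 = f_t_to_t1(feat_B_t * mask_t.view(1, -1, 1, 1))
    # eq. (3): Euclidean distance D(x) for each image in the batch
    diff = (feat_B_t1 - feat_Mt_t1).flatten(start_dim=1)
    dist = torch.linalg.norm(diff, dim=1)
    return (-dist.mean()).item()      # assumed form: larger reward for smaller distance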
in fig. 2, the hierarchy awards and the network-level awards are merged, and the proposed overall award function is:
R(i,t)=λ·r L (i,t)+(1-λ)·R N (i)-α·||r L (i)|| 2 (5)
wherein r is L (i)=[r L (i,1),r L (i,2),…,r L (i,T)]Is a vector formed by awards of each layer level in the ith epamode, | | · | | luminous flux 2 Represents L 2 Norm, which is used for relieving the problem of later disadvantage caused by hierarchy reward, namely the network is forced to select high pruning ratio at the later layer; λ is a hyper-parameter controlling two bonus proportions; alpha is a hyper-parameter that controls the weight of the penalty term. According to the cross validation test, choosing λ =0.2, α =0.2 will result in better performance.
After search learning over a number of episodes, the invention obtains the optimal per-layer pruning strategy under the preset global pruning rate. According to this strategy and the importance ranking of each layer's channels (using an L_1-norm-based importance measure), substantive channel pruning is performed on each layer of the target network and the resulting network structure is saved; the whole network pruning process is completed after fine-tuning.
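As an illustrative sketch of the substantive channel pruning step, the following PyTorch function removes the lower-ranked output channels of one convolution layer (and the matching input channels of the next layer) according to the searched retention rate; it assumes a plain chain of Conv2d layers without batch normalization or skip connections, and the function name prune_conv_pair is an assumption:

import torch
import torch.nn as nn

@torch.no_grad()
def prune_conv_pair(conv: nn.Conv2d, next_conv: nn.Conv2d, keep_rate: float):
    """Substantively prune the output channels of `conv` and the matching
    input channels of `next_conv` using the searched retention rate."""
    n = conv.out_channels
    keep = max(1, int(round(keep_rate * n)))
    importance = conv.weight.abs().sum(dim=(1, 2, 3))           # L1 norm per output channel
    idx = torch.topk(importance, keep).indices.sort().values    # channels to keep, in order

    new_conv = nn.Conv2d(conv.in_channels, keep, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.copy_(conv.weight[idx])
    if conv.bias is not None:
        new_conv.bias.copy_(conv.bias[idx])

    new_next = nn.Conv2d(keep, next_conv.out_channels, next_conv.kernel_size,
                         stride=next_conv.stride, padding=next_conv.padding,
                         bias=next_conv.bias is not None)
    new_next.weight.copy_(next_conv.weight[:, idx])
    if next_conv.bias is not None:
        new_next.bias.copy_(next_conv.bias)

    return new_conv, new_next   # the pruned network is then fine-tuned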
Verification test
The invention is compared with a number of existing network pruning algorithms, including L1-norm, Network Slimming, SFP, CACP, AMC, LPF, HRank, GAL, Variational CNN Pruning, Hinge, FPGM and AOFP. A natural scene image classification dataset, CIFAR-10, and a remote sensing image target classification dataset, UC-Merced Land-Use, are used in the verification experiments; the target networks to be pruned are the convolutional neural network VGG-16 and the lightweight network MobileNet-V1.
TABLE 1
Table 1 compares the performance of the invention with traditional pruning algorithms for the VGG-16 network on the CIFAR-10 dataset. Values in bold are the best results at the corresponding retention rate. Under the four preset global compression-rate settings of {30%, 50%, 60%, 70%}, the pruned network obtained by the invention has the best Top-1 classification accuracy; when the compression ratio is above 50%, the accuracy of the pruned network exceeds that of the reference network (the original unpruned network), indicating that the invention can effectively alleviate the over-fitting problem in highly redundant networks.
TABLE 2
Table 2 compares the invention with AMC, another network pruning algorithm that also has automatic pruning-rate search capability. Pruning is performed on the UC-Merced Land-Use dataset for the lightweight network MobileNet-V1, with all images resized to 224×224; values in bold are the best results at the corresponding global retention rate. Under the three FLOPs retention rates of {10%, 30%, 70%}, the network accuracy of the invention is superior to that of the AMC algorithm, demonstrating the invention's application potential for compressed deployment on hardware-resource-constrained devices.
FIGS. 5 and 6 show validation-experiment results comparing the performance of VGG-16 network pruning on the CIFAR-10 dataset.
Fig. 5 shows the per-layer pruning rates automatically searched by the invention for different global retention rates. The channel retention rates of the layers near the front of the network are higher, those of the layers near the back are lower, and the curves show roughly the same trend for all overall retention rates. This result reveals the inherent differences in sparsity and redundancy among the layers of the CNN, agrees with previous studies of the layer-wise sparsity sensitivity of VGG-16, and verifies the correctness of the invention.
FIG. 6 shows the effect of different deep reinforcement learning agents on the pruning-rate search results. Under all preset global pruning rates, the TD3 agent achieves higher post-pruning network accuracy than the DDPG agent, verifying the advantage of adopting the TD3 agent in the invention.
Example two
A system implementing the reinforcement learning-based CNN network pruning rate automatic search method, as shown in FIG. 7, comprises:
the model building module is used for building a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
the optimal strategy searching module is used for searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and the CNN network pruning module is used for performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
EXAMPLE III
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above network pruning-rate automatic search method that obtains layer-level rewards using a twin network. The memory may comprise a volatile memory such as a high-speed random access memory, and may further comprise a non-volatile memory such as at least one disk memory. The processor, the network interface, and the memory are connected to one another through an internal bus, which may be an industry standard architecture bus, a peripheral component interconnect bus, an extended industry standard architecture bus, or the like; the bus may be divided into an address bus, a data bus, a control bus, and so on. The memory is used for storing programs; specifically, the programs may include program code comprising computer operation instructions. The memory may include both volatile and non-volatile storage and provides instructions and data to the processor.
Example four
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above network pruning-rate automatic search method that obtains layer-level rewards using a twin network. The storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory may include random access memory (RAM) and/or cache memory. The non-volatile memory may include read-only memory (ROM), a hard disk, a flash memory, an optical disk, a magnetic disk, and the like.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A CNN network pruning rate automatic search method based on reinforcement learning is characterized by comprising the following steps:
constructing a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
2. The CNN network pruning rate automatic search method based on reinforcement learning of claim 1, wherein the method for constructing the Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned is as follows:
taking the pre-trained target network M to be pruned as the environment, and constructing a state vector abstracted from the features of each layer of the target network M to be pruned as the reinforcement learning agent's observation of the environment;
defining the channel retention rate of each layer of the target network M to be pruned as the action executed by the reinforcement learning agent to form a continuous action space;
constructing a reward function which gives consideration to the overall precision performance of the target network M to be pruned and the precision performance of each layer;
copying a target network M to be pruned as a reference network B; the target network M to be pruned and the reference network B form a twin network (M, B), and the reference network B is used for calculating the reward function.
3. The reinforcement learning-based CNN network pruning rate automatic search method according to claim 1, wherein the reward function R(i,t) is calculated as follows:
R(i,t) = λ·r_L(i,t) + (1-λ)·R_N(i) - α·||r_L(i)||_2
wherein r_L(i,t) is the layer-level reward obtained after the action a_t is executed on the t-th layer of the target network M to be pruned in the i-th episode; R_N(i) is the network-level reward calculated from the accuracy measured after one round of layer-by-layer pruning of the target network M to be pruned; λ is a hyperparameter controlling the respective weights of the layer-level reward and the network-level reward; ||r_L(i)||_2 is a penalty term that suppresses the later-layer disadvantage brought by the layer-level reward; and α is a hyperparameter controlling the weight of the penalty term.
4. The CNN network pruning rate automatic search method based on reinforcement learning of claim 3, wherein the layer-level reward function r_L(i,t) is calculated from the feature-map distance D(x), a smaller distance corresponding to a larger reward, wherein D(x) denotes, for the same input image x, the Euclidean distance between the (t+1)-th layer output feature map of the temporary network M_t and the (t+1)-th layer output feature map of the reference network B;
the Euclidean distance D(x) is calculated as follows:
D(x) = ||F_B^{t+1}(x; W_B) - F_{M_t}^{t+1}(x; W_{M_t})||_2
wherein F_B^{t+1}(x; W_B) denotes the output feature map of the (t+1)-th layer of the reference network B for the input image x, W_B denotes the parameter tensor of the reference network B, F_{M_t}^{t+1}(x; W_{M_t}) denotes the output feature map of the (t+1)-th layer of the temporary network M_t for the input image x, and W_{M_t} denotes the parameter tensor of M_t.
5. The CNN network pruning rate automatic search method based on reinforcement learning of claim 4, wherein the output feature map F_B^{t+1}(x; W_B) of the (t+1)-th layer of the reference network B for the input image x is calculated as follows:
F_B^{t+1}(x; W_B) = f_{t→t+1}(F_B^{t}(x; W_B))
wherein f_{t→t+1} denotes the sub-network of the target network containing all computing units between layers t and t+1;
the output feature map F_{M_t}^{t+1}(x; W_{M_t}) of the (t+1)-th layer of the temporary network M_t for the input image x is calculated from the t-th layer output feature map as follows:
F_{M_t}^{t+1}(x; W_{M_t}) = f_{t→t+1}(Mask_t ⊙ F_B^{t}(x; W_B))
wherein the operator ⊙ denotes element-wise multiplication applied to each channel of the feature map.
6. The CNN network pruning rate automatic search method based on reinforcement learning of claim 3, wherein the column vector r_L(i) formed by the layer-level rewards in the i-th episode is defined as follows:
r_L(i) = [r_L(i,1), r_L(i,2), …, r_L(i,T)]′
wherein the L_2 norm of r_L(i) is used to balance the layer-level reward values of the layers.
7. The reinforcement learning-based CNN network pruning rate automatic search method according to claim 1, wherein the deep reinforcement learning algorithm is the twin delayed deep deterministic policy gradient method TD3.
8. The CNN network pruning rate automatic search system based on reinforcement learning according to any one of claims 1 to 7, characterized by comprising:
the model building module is used for building a Markov decision process model according to the pruning rate search problem of each layer of the target network to be pruned;
the optimal strategy searching module is used for searching the optimal strategy of the Markov decision process model based on the overall pruning rate of the target network to be pruned and a deep reinforcement learning algorithm, to obtain the optimal pruning rate of each layer of the target network to be pruned;
and the CNN network pruning module is used for performing structured pruning on the pre-trained target network to be pruned according to the optimal pruning rate of each layer of the target network to be pruned, so as to realize model compression of the target network to be pruned.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the reinforcement learning-based CNN network pruning rate automatic search method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the steps of the reinforcement learning-based CNN network pruning rate automatic search method according to any one of claims 1 to 7.
CN202211436627.2A 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning Pending CN115829022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211436627.2A CN115829022A (en) 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211436627.2A CN115829022A (en) 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN115829022A true CN115829022A (en) 2023-03-21

Family

ID=85528581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211436627.2A Pending CN115829022A (en) 2022-11-16 2022-11-16 CNN network pruning rate automatic search method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115829022A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination