CN115600650A - Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium - Google Patents

Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Info

Publication number
CN115600650A
CN115600650A
Authority
CN
China
Prior art keywords
pruning
filter
neural network
layer
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211363959.2A
Other languages
Chinese (zh)
Inventor
张维纬
纪铭
余浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202211363959.2A priority Critical patent/CN115600650A/en
Publication of CN115600650A publication Critical patent/CN115600650A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention relates to an automatic convolutional neural network quantization pruning method based on reinforcement learning. The method first obtains an image data set and pre-trains on it with an initialized model to obtain the average rank of the feature map output by each filter; the average ranks are then sorted together with the global importance of the filters to obtain filter importance information. Automatic quantization and pruning of the neural network model are realized through reinforcement learning, the model compression strategy with the highest model accuracy is obtained, and the final neural network model is obtained after pruning is completed. The invention globally sorts the filters in the convolutional layers according to the degree to which they influence model accuracy combined with the average rank; the rank is consistent with filter importance, and higher bit-widths are allocated to the weight parameters of highly important filters, so that accuracy is preserved to the greatest extent. A neural network originally applied on a high-performance computer can thus be compressed and then deployed on mobile edge devices with weaker computing and storage capabilities.

Description

Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium
Technical Field
The invention belongs to the field of image processing, and particularly relates to an automatic convolutional neural network quantization pruning method and device based on reinforcement learning, and a storage medium.
Background
In recent years, on the one hand, models obtained from ever deeper neural networks have achieved better and better results; on the other hand, with the continuous development and innovation of related fields such as autonomous driving and intelligent mobile devices, demand for deep neural network models suitable for edge devices with weaker computing power has gradually increased. Owing to its characteristics, a deep neural network deployed on a mobile device carries an extremely large number of parameters and floating-point operations. For example, a 152-layer ResNet has more than 60 million parameters and requires more than 10 billion floating-point operations to infer a single image with a resolution of 224 × 224, which is difficult to run on platforms with limited computing resources such as mobile devices, wearable portable devices or Internet-of-Things devices. In addition, running a deep neural network model on a GPU for real-time object detection is costly. For example, running YOLO v3 on an NVIDIA Tesla T4 can detect 40 frames per second in real time, but the device sells for nearly thirty thousand yuan, far beyond what is economically feasible for widespread use. Existing neural network models therefore struggle to balance model accuracy and computation speed on low-cost devices.
With the development of neural network deployment applications, the emphasis has gradually shifted from high accuracy alone to ensuring low memory occupation and a small amount of floating-point computation while keeping accuracy unchanged or within an acceptable range of degradation. Reinforcement learning, in which an agent guides the neural network through automatic learning, is undoubtedly a very promising tool. Most existing neural network quantization and pruning methods require human experts to tune parameters continuously to achieve the best compression effect; the hyperparameter space is exponentially large, the operation is difficult and time-consuming, and in practice such methods easily fall into locally optimal or suboptimal solutions. Existing model pruning methods focus on weight pruning within the model, but this fine-grained pruning requires specific hardware support and generalizes poorly. Meanwhile, most quantization and pruning strategies are rule-based heuristics, which are likely to yield suboptimal compression.
Disclosure of Invention
The invention aims to provide an automatic convolutional neural network quantization pruning method, device and storage medium based on reinforcement learning which, at the cost of only a small loss of accuracy, reduce the parameter bit-width, the number of parameters and the amount of floating-point computation of a model through an automatic quantization pruning process, so that a neural network originally applied on a high-performance computer can be compressed and then deployed on mobile edge devices with weaker computing and storage capabilities.
The automatic convolutional neural network quantization pruning method based on reinforcement learning of the invention comprises the following steps:
Step S10, acquiring a data set of images and dividing it into a training set and a verification set in proportion;
Step S20, acquiring the convolutional neural network model to be quantized and pruned, initializing it, pre-training the initialized model for 60 rounds on the images in the training set to obtain the average rank of the feature map output by each filter, and sorting the average rank of each filter's output feature map in the convolutional layers of the current model together with the global importance of the filters to obtain filter importance information;
Step S30, realizing automatic quantization and pruning of the neural network model through reinforcement learning to obtain the model compression strategy with the highest model accuracy, wherein the reinforcement learning agent is an actor-critic network, the actor network consists of actor networks A and B, the critic network is responsible for evaluating the model compression strategy, and the parameters of the actor and critic networks are updated with the DDPG (deep deterministic policy gradient) reinforcement learning method;
Step S40, fine-tuning the neural network model after quantization pruning is completed to obtain the final pruned neural network model.
The step S20 specifically includes:
s21, creating a convolutional neural network model of pruning to be quantified based on a Pythrch frame;
s22, setting pre-training parameters and establishing a layer structure index corresponding to the convolutional neural network model;
s23, performing 60 rounds of pre-training on the images in the training set to obtain the precision of the current model, wherein the model precision is obtained by testing different images in the verification set;
step S24, obtaining the rank of each filter output characteristic diagram in the convolutional layer, and dividing the sum of the ranks of all the filter output characteristic diagrams by the number of the convolutional layer filters to obtain the average rank of the convolutional layer; sorting is carried out by combining the global importance of the filter, and the sorting result is used as the basis for judging the importance of the filter;
and sequencing the average rank of each filter output characteristic diagram in the convolutional layer of the current model in combination with the global importance of the filter to obtain the importance information of the filter.
The filter importance information in step S24 refers to that the filter in each convolutional layer uses the average of the filter importance of the current layer as a threshold, and is an importance filter if the average is greater than the threshold, otherwise is a non-importance filter, all convolutional layers are sorted according to the average of the filter importance, and the convolutional layer with the preset value before sorting is a sensitive layer and is stored in the sensitive layer index of the corresponding convolutional neural network model.
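As an illustrative sketch only (not taken from the patent embodiment; the function name, the `sensitive_fraction` parameter and the tensor layout are assumptions), the threshold split into important/unimportant filters and the selection of sensitive layers could look as follows:

```python
import torch

def split_filters_and_layers(importance_per_layer, sensitive_fraction=0.1):
    """Mark important filters (score above the layer mean) and select the
    sensitive layers (top fraction of layers by mean filter importance).
    importance_per_layer: list of 1-D tensors, one score per filter."""
    important_masks, layer_means = [], []
    for scores in importance_per_layer:
        mean = scores.mean()
        important_masks.append(scores > mean)   # True -> important filter
        layer_means.append(mean)
    layer_means = torch.stack(layer_means)
    k = max(1, int(sensitive_fraction * len(importance_per_layer)))
    sensitive_idx = torch.topk(layer_means, k).indices  # sensitive-layer index
    return important_masks, sensitive_idx
```

The default of 0.1 mirrors the top-10% choice used in the detailed embodiment; the method itself only requires a preset value.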
The step S30 specifically includes:
s31, acquiring a target quantization rate and a pruning rate which are set in the hyper-parameter, determining the number of filters needing pruning according to the target pruning rate, and measuring the bit number of the reserved filter parameters according to the quantization rate;
step S32, according to the deterministic strategy of reinforcement learning, if the current layer is in the sensitive layer index, setting the quantization bit number of the current layer output by the deterministic of reinforcement learning to be 8 bits, setting the compression rate to be 0.1, obtaining the number of filters of the current layer of the model, which should execute pruning operation, and simultaneously calculating to obtain the number of the filters to be pruned;
s33, sorting the filters in the layer from low to high by calculating the average rank of the filters and combining the global importance of the filters;
step S34, according to the number of the filters required for pruning of each convolutional layer obtained in the step S31, the actor network A makes different filter pruning strategies according to the judgment whether the current convolutional layer is a sensitive layer or not and the importance degree of the filter of the convolutional layer, and sets the weight of the filter for pruning to be 0; the actor network B distributes different bit widths according to different importance of the filter, the sensitive layer filter distributes higher bit width, and the strategies of the actor networks A and B are gathered to the actor network;
step S35, repeatedly executing the step S32 to the step S34 until the quantization pruning operation on all layers of the model is completed, namely, a complete round of quantization pruning operation is completed, the model precision of the round of quantization pruning operation is completed through verification set verification, and the current model compression strategy and precision are stored in a reinforcement learning experience playback pool;
and S36, repeatedly executing the steps S32 to S35, finishing all rounds of model quantitative pruning operation, and obtaining the neural network model compression strategy with the highest precision.
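The following minimal sketch shows how one such round (steps S32 to S35) could be driven; the `agent.act`, `layer.state`, `prune_least_important` and `quantize` interfaces are assumptions for illustration, not the patent's implementation:

```python
def quantization_pruning_round(layers, agent, sensitive_idx):
    """One complete round: walk the model layer by layer, apply the chosen
    quantization bit-width and pruning ratio, and return the strategy."""
    strategy = []
    for t, layer in enumerate(layers):
        if t in sensitive_idx:
            bits, ratio = 8, 0.1                    # deterministic setting for sensitive layers
        else:
            bits, ratio = agent.act(layer.state())  # actor outputs the per-layer action
        n_prune = int(ratio * layer.num_filters)    # number of filters to prune
        layer.prune_least_important(n_prune)        # zero the weights of the pruned filters
        layer.quantize(bits)                        # bit-width of the retained filters
        strategy.append((bits, ratio))
    return strategy
```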
The global importance of a filter in step S33 is estimated by the formula:

I_i = α_{l(i)} · r_{l(i)} · ‖w_i‖_2 + κ_{l(i)}

where l(i) denotes the layer index of the i-th filter, ‖·‖_2 denotes the L2 norm, w_i denotes the weight of the i-th filter, α, κ ∈ R^L are trainable variables, and L denotes the total number of layers;

r_l = (R_l − R_min) / (R_max − R_min)

where r_l denotes the scaling factor of the rank of layer l, R_l denotes the average rank of the l-th layer, R_min denotes the minimum of the average ranks of all convolutional layers in the neural network, and R_max denotes the maximum of the average ranks of all convolutional layers in the neural network.
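A sketch of this scoring, under the assumption that the rank scaling factor r_l multiplies the norm term (the composition above is reconstructed from the symbol definitions, as the original formula image is not reproduced):

```python
import torch

def global_importance(filter_weights, layer_of, alpha, kappa, avg_rank):
    """I_i = alpha_{l(i)} * r_{l(i)} * ||w_i||_2 + kappa_{l(i)}, with r_l the
    min-max normalised average rank of layer l."""
    r = (avg_rank - avg_rank.min()) / (avg_rank.max() - avg_rank.min())
    scores = []
    for i, w in enumerate(filter_weights):   # w: weight tensor of the i-th filter
        l = layer_of[i]                      # layer index l(i)
        scores.append(alpha[l] * r[l] * w.norm(p=2) + kappa[l])
    return torch.stack(scores)
```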
The parameter updating method in step S30 is the DDPG (deep deterministic policy gradient) reinforcement learning method. The actor network and the critic network are trained by taking as input the accuracy obtained after one complete round of quantization pruning together with the quantization pruning state in each round, wherein, in the state space, each layer network t represents the state S_t by the 11 attributes shown below:

(t, n, c, h, w, stride, k, FLOPs[t], Re_all, Rest, i_{w/a}, a_{t−1})

where t denotes the index of each layer, n denotes the total number of layers of the network, c denotes the number of convolution channels, h and w denote the height and width of the convolutional feature map respectively, stride denotes the step length, k denotes the number of iterations, FLOPs[t] denotes the amount of floating-point computation of layer t, Re_all denotes the response of all states, Rest denotes the remaining state, i_{w/a} denotes the number of quantization bits of the weights and activations, and a_{t−1} denotes the action of the (t−1)-th layer network.
The agent obtains from the filter quantization pruning environment the state S_t of the t-th layer in which it is located, computes the current feature vector φ(S_t), and then outputs the action a_t in state S_t as the quantization strategy and compression rate of the current layer, guiding the current layer in quantization bit-width selection and candidate filter pruning, where

a_t = π_θ(φ(S_t)) + N

N denotes noise, θ denotes all the parameters of the actor network, and π_θ(x) is the pruning-rate function;
in the next round of quantitative pruning, the current target Q value y is calculated according to the following formula by collecting m samples in a DDPG experience playback pool j
Figure BDA0003923067730000043
Wherein the content of the first and second substances,
Figure BDA0003923067730000044
is obtained through a network of actor targets, and
Figure BDA0003923067730000045
then it is obtained through the critic target network, and the value of gamma is setSet to 1 to avoid over-priority short term rewards, enabling the agent to take long term rewards into account by reducing the variance of the gradient estimate during agent updates by subtracting the baseline reward b, R j For the current short-term reward quantifying pruning operations, the value of the gradient estimate is an exponential moving average of this previous reward;
the loss function of DDPG is the mean square error function as shown below:
Figure BDA0003923067730000046
A=π θ (S)+N
wherein
Figure BDA0003923067730000047
Is the current state S j The selected action A interacting with the environment of the obtained feature vector increases certain noise N, and the noise is exponentially attenuated after each round of pruning is finished;
the reward function in reinforcement learning is shown by the following formula:
R_FLOPs = −Error · log(FLOPs).
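A compact sketch of the DDPG update and reward described above (the actor/critic modules, their target networks and the replay-batch layout are assumptions for illustration, not the patent's implementation):

```python
import math
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, baseline_b, gamma=1.0):
    """One DDPG update over m replayed samples, with gamma = 1 and the
    moving-average baseline b subtracted from the reward."""
    S, A, R, S_next = batch                       # states, actions, rewards, next states
    with torch.no_grad():
        A_next = actor_target(S_next)             # action from the actor target network
        y = (R - baseline_b) + gamma * critic_target(S_next, A_next)  # target Q value y_j
    critic_loss = F.mse_loss(critic(S, A), y)     # mean squared error loss of DDPG
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_loss = -critic(S, actor(S)).mean()      # ascend the critic's Q estimate
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

def reward_flops(error, flops):
    """Reward function R_FLOPs = -Error * log(FLOPs)."""
    return -error * math.log(flops)
```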
a computer device comprising a memory storing a computer program and a processor implementing any one of the above-described reinforcement learning-based automated convolutional neural network quantitative pruning methods when the computer program is executed.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described automated convolutional neural network quantization pruning methods based on reinforcement learning.
The convolutional neural network quantization pruning method, device and storage medium based on reinforcement learning, by combining global ranking with the average rank, achieve a better effect than traditional importance-ranking methods and have the following advantages:
1. The quantization pruning operation is automated. The quantization pruning method realized through reinforcement learning is an automatic compression process; moreover, during quantization and pruning, the strategies are continuously optimized through learning, so that the optimal compression strategy is finally obtained.
2. The accuracy of the compressed neural network model is largely preserved. The invention globally sorts the filters in the convolutional layers by the degree to which they influence model accuracy combined with the average rank; the rank is consistent with filter importance and is combined with importance-ranked pruning, and at the same time higher bit-widths are allocated to the weight parameters of highly important filters, achieving maximal accuracy retention.
3. The cost of intelligent devices is reduced. The invention quantizes while pruning the filters, i.e. it selects the number of bits with which the filter parameters are expressed; by combining pruning and quantization, an even lighter network can be obtained, although the computational complexity is higher than that of a single pruning operation. Since high-performance mobile edge devices are very expensive, on the premise of only a slight reduction in accuracy, the model after quantization pruning can be deployed directly on edge devices with weak computing and storage capabilities, greatly reducing the cost of the corresponding computing equipment.
Drawings
Fig. 1 is a schematic diagram of the framework of the present invention.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
As shown in FIG. 1, the automatic convolutional neural network quantization pruning method based on reinforcement learning of the invention comprises the following steps:
Step S10, acquiring a data set of images and dividing it into a training set and a verification set in proportion;
Step S20, acquiring the convolutional neural network model to be quantized and pruned, initializing it, pre-training the initialized model for 60 rounds on the images in the training set to obtain the average rank of the feature map output by each filter, and sorting the average rank of each filter's output feature map in the convolutional layers of the current model together with the global importance of the filters to obtain filter importance information; specifically:
Step S21, creating the convolutional neural network model to be quantized and pruned based on the PyTorch framework;
Step S22, setting the pre-training parameters and establishing the layer structure index corresponding to the convolutional neural network model;
Step S23, pre-training for 60 rounds on the images in the training set to obtain the accuracy of the current model, the model accuracy being obtained by testing on different images in the verification set;
Step S24, obtaining the rank of each filter's output feature map in a convolutional layer, and dividing the sum of the ranks of all the filters' output feature maps by the number of filters in that convolutional layer to obtain the layer's average rank; sorting is then carried out together with the global importance of the filters, and the sorting result serves as the basis for judging filter importance. The average rank is calculated as:

R_l = (1/n) · Σ_{i=1}^{n} Rank(o_{l(i)})

where o_{l(i)} denotes the output feature map of the i-th filter of the l-th layer, Rank(·) denotes the function that computes the rank, and n denotes the number of filters in the convolutional layer;
the average rank of each filter's output feature map in the convolutional layers of the current model is sorted together with the global importance of the filters to obtain the filter importance information: the filters in each convolutional layer take the average filter importance of the current layer as a threshold, a filter above the threshold being an important filter and otherwise an unimportant filter; all convolutional layers are sorted by their average filter importance, the convolutional layers in the top 10% of the sorting are sensitive layers, and they are stored in the sensitive-layer index of the corresponding convolutional neural network model (a sketch of the rank computation follows below);
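A minimal sketch of the average-rank computation for one convolutional layer (accumulating and averaging the statistic over the many pre-training batches is omitted here and is an assumption of how it would be done):

```python
import torch

@torch.no_grad()
def layer_average_rank(feature_maps: torch.Tensor) -> torch.Tensor:
    """Average rank of a convolutional layer: the sum of the ranks of every
    filter's output feature map divided by the number of filters.
    feature_maps: tensor of shape (n_filters, H, W) from one forward pass."""
    ranks = torch.linalg.matrix_rank(feature_maps.float())  # rank of each H x W map
    return ranks.float().mean()                             # R_l
```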
Step S30, realizing automatic quantization and pruning of the neural network model through reinforcement learning to obtain the model compression strategy with the highest model accuracy, wherein the reinforcement learning agent is an actor-critic network, the actor network consists of actor networks A and B, the critic network is responsible for evaluating the model compression strategy, and the parameters of the actor and critic networks are updated with the DDPG (deep deterministic policy gradient) reinforcement learning method; specifically:
Step S31, acquiring the target quantization rate and pruning rate set in the hyperparameters, determining the number of filters to be pruned according to the target pruning rate, and determining the number of bits of the retained filter parameters according to the quantization rate; Step S32, according to the deterministic strategy of reinforcement learning, if the current layer is in the sensitive-layer index, setting the quantization bit number output deterministically for the current layer to 8 bits and the compression rate to 0.1, obtaining the number of filters of the current layer on which the pruning operation should be executed, and at the same time calculating the number of filters to be pruned;
Step S33, sorting the filters in the layer from low to high by calculating the average rank of the filters combined with their global importance, wherein the global importance of a filter is estimated by the formula:

I_i = α_{l(i)} · r_{l(i)} · ‖w_i‖_2 + κ_{l(i)}

where l(i) denotes the layer index of the i-th filter, ‖·‖_2 denotes the L2 norm, w_i denotes the weight of the i-th filter, α, κ ∈ R^L are trainable variables, and L denotes the total number of layers;

r_l = (R_l − R_min) / (R_max − R_min)

where r_l denotes the scaling factor of the rank of layer l, R_l denotes the average rank of the l-th layer, R_min denotes the minimum of the average ranks of all convolutional layers in the neural network, and R_max denotes the maximum of the average ranks of all convolutional layers in the neural network;
Step S34, according to the number of filters to be pruned in each convolutional layer obtained in step S31, actor network A formulates different filter pruning strategies according to whether the current convolutional layer is a sensitive layer and the importance of the layer's filters, and sets the weights of the pruned filters to 0; actor network B allocates different bit-widths according to the different importance of the filters, the filters of sensitive layers being allocated higher bit-widths, and the strategies of actor networks A and B are aggregated into the actor network;
Step S35, repeatedly executing steps S32 to S34 until the quantization pruning operation on all layers of the model is completed, i.e. one complete round of quantization pruning; the model accuracy of this round is obtained on the verification set, and the current model compression strategy and accuracy are stored in the reinforcement learning experience replay pool;
Step S36, repeatedly executing steps S32 to S35 to finish all rounds of model quantization pruning and obtain the neural network model compression strategy with the highest accuracy.
Further, the parameter updating method of the above steps is the DDPG (deep deterministic policy gradient) reinforcement learning method, wherein the actor network and the critic network are trained by taking as input the accuracy obtained after one complete round of quantization pruning together with the quantization pruning state in each round, and wherein, in the state space, each layer network t represents the state S_t by the 11 attributes shown below:

(t, n, c, h, w, stride, k, FLOPs[t], Re_all, Rest, i_{w/a}, a_{t−1})

where t denotes the index of each layer, n denotes the total number of layers of the network, c denotes the number of convolution channels, h and w denote the height and width of the convolutional feature map respectively, stride denotes the step length, k denotes the number of iterations, FLOPs[t] denotes the amount of floating-point computation of layer t, Re_all denotes the response of all states, Rest denotes the remaining state, i_{w/a} denotes the number of quantization bits of the weights and activations, and a_{t−1} denotes the action of the (t−1)-th layer network.
The agent obtains from the filter quantization pruning environment the state S_t of the t-th layer in which it is located, computes the current feature vector φ(S_t), and then outputs the action a_t in state S_t as the quantization strategy and compression rate of the current layer, guiding the current layer in quantization bit-width selection and candidate filter pruning, where

a_t = π_θ(φ(S_t)) + N

N denotes noise, θ denotes all the parameters of the actor network, and π_θ(x) is the pruning-rate function.
In the next round of quantization pruning, m samples are collected from the DDPG experience replay pool and the current target Q value y_j is calculated according to the following formula:

y_j = R_j − b + γ · Q′(φ(S_{j+1}), π_{θ′}(φ(S_{j+1})) | w′)

where π_{θ′}(φ(S_{j+1})) is obtained through the actor target network and Q′(·) through the critic target network; the value of γ is set to 1 to avoid over-prioritizing short-term rewards so that the agent can take long-term rewards into account; the variance of the gradient estimate is reduced during agent updates by subtracting a baseline reward b, where R_j is the short-term reward of the current quantization pruning operation and b is an exponential moving average of the previous rewards.
The loss function of DDPG is the mean squared error function shown below:

L = (1/m) · Σ_{j=1}^{m} ( y_j − Q(φ(S_j), A_j | w) )²

A = π_θ(S) + N

where φ(S_j) is the feature vector of the current state S_j; a certain noise N is added to the selected action A that interacts with the environment, and the noise decays exponentially after each round of pruning is finished;
the reward function in reinforcement learning is shown by the following formula:
R_FLOPs = −Error · log(FLOPs)
Step S40, fine-tuning the neural network model after quantization pruning is completed to obtain the final pruned neural network model (a sketch follows below).
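A bare-bones sketch of this fine-tuning step (the optimizer choice and hyperparameters are placeholders, not values given in the patent):

```python
import torch

def finetune(model, train_loader, epochs=10, lr=1e-3):
    """Step S40: briefly retrain the quantized-and-pruned model to recover accuracy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model
```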
The convolutional neural network to be quantized and pruned has a plurality of convolutional layers, and the convolutional layers in the top 10% of the sorting obtained in step S24 are sensitive layers. This means these layers are important to the convolutional neural network: if they are pruned too heavily or allocated too few bits, the accuracy of the model drops sharply, i.e. they are sensitive to quantization and pruning operations, hence the name sensitive layer.
The pruning process operates layer by layer over the convolutional layers of the convolutional neural network, and both sensitive and non-sensitive layers are encountered in the course of this layer-by-layer operation.
The filters in the convolutional layers of the network to be quantized and pruned each have different importance and are divided into important and unimportant filters with the average as the threshold. The pruning strategy obtained through the actor network is that the important filters of sensitive layers are retained, while the unimportant filters of non-sensitive layers are pruned the most.
In one embodiment, a computer device, which may be a terminal, is provided that includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an automated convolutional neural network quantitative pruning method based on reinforcement learning. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program is executed by a processor to implement the method for the quantitative pruning of the convolutional neural network based on reinforcement learning provided in any of the above embodiments, and has corresponding functions and advantages.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, the combinations should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. An automatic convolutional neural network quantization pruning method based on reinforcement learning, characterized by comprising the following steps:
Step S10, acquiring a data set of images and dividing it into a training set and a verification set in proportion;
Step S20, acquiring the convolutional neural network model to be quantized and pruned, initializing it, pre-training the initialized model for 60 rounds on the images in the training set to obtain the average rank of the feature map output by each filter, and sorting the average rank of each filter's output feature map in the convolutional layers of the current model together with the global importance of the filters to obtain filter importance information;
Step S30, realizing automatic quantization and pruning of the neural network model through reinforcement learning to obtain the model compression strategy with the highest model accuracy, wherein the reinforcement learning agent is an actor-critic network, the actor network consists of actor networks A and B, the critic network is responsible for evaluating the model compression strategy, and the parameters of the actor and critic networks are updated with the DDPG (deep deterministic policy gradient) reinforcement learning method;
Step S40, fine-tuning the neural network model after quantization pruning is completed to obtain the final pruned neural network model.
2. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 1, characterized in that step S20 specifically comprises:
Step S21, creating the convolutional neural network model to be quantized and pruned based on the PyTorch framework;
Step S22, setting the pre-training parameters and establishing the layer structure index corresponding to the convolutional neural network model;
Step S23, pre-training for 60 rounds on the images in the training set to obtain the accuracy of the current model, the model accuracy being obtained by testing on different images in the verification set;
Step S24, obtaining the rank of each filter's output feature map in a convolutional layer, and dividing the sum of the ranks of all the filters' output feature maps by the number of filters in that convolutional layer to obtain the layer's average rank; sorting is then carried out together with the global importance of the filters, and the sorting result serves as the basis for judging filter importance;
the average rank of each filter's output feature map in the convolutional layers of the current model is sorted together with the global importance of the filters to obtain the filter importance information.
3. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 2, characterized in that the filter importance information in step S24 means that the filters in each convolutional layer take the average filter importance of the current layer as a threshold, a filter whose importance exceeds the threshold being an important filter and otherwise an unimportant filter; all convolutional layers are sorted by their average filter importance, and the convolutional layers ranked before a preset value are sensitive layers, stored in the sensitive-layer index of the corresponding convolutional neural network model.
4. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 1, characterized in that step S30 specifically comprises:
Step S31, acquiring the target quantization rate and pruning rate set in the hyperparameters, determining the number of filters to be pruned according to the target pruning rate, and determining the number of bits of the retained filter parameters according to the quantization rate;
Step S32, according to the deterministic strategy of reinforcement learning, if the current layer is in the sensitive-layer index, setting the quantization bit number output deterministically for the current layer to 8 bits and the compression rate to 0.1, obtaining the number of filters of the current layer on which the pruning operation should be executed, and at the same time calculating the number of filters to be pruned;
Step S33, sorting the filters in the layer from low to high by calculating the average rank of the filters combined with their global importance;
Step S34, according to the number of filters to be pruned in each convolutional layer obtained in step S31, actor network A formulates different filter pruning strategies according to whether the current convolutional layer is a sensitive layer and the importance of the layer's filters, and sets the weights of the pruned filters to 0; actor network B allocates different bit-widths according to the different importance of the filters, the filters of sensitive layers being allocated higher bit-widths, and the strategies of actor networks A and B are aggregated into the actor network;
Step S35, repeatedly executing steps S32 to S34 until the quantization pruning operation on all layers of the model is completed, i.e. one complete round of quantization pruning; the model accuracy of this round is obtained on the verification set, and the current model compression strategy and accuracy are stored in the reinforcement learning experience replay pool;
Step S36, repeatedly executing steps S32 to S35 to finish all rounds of model quantization pruning and obtain the neural network model compression strategy with the highest accuracy.
5. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 4, characterized in that the global importance of a filter in step S33 is estimated by the formula:

I_i = α_{l(i)} · r_{l(i)} · ‖w_i‖_2 + κ_{l(i)}

where l(i) denotes the layer index of the i-th filter, ‖·‖_2 denotes the L2 norm, w_i denotes the weight of the i-th filter, α, κ ∈ R^L are trainable variables, and L denotes the total number of layers;

r_l = (R_l − R_min) / (R_max − R_min)

where r_l denotes the scaling factor of the rank of layer l, R_l denotes the average rank of the l-th layer, R_min denotes the minimum of the average ranks of all convolutional layers in the neural network, and R_max denotes the maximum of the average ranks of all convolutional layers in the neural network.
6. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 1, characterized in that the parameter updating method of step S30 is the DDPG (deep deterministic policy gradient) reinforcement learning method; the actor network and the critic network are trained by taking as input the accuracy obtained after one complete round of quantization pruning together with the quantization pruning state in each round, wherein, in the state space, each layer network t represents the state S_t by the 11 attributes shown below:

(t, n, c, h, w, stride, k, FLOPs[t], Re_all, Rest, i_{w/a}, a_{t−1})

where t denotes the index of each layer, n denotes the total number of layers of the network, c denotes the number of convolution channels, h and w denote the height and width of the convolutional feature map respectively, stride denotes the step length, k denotes the number of iterations, FLOPs[t] denotes the amount of floating-point computation of layer t, Re_all denotes the response of all states, Rest denotes the remaining state, i_{w/a} denotes the number of quantization bits of the weights and activations, and a_{t−1} denotes the action of the (t−1)-th layer network;
the agent obtains from the filter quantization pruning environment the state S_t of the t-th layer in which it is located, computes the current feature vector φ(S_t), and then outputs the action a_t in state S_t as the quantization strategy and compression rate of the current layer, guiding the current layer in quantization bit-width selection and candidate filter pruning, where

a_t = π_θ(φ(S_t)) + N

N denotes noise, θ denotes all the parameters of the actor network, and π_θ(x) is the pruning-rate function;
in the next round of quantitative pruning, the current target Q value y is calculated according to the following formula by collecting m samples in a DDPG experience playback pool j
Figure FDA0003923067720000033
Wherein the content of the first and second substances,
Figure FDA0003923067720000034
is obtained through an actor target network, and
Figure FDA0003923067720000035
is obtained through the critic target network, setting the value of gamma to 1 to avoid over-prioritization of short-term rewards so that the agent can take care of long-term rewards, reducing the variance of the gradient estimate by subtracting the baseline reward b during agent updates, R j For the current short-term reward quantifying pruning operations, the value of the gradient estimate is an exponential moving average of this previous reward;
the loss function of DDPG is the mean square error function as shown below:
Figure FDA0003923067720000036
A=π θ (S)+N
wherein
Figure FDA0003923067720000041
Is the current state S j The selected action A of the obtained feature vector interacting with the environment will add a certain noise N and the noise will be inAfter each round of pruning, the tree is exponentially attenuated;
the reward function in reinforcement learning is shown in the following formula:
R_FLOPs = −Error · log(FLOPs).
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the reinforcement learning-based automatic convolutional neural network quantization pruning method according to any one of claims 1 to 6.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the reinforcement learning-based automatic convolutional neural network quantization pruning method according to any one of claims 1 to 6.
CN202211363959.2A 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium Pending CN115600650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211363959.2A CN115600650A (en) 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211363959.2A CN115600650A (en) 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Publications (1)

Publication Number Publication Date
CN115600650A true CN115600650A (en) 2023-01-13

Family

ID=84851211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211363959.2A Pending CN115600650A (en) 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Country Status (1)

Country Link
CN (1) CN115600650A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304677A (en) * 2023-01-30 2023-06-23 格兰菲智能科技有限公司 Channel pruning method and device for model, computer equipment and storage medium
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning
CN116502698A (en) * 2023-06-29 2023-07-28 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium
CN116502698B (en) * 2023-06-29 2023-08-29 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium
CN117762642A (en) * 2024-01-02 2024-03-26 广州汇思信息科技股份有限公司 Convolutional neural network model loading method, device and storage medium
CN117762642B (en) * 2024-01-02 2024-05-28 广州汇思信息科技股份有限公司 Convolutional neural network model loading method, device and storage medium
CN117912484A (en) * 2024-03-20 2024-04-19 北京建筑大学 Pruning-adjustable audio separation model optimization method and device
CN117912484B (en) * 2024-03-20 2024-05-17 北京建筑大学 Pruning-adjustable audio separation model optimization method and device

Similar Documents

Publication Publication Date Title
CN115600650A (en) Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium
CN111709522B (en) Deep learning target detection system based on server-embedded cooperation
CN110880036B (en) Neural network compression method, device, computer equipment and storage medium
CN113011588B (en) Pruning method, device, equipment and medium of convolutional neural network
KR20200070831A (en) Apparatus and method for compressing neural network
CN111950656B (en) Image recognition model generation method and device, computer equipment and storage medium
CN113516230B (en) Automatic convolutional neural network pruning method based on average rank importance ordering
CN112287986B (en) Image processing method, device, equipment and readable storage medium
CN106485316A (en) Neural network model compression method and device
KR20210032140A (en) Method and apparatus for performing pruning of neural network
US20200364538A1 (en) Method of performing, by electronic device, convolution operation at certain layer in neural network, and electronic device therefor
CN112052951A (en) Pruning neural network method, system, equipment and readable storage medium
CN110751175A (en) Method and device for optimizing loss function, computer equipment and storage medium
CN112766496B (en) Deep learning model safety guarantee compression method and device based on reinforcement learning
CN111199507A (en) Image steganography analysis method, intelligent terminal and storage medium
CN113240090B (en) Image processing model generation method, image processing device and electronic equipment
CN116188878A (en) Image classification method, device and storage medium based on neural network structure fine adjustment
CN110807693A (en) Album recommendation method, device, equipment and storage medium
CN111932690B (en) Pruning method and device based on 3D point cloud neural network model
CN113128664A (en) Neural network compression method, device, electronic equipment and storage medium
CN116306879A (en) Data processing method, device, electronic equipment and storage medium
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
CN116187387A (en) Neural network model quantization method, device, computer equipment and storage medium
CN114742221A (en) Deep neural network model pruning method, system, equipment and medium
KR102454420B1 (en) Method and apparatus processing weight of artificial neural network for super resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination