CN115600650A - Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium - Google Patents

Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Info

Publication number
CN115600650A
CN115600650A
Authority
CN
China
Prior art keywords
pruning
filter
neural network
layer
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211363959.2A
Other languages
Chinese (zh)
Inventor
张维纬
纪铭
余浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202211363959.2A priority Critical patent/CN115600650A/en
Publication of CN115600650A publication Critical patent/CN115600650A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention relates to an automatic convolutional neural network quantization pruning method based on reinforcement learning. The method first obtains an image data set and pre-trains on it with an initialized model to obtain the average rank of the feature map output by each filter; the average ranks are then sorted together with the global importance of the filters to obtain filter importance information. Automatic quantization and pruning of the neural network model are realized through reinforcement learning, the model compression strategy with the highest model accuracy is obtained, and the final neural network model is obtained after pruning is completed. The invention globally sorts the filters in the convolutional layers according to the degree to which they influence model accuracy combined with the average rank; the rank is consistent with filter importance, and higher bit-widths are allocated to the weight parameters of highly important filters, so that accuracy is preserved to the greatest extent. A neural network originally applied on a high-performance computer can thus be compressed and then deployed on mobile edge devices with weaker computing and storage capabilities.

Description

Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium
Technical Field
The invention belongs to the field of image processing, and particularly relates to an automatic convolutional neural network quantization pruning method and device based on reinforcement learning, and a storage medium.
Background
In recent years, on the one hand, models obtained from ever deeper neural networks have achieved better and better results; on the other hand, with the continuous development and innovation of related fields such as autonomous driving and intelligent mobile devices, demand for deep neural network models suitable for edge devices with weaker computing power has gradually increased. Owing to its characteristics, a deep neural network deployed on a mobile device carries an extremely large number of parameters and floating-point operations. For example, a 152-layer ResNet has more than 60 million parameters and requires more than 10 billion floating-point operations to infer a single image with a resolution of 224 × 224, which is difficult to run on platforms with limited computing resources such as mobile devices, wearable portable devices or Internet-of-Things devices. In addition, running a deep neural network model on a GPU for real-time object detection is costly. For example, running YOLO v3 on an NVIDIA Tesla T4 can detect 40 frames per second in real time, but the device sells for nearly thirty thousand yuan, far beyond what is economically feasible for widespread use. Existing neural network models therefore struggle to balance model accuracy and computation speed on low-cost devices.
With the development of neural network deployment applications, the emphasis has gradually shifted from high accuracy alone to ensuring low memory occupation and a small amount of floating-point computation while keeping accuracy unchanged or within an acceptable range of degradation. Reinforcement learning, in which an agent guides the neural network through automatic learning, is undoubtedly a very promising tool. Most existing neural network quantization and pruning methods require human experts to tune parameters continuously to achieve the best compression effect; the hyperparameter space is exponentially large, the operation is difficult and time-consuming, and in practice such methods easily fall into locally optimal or suboptimal solutions. Existing model pruning methods focus on weight pruning within the model, but this fine-grained pruning requires specific hardware support and generalizes poorly. Meanwhile, most quantization and pruning strategies are rule-based heuristics, which are likely to yield suboptimal compression.
Disclosure of Invention
The invention aims to provide an automatic convolutional neural network quantization pruning method, device and storage medium based on reinforcement learning which, at the cost of only a small loss of accuracy, reduce the parameter bit-width, the number of parameters and the amount of floating-point computation of a model through an automatic quantization pruning process, so that a neural network originally applied on a high-performance computer can be compressed and then deployed on mobile edge devices with weaker computing and storage capabilities.
The automatic convolutional neural network quantization pruning method based on reinforcement learning of the invention comprises the following steps:
Step S10, acquiring a data set of images and dividing it into a training set and a verification set in proportion;
Step S20, acquiring the convolutional neural network model to be quantized and pruned, initializing it, pre-training the initialized model for 60 rounds on the images in the training set to obtain the average rank of the feature map output by each filter, and sorting the average rank of each filter's output feature map in the convolutional layers of the current model together with the global importance of the filters to obtain filter importance information;
Step S30, realizing automatic quantization and pruning of the neural network model through reinforcement learning to obtain the model compression strategy with the highest model accuracy, wherein the reinforcement learning agent is an actor-critic network, the actor network consists of actor networks A and B, the critic network is responsible for evaluating the model compression strategy, and the parameters of the actor and critic networks are updated with the DDPG (deep deterministic policy gradient) reinforcement learning method;
Step S40, fine-tuning the neural network model after quantization pruning is completed to obtain the final pruned neural network model.
The step S20 specifically includes:
s21, creating a convolutional neural network model of pruning to be quantified based on a Pythrch frame;
s22, setting pre-training parameters and establishing a layer structure index corresponding to the convolutional neural network model;
s23, performing 60 rounds of pre-training on the images in the training set to obtain the precision of the current model, wherein the model precision is obtained by testing different images in the verification set;
step S24, obtaining the rank of each filter output characteristic diagram in the convolutional layer, and dividing the sum of the ranks of all the filter output characteristic diagrams by the number of the convolutional layer filters to obtain the average rank of the convolutional layer; sorting is carried out by combining the global importance of the filter, and the sorting result is used as the basis for judging the importance of the filter;
and sequencing the average rank of each filter output characteristic diagram in the convolutional layer of the current model in combination with the global importance of the filter to obtain the importance information of the filter.
The filter importance information in step S24 refers to that the filter in each convolutional layer uses the average of the filter importance of the current layer as a threshold, and is an importance filter if the average is greater than the threshold, otherwise is a non-importance filter, all convolutional layers are sorted according to the average of the filter importance, and the convolutional layer with the preset value before sorting is a sensitive layer and is stored in the sensitive layer index of the corresponding convolutional neural network model.
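As an illustrative sketch only (not taken from the patent embodiment; the function name, the `sensitive_fraction` parameter and the tensor layout are assumptions), the threshold split into important/unimportant filters and the selection of sensitive layers could look as follows:

```python
import torch

def split_filters_and_layers(importance_per_layer, sensitive_fraction=0.1):
    """Mark important filters (score above the layer mean) and select the
    sensitive layers (top fraction of layers by mean filter importance).
    importance_per_layer: list of 1-D tensors, one score per filter."""
    important_masks, layer_means = [], []
    for scores in importance_per_layer:
        mean = scores.mean()
        important_masks.append(scores > mean)   # True -> important filter
        layer_means.append(mean)
    layer_means = torch.stack(layer_means)
    k = max(1, int(sensitive_fraction * len(importance_per_layer)))
    sensitive_idx = torch.topk(layer_means, k).indices  # sensitive-layer index
    return important_masks, sensitive_idx
```

The default of 0.1 mirrors the top-10% choice used in the detailed embodiment; the method itself only requires a preset value.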
The step S30 specifically includes:
s31, acquiring a target quantization rate and a pruning rate which are set in the hyper-parameter, determining the number of filters needing pruning according to the target pruning rate, and measuring the bit number of the reserved filter parameters according to the quantization rate;
step S32, according to the deterministic strategy of reinforcement learning, if the current layer is in the sensitive layer index, setting the quantization bit number of the current layer output by the deterministic of reinforcement learning to be 8 bits, setting the compression rate to be 0.1, obtaining the number of filters of the current layer of the model, which should execute pruning operation, and simultaneously calculating to obtain the number of the filters to be pruned;
s33, sorting the filters in the layer from low to high by calculating the average rank of the filters and combining the global importance of the filters;
step S34, according to the number of the filters required for pruning of each convolutional layer obtained in the step S31, the actor network A makes different filter pruning strategies according to the judgment whether the current convolutional layer is a sensitive layer or not and the importance degree of the filter of the convolutional layer, and sets the weight of the filter for pruning to be 0; the actor network B distributes different bit widths according to different importance of the filter, the sensitive layer filter distributes higher bit width, and the strategies of the actor networks A and B are gathered to the actor network;
step S35, repeatedly executing the step S32 to the step S34 until the quantization pruning operation on all layers of the model is completed, namely, a complete round of quantization pruning operation is completed, the model precision of the round of quantization pruning operation is completed through verification set verification, and the current model compression strategy and precision are stored in a reinforcement learning experience playback pool;
and S36, repeatedly executing the steps S32 to S35, finishing all rounds of model quantitative pruning operation, and obtaining the neural network model compression strategy with the highest precision.
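The following minimal sketch shows how one such round (steps S32 to S35) could be driven; the `agent.act`, `layer.state`, `prune_least_important` and `quantize` interfaces are assumptions for illustration, not the patent's implementation:

```python
def quantization_pruning_round(layers, agent, sensitive_idx):
    """One complete round: walk the model layer by layer, apply the chosen
    quantization bit-width and pruning ratio, and return the strategy."""
    strategy = []
    for t, layer in enumerate(layers):
        if t in sensitive_idx:
            bits, ratio = 8, 0.1                    # deterministic setting for sensitive layers
        else:
            bits, ratio = agent.act(layer.state())  # actor outputs the per-layer action
        n_prune = int(ratio * layer.num_filters)    # number of filters to prune
        layer.prune_least_important(n_prune)        # zero the weights of the pruned filters
        layer.quantize(bits)                        # bit-width of the retained filters
        strategy.append((bits, ratio))
    return strategy
```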
The global importance of a filter in step S33 is estimated by the formula:

I_i = α_{l(i)} · r_{l(i)} · ‖w_i‖_2 + κ_{l(i)}

where l(i) denotes the layer index of the i-th filter, ‖·‖_2 denotes the L2 norm, w_i denotes the weight of the i-th filter, α, κ ∈ R^L are trainable variables, and L denotes the total number of layers;

r_l = (R_l − R_min) / (R_max − R_min)

where r_l denotes the scaling factor of the rank of layer l, R_l denotes the average rank of the l-th layer, R_min denotes the minimum of the average ranks of all convolutional layers in the neural network, and R_max denotes the maximum of the average ranks of all convolutional layers in the neural network.
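A sketch of this scoring, under the assumption that the rank scaling factor r_l multiplies the norm term (the composition above is reconstructed from the symbol definitions, as the original formula image is not reproduced):

```python
import torch

def global_importance(filter_weights, layer_of, alpha, kappa, avg_rank):
    """I_i = alpha_{l(i)} * r_{l(i)} * ||w_i||_2 + kappa_{l(i)}, with r_l the
    min-max normalised average rank of layer l."""
    r = (avg_rank - avg_rank.min()) / (avg_rank.max() - avg_rank.min())
    scores = []
    for i, w in enumerate(filter_weights):   # w: weight tensor of the i-th filter
        l = layer_of[i]                      # layer index l(i)
        scores.append(alpha[l] * r[l] * w.norm(p=2) + kappa[l])
    return torch.stack(scores)
```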
The parameter updating method in step S30 is the DDPG (deep deterministic policy gradient) reinforcement learning method. The actor network and the critic network are trained by taking as input the accuracy obtained after one complete round of quantization pruning together with the quantization pruning state in each round, wherein, in the state space, each layer network t represents the state S_t by the 11 attributes shown below:

(t, n, c, h, w, stride, k, FLOPs[t], Re_all, Rest, i_{w/a}, a_{t−1})

where t denotes the index of each layer, n denotes the total number of layers of the network, c denotes the number of convolution channels, h and w denote the height and width of the convolutional feature map respectively, stride denotes the step length, k denotes the number of iterations, FLOPs[t] denotes the amount of floating-point computation of layer t, Re_all denotes the response of all states, Rest denotes the remaining state, i_{w/a} denotes the number of quantization bits of the weights and activations, and a_{t−1} denotes the action of the (t−1)-th layer network.
The agent obtains from the filter quantization pruning environment the state S_t of the t-th layer in which it is located, computes the current feature vector φ(S_t), and then outputs the action a_t in state S_t as the quantization strategy and compression rate of the current layer, guiding the current layer in quantization bit-width selection and candidate filter pruning, where

a_t = π_θ(φ(S_t)) + N

N denotes noise, θ denotes all the parameters of the actor network, and π_θ(x) is the pruning-rate function;
in the next round of quantitative pruning, the current target Q value y is calculated according to the following formula by collecting m samples in a DDPG experience playback pool j
Figure BDA0003923067730000043
Wherein the content of the first and second substances,
Figure BDA0003923067730000044
is obtained through a network of actor targets, and
Figure BDA0003923067730000045
then it is obtained through the critic target network, and the value of gamma is setSet to 1 to avoid over-priority short term rewards, enabling the agent to take long term rewards into account by reducing the variance of the gradient estimate during agent updates by subtracting the baseline reward b, R j For the current short-term reward quantifying pruning operations, the value of the gradient estimate is an exponential moving average of this previous reward;
the loss function of DDPG is the mean square error function as shown below:
Figure BDA0003923067730000046
A=π θ (S)+N
wherein
Figure BDA0003923067730000047
Is the current state S j The selected action A interacting with the environment of the obtained feature vector increases certain noise N, and the noise is exponentially attenuated after each round of pruning is finished;
the reward function in reinforcement learning is shown by the following formula:
R_FLOPs = −Error · log(FLOPs).
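A compact sketch of the DDPG update and reward described above (the actor/critic modules, their target networks and the replay-batch layout are assumptions for illustration, not the patent's implementation):

```python
import math
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, baseline_b, gamma=1.0):
    """One DDPG update over m replayed samples, with gamma = 1 and the
    moving-average baseline b subtracted from the reward."""
    S, A, R, S_next = batch                       # states, actions, rewards, next states
    with torch.no_grad():
        A_next = actor_target(S_next)             # action from the actor target network
        y = (R - baseline_b) + gamma * critic_target(S_next, A_next)  # target Q value y_j
    critic_loss = F.mse_loss(critic(S, A), y)     # mean squared error loss of DDPG
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_loss = -critic(S, actor(S)).mean()      # ascend the critic's Q estimate
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

def reward_flops(error, flops):
    """Reward function R_FLOPs = -Error * log(FLOPs)."""
    return -error * math.log(flops)
```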
a computer device comprising a memory storing a computer program and a processor implementing any one of the above-described reinforcement learning-based automated convolutional neural network quantitative pruning methods when the computer program is executed.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described automated convolutional neural network quantization pruning methods based on reinforcement learning.
The convolutional neural network quantization pruning method, device and storage medium based on reinforcement learning, by combining global ranking with the average rank, achieve a better effect than traditional importance-ranking methods and have the following advantages:
1. The quantization pruning operation is automated. The quantization pruning method realized through reinforcement learning is an automatic compression process; moreover, during quantization and pruning, the strategies are continuously optimized through learning, so that the optimal compression strategy is finally obtained.
2. The accuracy of the compressed neural network model is largely preserved. The invention globally sorts the filters in the convolutional layers by the degree to which they influence model accuracy combined with the average rank; the rank is consistent with filter importance and is combined with importance-ranked pruning, and at the same time higher bit-widths are allocated to the weight parameters of highly important filters, achieving maximal accuracy retention.
3. The cost of intelligent devices is reduced. The invention quantizes while pruning the filters, i.e. it selects the number of bits with which the filter parameters are expressed; by combining pruning and quantization, an even lighter network can be obtained, although the computational complexity is higher than that of a single pruning operation. Since high-performance mobile edge devices are very expensive, on the premise of only a slight reduction in accuracy, the model after quantization pruning can be deployed directly on edge devices with weak computing and storage capabilities, greatly reducing the cost of the corresponding computing equipment.
Drawings
Fig. 1 is a schematic diagram of the framework of the present invention.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
As shown in FIG. 1, the automatic convolutional neural network quantization pruning method based on reinforcement learning of the invention comprises the following steps:
Step S10, acquiring a data set of images and dividing it into a training set and a verification set in proportion;
Step S20, acquiring the convolutional neural network model to be quantized and pruned, initializing it, pre-training the initialized model for 60 rounds on the images in the training set to obtain the average rank of the feature map output by each filter, and sorting the average rank of each filter's output feature map in the convolutional layers of the current model together with the global importance of the filters to obtain filter importance information; specifically:
Step S21, creating the convolutional neural network model to be quantized and pruned based on the PyTorch framework;
Step S22, setting the pre-training parameters and establishing the layer structure index corresponding to the convolutional neural network model;
Step S23, pre-training for 60 rounds on the images in the training set to obtain the accuracy of the current model, the model accuracy being obtained by testing on different images in the verification set;
Step S24, obtaining the rank of each filter's output feature map in a convolutional layer, and dividing the sum of the ranks of all the filters' output feature maps by the number of filters in that convolutional layer to obtain the layer's average rank; sorting is then carried out together with the global importance of the filters, and the sorting result serves as the basis for judging filter importance. The average rank is calculated as:

R_l = (1/n) · Σ_{i=1}^{n} Rank(o_{l(i)})

where o_{l(i)} denotes the output feature map of the i-th filter of the l-th layer, Rank(·) denotes the function that computes the rank, and n denotes the number of filters in the convolutional layer;
the average rank of each filter's output feature map in the convolutional layers of the current model is sorted together with the global importance of the filters to obtain the filter importance information: the filters in each convolutional layer take the average filter importance of the current layer as a threshold, a filter above the threshold being an important filter and otherwise an unimportant filter; all convolutional layers are sorted by their average filter importance, the convolutional layers in the top 10% of the sorting are sensitive layers, and they are stored in the sensitive-layer index of the corresponding convolutional neural network model (a sketch of the rank computation follows below);
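A minimal sketch of the average-rank computation for one convolutional layer (accumulating and averaging the statistic over the many pre-training batches is omitted here and is an assumption of how it would be done):

```python
import torch

@torch.no_grad()
def layer_average_rank(feature_maps: torch.Tensor) -> torch.Tensor:
    """Average rank of a convolutional layer: the sum of the ranks of every
    filter's output feature map divided by the number of filters.
    feature_maps: tensor of shape (n_filters, H, W) from one forward pass."""
    ranks = torch.linalg.matrix_rank(feature_maps.float())  # rank of each H x W map
    return ranks.float().mean()                             # R_l
```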
Step S30, realizing automatic quantization and pruning of the neural network model through reinforcement learning to obtain the model compression strategy with the highest model accuracy, wherein the reinforcement learning agent is an actor-critic network, the actor network consists of actor networks A and B, the critic network is responsible for evaluating the model compression strategy, and the parameters of the actor and critic networks are updated with the DDPG (deep deterministic policy gradient) reinforcement learning method; specifically:
Step S31, acquiring the target quantization rate and pruning rate set in the hyperparameters, determining the number of filters to be pruned according to the target pruning rate, and determining the number of bits of the retained filter parameters according to the quantization rate; Step S32, according to the deterministic strategy of reinforcement learning, if the current layer is in the sensitive-layer index, setting the quantization bit number output deterministically for the current layer to 8 bits and the compression rate to 0.1, obtaining the number of filters of the current layer on which the pruning operation should be executed, and at the same time calculating the number of filters to be pruned;
Step S33, sorting the filters in the layer from low to high by calculating the average rank of the filters combined with their global importance, wherein the global importance of a filter is estimated by the formula:

I_i = α_{l(i)} · r_{l(i)} · ‖w_i‖_2 + κ_{l(i)}

where l(i) denotes the layer index of the i-th filter, ‖·‖_2 denotes the L2 norm, w_i denotes the weight of the i-th filter, α, κ ∈ R^L are trainable variables, and L denotes the total number of layers;

r_l = (R_l − R_min) / (R_max − R_min)

where r_l denotes the scaling factor of the rank of layer l, R_l denotes the average rank of the l-th layer, R_min denotes the minimum of the average ranks of all convolutional layers in the neural network, and R_max denotes the maximum of the average ranks of all convolutional layers in the neural network;
Step S34, according to the number of filters to be pruned in each convolutional layer obtained in step S31, actor network A formulates different filter pruning strategies according to whether the current convolutional layer is a sensitive layer and the importance of the layer's filters, and sets the weights of the pruned filters to 0; actor network B allocates different bit-widths according to the different importance of the filters, the filters of sensitive layers being allocated higher bit-widths, and the strategies of actor networks A and B are aggregated into the actor network;
Step S35, repeatedly executing steps S32 to S34 until the quantization pruning operation on all layers of the model is completed, i.e. one complete round of quantization pruning; the model accuracy of this round is obtained on the verification set, and the current model compression strategy and accuracy are stored in the reinforcement learning experience replay pool;
Step S36, repeatedly executing steps S32 to S35 to finish all rounds of model quantization pruning and obtain the neural network model compression strategy with the highest accuracy.
Further, the parameter updating method of the above steps is the DDPG (deep deterministic policy gradient) reinforcement learning method, wherein the actor network and the critic network are trained by taking as input the accuracy obtained after one complete round of quantization pruning together with the quantization pruning state in each round, and wherein, in the state space, each layer network t represents the state S_t by the 11 attributes shown below:

(t, n, c, h, w, stride, k, FLOPs[t], Re_all, Rest, i_{w/a}, a_{t−1})

where t denotes the index of each layer, n denotes the total number of layers of the network, c denotes the number of convolution channels, h and w denote the height and width of the convolutional feature map respectively, stride denotes the step length, k denotes the number of iterations, FLOPs[t] denotes the amount of floating-point computation of layer t, Re_all denotes the response of all states, Rest denotes the remaining state, i_{w/a} denotes the number of quantization bits of the weights and activations, and a_{t−1} denotes the action of the (t−1)-th layer network.
The agent obtains from the filter quantization pruning environment the state S_t of the t-th layer in which it is located, computes the current feature vector φ(S_t), and then outputs the action a_t in state S_t as the quantization strategy and compression rate of the current layer, guiding the current layer in quantization bit-width selection and candidate filter pruning, where

a_t = π_θ(φ(S_t)) + N

N denotes noise, θ denotes all the parameters of the actor network, and π_θ(x) is the pruning-rate function.
In the next round of quantization pruning, m samples are collected from the DDPG experience replay pool and the current target Q value y_j is calculated according to the following formula:

y_j = R_j − b + γ · Q′(φ(S_{j+1}), π_{θ′}(φ(S_{j+1})) | w′)

where π_{θ′}(φ(S_{j+1})) is obtained through the actor target network and Q′(·) through the critic target network; the value of γ is set to 1 to avoid over-prioritizing short-term rewards so that the agent can take long-term rewards into account; the variance of the gradient estimate is reduced during agent updates by subtracting a baseline reward b, where R_j is the short-term reward of the current quantization pruning operation and b is an exponential moving average of the previous rewards.
The loss function of DDPG is the mean squared error function shown below:

L = (1/m) · Σ_{j=1}^{m} ( y_j − Q(φ(S_j), A_j | w) )²

A = π_θ(S) + N

where φ(S_j) is the feature vector of the current state S_j; a certain noise N is added to the selected action A that interacts with the environment, and the noise decays exponentially after each round of pruning is finished;
the reward function in reinforcement learning is shown by the following formula:
R_FLOPs = −Error · log(FLOPs)
Step S40, fine-tuning the neural network model after quantization pruning is completed to obtain the final pruned neural network model (a sketch follows below).
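A bare-bones sketch of this fine-tuning step (the optimizer choice and hyperparameters are placeholders, not values given in the patent):

```python
import torch

def finetune(model, train_loader, epochs=10, lr=1e-3):
    """Step S40: briefly retrain the quantized-and-pruned model to recover accuracy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model
```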
The convolutional neural network to be quantized and pruned has a plurality of convolutional layers, and the convolutional layers in the top 10% of the sorting obtained in step S24 are sensitive layers. This means these layers are important to the convolutional neural network: if they are pruned too heavily or allocated too few bits, the accuracy of the model drops sharply, i.e. they are sensitive to quantization and pruning operations, hence the name sensitive layer.
The pruning process operates layer by layer over the convolutional layers of the convolutional neural network, and both sensitive and non-sensitive layers are encountered in the course of this layer-by-layer operation.
The filters in the convolutional layers of the network to be quantized and pruned each have different importance and are divided into important and unimportant filters with the average as the threshold. The pruning strategy obtained through the actor network is that the important filters of sensitive layers are retained, while the unimportant filters of non-sensitive layers are pruned the most.
In one embodiment, a computer device, which may be a terminal, is provided that includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an automated convolutional neural network quantitative pruning method based on reinforcement learning. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program is executed by a processor to implement the method for the quantitative pruning of the convolutional neural network based on reinforcement learning provided in any of the above embodiments, and has corresponding functions and advantages.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, the combinations should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. An automatic convolutional neural network quantization pruning method based on reinforcement learning, characterized by comprising the following steps:
Step S10, acquiring a data set of images and dividing it into a training set and a verification set in proportion;
Step S20, acquiring the convolutional neural network model to be quantized and pruned, initializing it, pre-training the initialized model for 60 rounds on the images in the training set to obtain the average rank of the feature map output by each filter, and sorting the average rank of each filter's output feature map in the convolutional layers of the current model together with the global importance of the filters to obtain filter importance information;
Step S30, realizing automatic quantization and pruning of the neural network model through reinforcement learning to obtain the model compression strategy with the highest model accuracy, wherein the reinforcement learning agent is an actor-critic network, the actor network consists of actor networks A and B, the critic network is responsible for evaluating the model compression strategy, and the parameters of the actor and critic networks are updated with the DDPG (deep deterministic policy gradient) reinforcement learning method;
Step S40, fine-tuning the neural network model after quantization pruning is completed to obtain the final pruned neural network model.
2. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 1, characterized in that step S20 specifically comprises:
Step S21, creating the convolutional neural network model to be quantized and pruned based on the PyTorch framework;
Step S22, setting the pre-training parameters and establishing the layer structure index corresponding to the convolutional neural network model;
Step S23, pre-training for 60 rounds on the images in the training set to obtain the accuracy of the current model, the model accuracy being obtained by testing on different images in the verification set;
Step S24, obtaining the rank of each filter's output feature map in a convolutional layer, and dividing the sum of the ranks of all the filters' output feature maps by the number of filters in that convolutional layer to obtain the layer's average rank; sorting is then carried out together with the global importance of the filters, and the sorting result serves as the basis for judging filter importance;
the average rank of each filter's output feature map in the convolutional layers of the current model is sorted together with the global importance of the filters to obtain the filter importance information.
3. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 2, characterized in that the filter importance information in step S24 means that the filters in each convolutional layer take the average filter importance of the current layer as a threshold, a filter whose importance exceeds the threshold being an important filter and otherwise an unimportant filter; all convolutional layers are sorted by their average filter importance, and the convolutional layers ranked before a preset value are sensitive layers, stored in the sensitive-layer index of the corresponding convolutional neural network model.
4. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 1, characterized in that step S30 specifically comprises:
Step S31, acquiring the target quantization rate and pruning rate set in the hyperparameters, determining the number of filters to be pruned according to the target pruning rate, and determining the number of bits of the retained filter parameters according to the quantization rate;
Step S32, according to the deterministic strategy of reinforcement learning, if the current layer is in the sensitive-layer index, setting the quantization bit number output deterministically for the current layer to 8 bits and the compression rate to 0.1, obtaining the number of filters of the current layer on which the pruning operation should be executed, and at the same time calculating the number of filters to be pruned;
Step S33, sorting the filters in the layer from low to high by calculating the average rank of the filters combined with their global importance;
Step S34, according to the number of filters to be pruned in each convolutional layer obtained in step S31, actor network A formulates different filter pruning strategies according to whether the current convolutional layer is a sensitive layer and the importance of the layer's filters, and sets the weights of the pruned filters to 0; actor network B allocates different bit-widths according to the different importance of the filters, the filters of sensitive layers being allocated higher bit-widths, and the strategies of actor networks A and B are aggregated into the actor network;
Step S35, repeatedly executing steps S32 to S34 until the quantization pruning operation on all layers of the model is completed, i.e. one complete round of quantization pruning; the model accuracy of this round is obtained on the verification set, and the current model compression strategy and accuracy are stored in the reinforcement learning experience replay pool;
Step S36, repeatedly executing steps S32 to S35 to finish all rounds of model quantization pruning and obtain the neural network model compression strategy with the highest accuracy.
5. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 4, characterized in that the global importance of a filter in step S33 is estimated by the formula:

I_i = α_{l(i)} · r_{l(i)} · ‖w_i‖_2 + κ_{l(i)}

where l(i) denotes the layer index of the i-th filter, ‖·‖_2 denotes the L2 norm, w_i denotes the weight of the i-th filter, α, κ ∈ R^L are trainable variables, and L denotes the total number of layers;

r_l = (R_l − R_min) / (R_max − R_min)

where r_l denotes the scaling factor of the rank of layer l, R_l denotes the average rank of the l-th layer, R_min denotes the minimum of the average ranks of all convolutional layers in the neural network, and R_max denotes the maximum of the average ranks of all convolutional layers in the neural network.
6. The reinforcement learning-based automatic convolutional neural network quantization pruning method according to claim 1, characterized in that the parameter updating method of step S30 is the DDPG (deep deterministic policy gradient) reinforcement learning method; the actor network and the critic network are trained by taking as input the accuracy obtained after one complete round of quantization pruning together with the quantization pruning state in each round, wherein, in the state space, each layer network t represents the state S_t by the 11 attributes shown below:

(t, n, c, h, w, stride, k, FLOPs[t], Re_all, Rest, i_{w/a}, a_{t−1})

where t denotes the index of each layer, n denotes the total number of layers of the network, c denotes the number of convolution channels, h and w denote the height and width of the convolutional feature map respectively, stride denotes the step length, k denotes the number of iterations, FLOPs[t] denotes the amount of floating-point computation of layer t, Re_all denotes the response of all states, Rest denotes the remaining state, i_{w/a} denotes the number of quantization bits of the weights and activations, and a_{t−1} denotes the action of the (t−1)-th layer network;
the agent obtains from the filter quantization pruning environment the state S_t of the t-th layer in which it is located, computes the current feature vector φ(S_t), and then outputs the action a_t in state S_t as the quantization strategy and compression rate of the current layer, guiding the current layer in quantization bit-width selection and candidate filter pruning, where

a_t = π_θ(φ(S_t)) + N

N denotes noise, θ denotes all the parameters of the actor network, and π_θ(x) is the pruning-rate function;
in the next round of quantitative pruning, the current target Q value y is calculated according to the following formula by collecting m samples in a DDPG experience playback pool j
Figure FDA0003923067720000033
Wherein the content of the first and second substances,
Figure FDA0003923067720000034
is obtained through an actor target network, and
Figure FDA0003923067720000035
is obtained through the critic target network, setting the value of gamma to 1 to avoid over-prioritization of short-term rewards so that the agent can take care of long-term rewards, reducing the variance of the gradient estimate by subtracting the baseline reward b during agent updates, R j For the current short-term reward quantifying pruning operations, the value of the gradient estimate is an exponential moving average of this previous reward;
the loss function of DDPG is the mean square error function as shown below:
Figure FDA0003923067720000036
A=π θ (S)+N
wherein
Figure FDA0003923067720000041
Is the current state S j The selected action A of the obtained feature vector interacting with the environment will add a certain noise N and the noise will be inAfter each round of pruning, the tree is exponentially attenuated;
the reward function in reinforcement learning is shown in the following formula:
R_FLOPs = −Error · log(FLOPs).
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the reinforcement learning-based automatic convolutional neural network quantization pruning method according to any one of claims 1 to 6.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the reinforcement learning-based automatic convolutional neural network quantization pruning method according to any one of claims 1 to 6.
CN202211363959.2A 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium Pending CN115600650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211363959.2A CN115600650A (en) 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211363959.2A CN115600650A (en) 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Publications (1)

Publication Number Publication Date
CN115600650A true CN115600650A (en) 2023-01-13

Family

ID=84851211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211363959.2A Pending CN115600650A (en) 2022-11-02 2022-11-02 Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium

Country Status (1)

Country Link
CN (1) CN115600650A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304677A (en) * 2023-01-30 2023-06-23 格兰菲智能科技有限公司 Channel pruning method and device for model, computer equipment and storage medium
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning
CN116502698A (en) * 2023-06-29 2023-07-28 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium
CN116502698B (en) * 2023-06-29 2023-08-29 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium
CN117762642A (en) * 2024-01-02 2024-03-26 广州汇思信息科技股份有限公司 Convolutional neural network model loading method, device and storage medium
CN117762642B (en) * 2024-01-02 2024-05-28 广州汇思信息科技股份有限公司 Convolutional neural network model loading method, device and storage medium
CN117912484A (en) * 2024-03-20 2024-04-19 北京建筑大学 Pruning-adjustable audio separation model optimization method and device
CN117912484B (en) * 2024-03-20 2024-05-17 北京建筑大学 Pruning-adjustable audio separation model optimization method and device

Similar Documents

Publication Publication Date Title
CN115600650A (en) Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium
CN111709522B (en) Deep learning target detection system based on server-embedded cooperation
CN110880036B (en) Neural network compression method, device, computer equipment and storage medium
CN113011588B (en) Pruning method, device, equipment and medium of convolutional neural network
KR20200070831A (en) Apparatus and method for compressing neural network
CN111950656B (en) Image recognition model generation method and device, computer equipment and storage medium
CN113516230B (en) Automatic convolutional neural network pruning method based on average rank importance ordering
CN112287986B (en) Image processing method, device, equipment and readable storage medium
CN106485316A (en) Neural network model compression method and device
KR20210032140A (en) Method and apparatus for performing pruning of neural network
US20200364538A1 (en) Method of performing, by electronic device, convolution operation at certain layer in neural network, and electronic device therefor
CN112052951A (en) Pruning neural network method, system, equipment and readable storage medium
CN110751175A (en) Method and device for optimizing loss function, computer equipment and storage medium
CN112766496B (en) Deep learning model safety guarantee compression method and device based on reinforcement learning
CN111199507A (en) Image steganography analysis method, intelligent terminal and storage medium
CN113240090B (en) Image processing model generation method, image processing device and electronic equipment
CN116188878A (en) Image classification method, device and storage medium based on neural network structure fine adjustment
CN110807693A (en) Album recommendation method, device, equipment and storage medium
CN111932690B (en) Pruning method and device based on 3D point cloud neural network model
CN113128664A (en) Neural network compression method, device, electronic equipment and storage medium
CN116306879A (en) Data processing method, device, electronic equipment and storage medium
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
CN116187387A (en) Neural network model quantization method, device, computer equipment and storage medium
CN114742221A (en) Deep neural network model pruning method, system, equipment and medium
KR102454420B1 (en) Method and apparatus processing weight of artificial neural network for super resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination