CN111626328B - Image recognition method and device based on lightweight deep neural network - Google Patents

Image recognition method and device based on lightweight deep neural network

Info

Publication number
CN111626328B
CN111626328B CN202010298205.8A CN202010298205A
Authority
CN
China
Prior art keywords
model
trimming
alpha
round
acc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010298205.8A
Other languages
Chinese (zh)
Other versions
CN111626328A (en)
Inventor
王冬丽
刘广毅
周彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202010298205.8A priority Critical patent/CN111626328B/en
Publication of CN111626328A publication Critical patent/CN111626328A/en
Application granted granted Critical
Publication of CN111626328B publication Critical patent/CN111626328B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention discloses an image recognition method and device based on a lightweight deep neural network, wherein the method comprises the following steps: 1) constructing and training a deep neural network model for image recognition; 2) lightweighting the model: the parameters of the model and the pruning parameters are updated cyclically, and each round of the cycle proceeds as follows: first, the channels of lower importance in the current feature layer of the current model M are pruned to obtain the model M′ after this round of pruning; M′ is then retrained, and the relative accuracy of the trained M′ is calculated; if the relative accuracy is non-negative, the reward of this round of pruning is first calculated so as to update the pruning parameters, and the next round of the cycle is performed; if the relative accuracy is negative, M′ is discarded and the method falls back to M, and, according to whether the accuracy loss is within the allowable accuracy loss range, either the pruning parameters are updated and the next round is performed, or the cycle ends; 3) the image to be identified is recognized with the final model. The method and device are suitable for image recognition on resource-constrained platforms.

Description

Image recognition method and device based on lightweight deep neural network
Technical Field
The invention discloses an image recognition (classification) method and device based on a lightweight deep neural network.
Background
Since its introduction, the deep neural network has been widely applied in the field of image recognition, achieving good recognition results with high accuracy. However, deep neural networks place high demands on resources and space, so some resource-constrained platforms cannot use them to classify images.
In view of this problem, it is necessary to provide a method that enables image recognition on resource-constrained platforms based on a lightweight deep neural network.
Disclosure of Invention
The invention aims at overcoming the defects of the prior art, and provides an image recognition method and device based on a lightweight deep neural network, which can perform image recognition on a platform with limited resources based on the deep neural network.
The technical scheme adopted by the invention is as follows:
in one aspect, an image recognition method based on a lightweight deep neural network is provided, including the steps of:
step 1, constructing a deep neural network model for image recognition; training the constructed model based on a training set to obtain a trained model;
step 2, lightweighting the network model: initializing the pruning parameters (reinforcement learning parameters), including the pruning step size of each feature layer (the pruning step sizes of different feature layers may differ), and cyclically updating the parameters of the model and the pruning parameters; each round of the cycle proceeds as follows:
firstly, for the current feature layer in the current model M, ranking the importance of all channels by any method for evaluating channel importance, and pruning the α_i channels of lowest importance to obtain the model M′ after this round of pruning; the feature layer refers to a module consisting of a convolution layer, a batch normalization layer and an activation layer in the model; the channels of any feature layer correspond one-to-one to the channels of its convolution layer and of its batch normalization layer; α_i is the pruning step of the current feature layer, and i is the index of the current feature layer;
then, the model M′ is trained on the training set, the accuracy of the trained M′ is calculated on the validation set, and the accuracy of model M is subtracted from it to obtain the relative accuracy acc of M′;
if the relative accuracy acc ≥ 0, the reward of this round of model pruning is calculated, the pruning parameters are updated, and the next round of the cycle is performed;
if the relative accuracy acc < 0, it is judged whether the accuracy loss |acc| is within the allowable accuracy loss range (e.g., 5%); if so, the model M′ of this round is discarded, the method falls back to the model M before this round of pruning, the pruning parameters are updated, and the next round of the cycle is performed; otherwise the model M′ of this round is discarded, the method falls back to the model M before this round of pruning, and the cycle ends (the lightweighting process exits) (a code sketch of this control flow follows the step list below);
step 3, recognizing the image to be identified with the finally pruned and trained model (the lightweight network model), and determining its category label;
the training set and the validation set each include a plurality of image samples and their category labels.
Further, in the step 2, for any feature layer in the current model M, the channels are ranked by importance based on the scaling coefficient corresponding to each channel in its batch normalization layer; the greater the scaling coefficient, the more important the channel. Ranking channel importance by the size of the scaling coefficient is much faster than ranking based on feature maps.
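As an illustration only, this ranking step can be realized in a few lines; the following PyTorch sketch assumes each feature layer exposes its BatchNorm2d module, and the function name is illustrative rather than prescribed by the patent:

import torch

def rank_channels_by_bn_scale(bn: torch.nn.BatchNorm2d) -> torch.Tensor:
    """Rank the channels of one feature layer by the absolute value of the
    batch-normalization scaling coefficient gamma (bn.weight); a larger
    coefficient indicates a more important channel."""
    gamma = bn.weight.detach().abs()   # one scaling coefficient per channel
    return torch.argsort(gamma)        # ascending: least important channels first

# The alpha_i least important channels of feature layer i are then simply the
# first alpha_i entries of the returned index tensor.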
Further, in the step 2, pruning the α_i channels of lower importance means setting to 0 the weights associated with those α_i channels in the convolution layer, the fully connected layer and the normalization layer; besides zeroing the weights, model reconstruction can also be employed, i.e., pruning is implemented by removing from the model the structures associated with those α_i channels.
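A hedged sketch of the weight-zeroing variant, again assuming a PyTorch feature layer made of a Conv2d followed by a BatchNorm2d (the helper name and argument layout are assumptions, not the patent's code):

import torch

@torch.no_grad()
def zero_channels(conv: torch.nn.Conv2d,
                  bn: torch.nn.BatchNorm2d,
                  channels: torch.Tensor) -> None:
    """Set to 0 every weight associated with the selected output channels,
    leaving the architecture intact; the alternative named in the text is
    model reconstruction, i.e. physically removing those channels."""
    conv.weight[channels] = 0.0        # convolution filters of the channels
    if conv.bias is not None:
        conv.bias[channels] = 0.0      # matching convolution biases
    bn.weight[channels] = 0.0          # batch-normalization scale (gamma)
    bn.bias[channels] = 0.0            # batch-normalization shift (beta)
    # A fully connected layer consuming these channels would have the
    # corresponding input weights zeroed in the same way.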
Further, the accuracy of the trained model is calculated based on the training set and the validation set as follows: during training, the model is trained on the training set in each iteration, and the accuracy of the model after each iteration is calculated on the validation set; after training reaches the maximum number of iterations T, the maximum of the T accuracies is taken as the accuracy of the trained model.
Further, the maximum number of iterations T is set by the user and varies slightly with the size of the input images (typically 40 to 46 iterations for 32×32 images and 20 to 25 iterations for 224×224 images); during training, the learning rate follows a stepwise decay, being multiplied by 0.2 when the iteration count reaches 30%, 60% and 90% of the maximum iteration number T, respectively.
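A sketch of this retraining-and-scoring scheme, assuming PyTorch; it can be read as a fuller version of the retrain_and_score placeholder in the earlier control-flow sketch, with its dependencies passed explicitly. train_one_epoch and evaluate are caller-supplied placeholders, and reading the step decay as multiplication by 0.2 is an interpretation of the text rather than a verbatim rule:

import torch

def retrain_and_score(model, optimizer, train_one_epoch, evaluate, T):
    """Retrain a (pruned) model for T iterations with stepwise learning-rate
    decay and return the best validation accuracy seen, which the method
    takes as the accuracy of the trained model."""
    milestones = [int(T * p) for p in (0.3, 0.6, 0.9)]   # 30%, 60%, 90% of T
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=milestones, gamma=0.2)     # lr <- 0.2 * lr
    best_acc = 0.0
    for _ in range(T):
        train_one_epoch(model, optimizer)          # one pass over the training set
        best_acc = max(best_acc, evaluate(model))  # accuracy on the validation set
        scheduler.step()
    return best_acc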
Further, if the relative accuracy acc is positive, the reward of this round of model pruning is first calculated, and the pruning parameters are updated as follows:
first, the reference accuracy acc_b of the relative accuracy acc, the reward r of this round, the reference reward r_b of this round, and the policy function value Power are calculated:
Power = log(ch_i)·θ
wherein η is a weight that balances the global historical relative accuracy acc_line against the historical relative accuracy acc_line^i after pruning of the i-th feature layer; acc_line is an array that stores the historical relative accuracy after pruning of each feature layer; ch_i′ and ch_i are the numbers of channels of the i-th feature layer after and before this round of pruning, respectively; θ is a conversion factor;
acc_line^i is updated by storing acc into it;
then the reward r of this round is compared with the reference reward r_b of this round; if r < r_b, quan_i and α_i are updated according to quan_i, α_i and the Power value,
wherein Maxp is the maximum quantization level of the pruning rate, generally 4;
quan_i and α_i are then subjected to validity processing: if α_i ≤ 1, let α_i = 1; if quan_i = 1, its value remains unchanged, otherwise let quan_i = quan_i − 1;
otherwise, quan_i and α_i are updated according to quan_i, α_i and the Power value,
and then subjected to validity processing: if the pruning step α_i exceeds the number of channels ch_i′ of the i-th feature layer after this round of pruning, let α_i = ch_i′ to prevent empty layers from occurring.
Further, if the relative accuracy acc is negative, it is judged whether the accuracy loss |acc| is within the allowable accuracy loss range (e.g., 5%); if so, the pruning parameters are updated as follows:
the past historical relative accuracy is attenuated by a degree λ;
then quan_i and α_i are updated according to quan_i, α_i and the Power value;
after updating the values of quan_i and α_i, validity processing is applied to α_i: if α_i ≤ 1, let α_i = 1.
In another aspect, an image recognition device based on a lightweight deep neural network is provided, including:
the model construction module is used for constructing a deep neural network model for image recognition;
the model training and pruning module is used for cyclically updating the parameters of the model and the pruning parameters based on the image sample data;
and the classification module is used for recognizing the image to be identified with the finally pruned and trained model and determining its category.
In another aspect, an electronic device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to implement the above image recognition method based on a lightweight deep neural network.
In another aspect, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the above-described lightweight deep neural network-based image recognition method.
The beneficial effects are that:
the invention can carry out image recognition based on the deep neural network on a platform with limited resources, and improves the image recognition rate on the premise of ensuring the image recognition precision.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a flow chart of a lightweight model in an embodiment of the invention;
fig. 3 is a schematic structural diagram of a feature layer in an embodiment of the present invention.
Detailed Description
The present invention is described in detail below in conjunction with specific examples, which will assist those skilled in the art in further understanding the invention. The examples described with reference to the drawings are illustrative only, are intended to explain the invention, and are not to be construed as limiting it.
Example 1:
the embodiment provides an image recognition method based on a lightweight deep neural network, as shown in fig. 1, comprising the following steps:
step 1, constructing a deep neural network model for image recognition; training the constructed model based on a training set to obtain a trained model;
step 2, lightweighting the network model: initializing the pruning parameters (reinforcement learning parameters), including the pruning step size of each feature layer (the pruning step sizes of different feature layers may differ), and cyclically updating the parameters of the model and the pruning parameters; each round of the cycle proceeds as follows:
firstly, for the current feature layer in the current model M, ranking the importance of all channels by any method for evaluating channel importance, and pruning the α_i channels of lowest importance to obtain the model M′ after this round of pruning; the feature layer refers to a module consisting of a convolution layer, a batch normalization layer and an activation layer in the model; α_i is the pruning step of the current feature layer, and i is the index of the current feature layer;
then, the model M′ is trained on the training set, the accuracy of the trained M′ is calculated on the validation set, and the accuracy of model M is subtracted from it to obtain the relative accuracy acc of M′;
if the relative accuracy acc ≥ 0, the reward of this round of model pruning is calculated, the pruning parameters are updated, and the next round of the cycle is performed;
if the relative accuracy acc < 0, it is judged whether the accuracy loss |acc| is within the allowable accuracy loss range (e.g., 5%); if so, the model M′ of this round is discarded, the method falls back to the model M before this round of pruning, the pruning parameters are updated, and the next round of the cycle is performed; otherwise the model M′ of this round is discarded, the method falls back to the model M before this round of pruning, and the cycle ends;
step 3, recognizing the image to be identified with the finally pruned and trained model, and determining its category label;
the training set and the validation set each include a plurality of image samples and their category labels.
Example 2:
In this embodiment, based on embodiment 1, in the step 2, for any feature layer in the current model M, the channels are ranked by importance based on the scaling coefficient corresponding to each channel in its batch normalization layer; the greater the scaling coefficient, the more important the channel. Ranking channel importance by the size of the scaling coefficient is much faster than ranking based on feature maps.
Example 3:
In this embodiment, based on embodiment 1, in the step 2, pruning the α_i channels of lower importance means setting to 0 the weights associated with those α_i channels in the convolution layer, the fully connected layer and the normalization layer; besides zeroing the weights, model reconstruction can also be employed, i.e., pruning is implemented by removing from the model the structures associated with those α_i channels.
Example 4:
In this embodiment, based on embodiment 1, the accuracy of the trained model is calculated from the training set and the validation set as follows: during training, the model is trained on the training set in each iteration, and the accuracy of the model after each iteration is calculated on the validation set; after training reaches the maximum number of iterations T, the maximum of the T accuracies is taken as the accuracy of the trained model.
Example 5:
In this embodiment, based on embodiment 4, the maximum iteration number T is set by the user and varies slightly with the size of the input images (typically 40 to 46 iterations for 32×32 images and 20 to 25 iterations for 224×224 images); meanwhile, the learning rate follows a stepwise decay, being multiplied by 0.2 when the iteration count reaches 30%, 60% and 90% of the maximum iteration number T, respectively.
Example 6:
In this embodiment, based on embodiment 1, the step 2 specifically comprises:
S21: initializing the pruning parameters, including the pruning step α_i and the pruning-rate quantization level quan_i of each feature layer (empirical parameters); initializing the index i = 1 of the current feature layer; setting the maximum quantization level Maxp of the pruning rate (an empirical parameter, generally 4); and cyclically updating the parameters of the model and the pruning parameters according to the following steps;
S22: performing one round of pruning on the current feature layer to obtain the model M′ after this round of pruning;
training the model M′ on the training set, calculating the accuracy of the trained M′ on the validation set, and subtracting the accuracy of model M to obtain the relative accuracy acc of M′ (i.e., the change in accuracy); if the accuracy has not dropped after this round of pruning, the pruning step of this feature layer can be increased so that more channels are pruned; if the accuracy has dropped, the pruning step of this feature layer must be reduced to prevent over-pruning;
S23: judging whether the relative accuracy acc is negative; if so, jump to S28; otherwise, execute S24;
S24: calculating the reference accuracy acc_b of the relative accuracy acc, the reward r of this round, the reference reward r_b of this round, and the policy function value Power; the reward, an element of reinforcement learning, is calculated to judge the influence of this round's pruning action on the network; the Power value is a temporary variable, a mapping from reward to action in the form of a policy function;
the specific calculation is as follows:
wherein η is a weight (an empirical parameter) balancing the global historical relative accuracy acc_line against the historical relative accuracy acc_line^i after pruning of the i-th feature layer; acc_line is an array that stores the historical relative accuracy after pruning of each feature layer;
wherein ch_i′ and ch_i are the numbers of channels of the i-th feature layer after and before this round of pruning, respectively;
Power = log(ch_i)·θ    (4)
wherein θ is a conversion factor;
acc_line^i is then updated by storing acc into it;
S25: comparing the reward r of this round with the reference reward r_b of this round; if r < r_b, execute S26, otherwise execute S27. The principle is that the relation between r and r_b determines whether pruning should be encouraged: if r < r_b, pruning is discouraged (the pruning step or pruning rate is decreased); if r ≥ r_b, pruning is encouraged (the pruning step or pruning rate is increased). A consolidated code sketch of the S26-S28 update rules is given after step S29;
S26: quan_i and α_i are updated according to quan_i, α_i and the Power value:
that is, if the pruning-rate quantization level quan_i of the i-th feature layer is 1, pruning is proceeding at the maximum pruning rate; if the current pruning step α_i is still greater than Power, the current pruning step α_i is set to Power; if quan_i is 1 but the current pruning step α_i is less than or equal to Power, quan_i is increased by 1, so that the current pruning step α_i is updated to Power/quan_i;
if quan_i is Maxp, pruning is proceeding at the lowest pruning rate; if the pruning step α_i is still greater than 1, 1 is subtracted from the original pruning step α_i;
if quan_i lies between 1 and Maxp, quan_i is increased by 1, so that the current pruning step α_i is updated to Power/quan_i;
after updating the values of quan_i and α_i, validity processing is applied to them: if α_i ≤ 1, let α_i = 1; if quan_i = 1, its value remains unchanged, otherwise let quan_i = quan_i − 1;
then jump to S29;
S27: quan_i and α_i are updated according to quan_i, α_i and the Power value:
that is, if quan_i is 1, pruning is proceeding at the maximum pruning rate, and 1 is added to the original pruning step α_i;
if quan_i is Maxp, pruning is proceeding at the lowest pruning rate; if the current pruning step α_i is still lower than Power/quan_i, the current pruning step α_i is set to Power/quan_i; if quan_i is Maxp but the current pruning step α_i is greater than or equal to Power/quan_i, quan_i is decreased by 1, so that the current pruning step α_i is updated to Power/quan_i;
if quan_i lies between 1 and Maxp: if quan_i is 1, its value is kept at 1; otherwise quan_i is decreased by 1, so that the current pruning step α_i is updated to Power/quan_i;
after updating the values of quan_i and α_i, validity processing is applied to them: if the pruning step α_i exceeds the number of channels ch_i′ of the i-th feature layer after this round of pruning, let α_i = ch_i′ to prevent empty layers;
then jump to S29;
S28: the relative accuracy acc is judged; if it exceeds the allowable accuracy loss range (e.g., acc < −5%), the model M′ of this round is discarded, the method falls back to the model M before this round of pruning, and the cycle ends;
if the relative accuracy acc is within the allowable accuracy loss range (e.g., acc ≥ −5%), the historical relative accuracy acc_line^i is attenuated to a certain degree:
wherein λ is the attenuation degree;
the model M′ of this round is discarded, and the method falls back to the model M before this round of pruning;
and quan_i and α_i are updated according to quan_i, α_i and the Power value:
that is, if quan_i is Maxp, pruning is proceeding at the lowest pruning rate; if the pruning step α_i is still greater than 1, 1 is subtracted from the original pruning step α_i;
if quan_i lies between 1 and Maxp, quan_i is increased by 1, so that the current pruning step α_i is updated to α_i·(quan_i − 1)/quan_i;
after updating the values of quan_i and α_i, validity processing is applied to α_i: if α_i ≤ 1, let α_i = 1;
then jump to S29 to continue execution;
S29: the index i of the current feature layer is changed to switch to the next feature layer in the model, i.e., i = i + 1; if the current feature layer is the last feature layer, switch to the first feature layer of the model. It should be noted that, since information flows from the input end to the output end, the pruning operations are performed in the order of the model's feature layers so as to reduce the generation of additional errors; then jump to S22.
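The following Python sketch consolidates one reading of the S26-S28 update rules; the function signature, the branch argument and the treatment of Power as a precomputed number are assumptions for illustration, not the patented code:

def update_pruning_params(quan_i, alpha_i, power, maxp, branch, ch_after=None):
    """Update the pruning-rate quantization level quan_i and the pruning step
    alpha_i of feature layer i. branch is "S26" (r < r_b: discourage pruning),
    "S27" (r >= r_b: encourage pruning) or "S28" (rollback within the allowed
    accuracy loss); ch_after is ch_i', the channel count after this round."""
    if branch == "S26":                      # discourage: shrink the step
        if quan_i == 1:
            if alpha_i > power:
                alpha_i = power
            else:
                quan_i += 1
                alpha_i = power / quan_i
        elif quan_i == maxp:
            if alpha_i > 1:
                alpha_i -= 1
        else:                                # 1 < quan_i < maxp
            quan_i += 1
            alpha_i = power / quan_i
        if alpha_i <= 1:                     # validity processing
            alpha_i = 1
        if quan_i != 1:
            quan_i -= 1
    elif branch == "S27":                    # encourage: grow the step
        if quan_i == 1:
            alpha_i += 1
        elif quan_i == maxp:
            if alpha_i < power / quan_i:
                alpha_i = power / quan_i
            else:
                quan_i -= 1
                alpha_i = power / quan_i
        else:                                # 1 < quan_i < maxp
            quan_i -= 1
            alpha_i = power / quan_i
        if ch_after is not None and alpha_i > ch_after:
            alpha_i = ch_after               # validity: prevent empty layers
    else:                                    # "S28": after a rollback
        # (acc_line^i is attenuated by the factor lambda before this update)
        if quan_i == maxp:
            if alpha_i > 1:
                alpha_i -= 1
        elif 1 < quan_i < maxp:
            quan_i += 1
            alpha_i = alpha_i * (quan_i - 1) / quan_i
        if alpha_i <= 1:                     # validity processing
            alpha_i = 1
    return quan_i, alpha_i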
The above technical scheme prunes (lightweights) the deep neural network model, reducing the parameter volume (width) of the model, the space occupied by the network, and the energy consumed when the network model runs, so that it can be applied to more resource-constrained (embedded) platforms. It is compatible with different model pruning methods, including weight pruning and channel pruning, and therefore has wide applicability.
The scheme applies reinforcement learning to the automatic pruning of deep neural networks: a reward corresponding to the pruned model is constructed that reflects both the accuracy of the model and its state; a deterministic state-to-action policy function is adopted, with accuracy serving as the reward for updating the pruning step; and an improved Q-learning method is used for the continuous state space, which simplifies the reinforcement-learning structure for continuous spaces, reduces the complexity of reinforcement learning, speeds up training, and achieves good compression efficiency.
Pruning of deep neural networks usually relies on a manually set pruning rate or pruning threshold; such artificial heuristics often fail to reach the optimal compression effect and may destroy the balance of the network. The pruning in this scheme is automatic, which effectively reduces the interference of human factors, improves the compression efficiency of the deep neural network while preserving accuracy, and increases running speed.
The scheme controls the pruning of the neural network in the manner of knowledge distillation: the pruned model is retrained (fine-tuned) on the training set, and its classification performance is verified on the validation set to check whether the performance of the model has been damaged, thereby guaranteeing the accuracy of the pruned model. Knowledge distillation is a way for a large model to guide the learning of a small model and can be regarded as an enhanced form of retraining: the small model can train by itself or receive guidance from the large model and learn the knowledge the large model provides, thereby approaching the accuracy of the large model.
Example 8:
the embodiment provides an image recognition device based on a deep neural network, which comprises the following modules:
the model construction module is used for constructing a deep neural network model for image recognition;
the model training and pruning module is used for cyclically updating the parameters of the model and the pruning parameters based on the image sample data;
and the classification module is used for recognizing the image to be identified with the finally pruned and trained model and determining its category.
For the working principle of each module in the device, refer to the specific implementation of each step in the method embodiments.
Example 9:
The embodiment provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to implement the above image recognition method based on the lightweight deep neural network.
Example 10:
the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described lightweight deep neural network-based image recognition method.
The above description of the embodiments of the present invention is not intended to limit the scope of the present invention. It should be understood that any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of protection of the present invention.

Claims (9)

1. An image recognition method based on a lightweight deep neural network is characterized by comprising the following steps:
step 1, constructing a deep neural network model for image recognition; training the constructed model based on a training set to obtain a trained model;
step 2, lightweighting the model: initializing pruning parameters including the pruning step α_i of each feature layer in the model, and cyclically updating the parameters of the model and the pruning parameters; each round of the cycle proceeds as follows:
firstly, for the current feature layer in the current model M, ranking the importance of all channels by a method for evaluating channel importance, and pruning the α_i channels of lowest importance to obtain the model M′ after this round of pruning; wherein a feature layer refers to a module consisting of a convolution layer, a batch normalization layer and an activation layer in the model, α_i is the pruning step of the current feature layer, and i is the index of the current feature layer;
then, training the model M′ on the training set, calculating the accuracy of the trained M′ on the validation set, and subtracting the accuracy of model M to obtain the relative accuracy acc of M′;
if the relative accuracy acc ≥ 0, calculating the reward of this round of model pruning, updating the pruning parameters, and performing the next round of the cycle;
if the relative accuracy acc < 0, judging whether the accuracy loss |acc| is within the allowable accuracy loss range; if so, discarding the model M′ of this round, falling back to the model M before this round of pruning, updating the pruning parameters, and performing the next round of the cycle; otherwise discarding the model M′ of this round, falling back to the model M before this round of pruning, and ending the cycle;
step 3, recognizing the image to be identified with the finally pruned and trained model, and determining its category label;
if the relative accuracy acc is positive, first calculating the reward of this round of model pruning, and updating the pruning parameters as follows:
first, calculating the reference accuracy acc_b of the relative accuracy acc, the reward r of this round, the reference reward r_b of this round, and the policy function value Power:
Power = log(ch_i)·θ
wherein η is a weight; acc_line is an array for storing the historical relative accuracy after pruning of each feature layer, and acc_line^i is the historical relative accuracy after pruning of the i-th feature layer; ch_i′ and ch_i are the numbers of channels of the i-th feature layer after and before this round of pruning, respectively; θ is a conversion factor;
updating acc_line^i by storing acc into it;
then comparing the reward r of this round with the reference reward r_b of this round; if r < r_b, updating quan_i and α_i according to quan_i, α_i and the Power value,
wherein Maxp is the maximum quantization level of the pruning rate;
then subjecting quan_i and α_i to validity processing: if α_i ≤ 1, let α_i = 1; if quan_i = 1, its value remains unchanged, otherwise let quan_i = quan_i − 1;
otherwise, updating quan_i and α_i according to quan_i, α_i and the Power value,
then subjecting quan_i and α_i to validity processing: if the pruning step α_i exceeds the number of channels ch_i′ of the i-th feature layer after this round of pruning, let α_i = ch_i′.
2. The image recognition method based on the lightweight deep neural network according to claim 1, wherein in the step 2, for any feature layer in the current model M, the channels are ranked by importance based on the scaling coefficient corresponding to each channel in the batch normalization layer; the greater the scaling coefficient, the more important the channel.
3. The image recognition method based on the lightweight deep neural network according to claim 1, wherein in the step 2, pruning the α_i channels of lower importance means setting to 0 the weights associated with those α_i channels in the convolution layer, the fully connected layer and the normalization layer.
4. The image recognition method based on the lightweight deep neural network according to claim 1, wherein the accuracy of the trained model is calculated from the training set and the validation set as follows: during training, the model is trained on the training set in each iteration, and the accuracy of the model after each iteration is calculated on the validation set; after training reaches the maximum number of iterations T, the maximum of the T accuracies is taken as the accuracy of the trained model.
5. The image recognition method based on the lightweight deep neural network according to claim 1, wherein, during training, the learning rate follows a stepwise decay, being multiplied by 0.2 when the iteration count reaches 30%, 60% and 90% of the maximum iteration number T, respectively.
6. The image recognition method based on the lightweight deep neural network according to claim 1, wherein if the relative accuracy acc is negative, it is judged whether the accuracy loss |acc| is within the allowable accuracy loss range, and if so, the pruning parameters are updated as follows:
the historical relative accuracy is attenuated to a certain degree:
wherein λ is the attenuation degree; acc_line is an array for storing the historical relative accuracy after pruning of each feature layer;
quan_i and α_i are updated according to quan_i, α_i and the Power value:
after updating the values of quan_i and α_i, validity processing is applied to α_i: if α_i ≤ 1, let α_i = 1.
7. An image recognition device based on a lightweight deep neural network is characterized by comprising the following modules:
the model construction module is used for constructing a deep neural network model for image recognition;
the model training and pruning module is used for cyclically updating the parameters of the model and the pruning parameters based on the image sample data, comprising:
initializing pruning parameters including the pruning step α_i of each feature layer in the model, and cyclically updating the parameters of the model and the pruning parameters; each round of the cycle proceeds as follows:
firstly, for the current feature layer in the current model M, ranking the importance of all channels by a method for evaluating channel importance, and pruning the α_i channels of lowest importance to obtain the model M′ after this round of pruning; wherein the feature layer is a module consisting of a convolution layer, a batch normalization layer and an activation layer, α_i is the pruning step of the current feature layer, and i is the index of the current feature layer;
then, training the model M′ on the training set, calculating the accuracy of the trained M′ on the validation set, and subtracting the accuracy of model M to obtain the relative accuracy acc of M′;
if the relative accuracy acc ≥ 0, calculating the reward of this round of model pruning, updating the pruning parameters, and performing the next round of the cycle;
if the relative accuracy acc < 0, judging whether the accuracy loss |acc| is within the allowable accuracy loss range; if so, discarding the model M′ of this round, falling back to the model M before this round of pruning, updating the pruning parameters, and performing the next round of the cycle; otherwise discarding the model M′ of this round, falling back to the model M before this round of pruning, and ending the cycle;
the classification module is used for recognizing the image to be identified with the finally pruned and trained model and determining its category label, comprising:
if the relative accuracy acc is positive, first calculating the reward of this round of model pruning, and updating the pruning parameters as follows:
first, calculating the reference accuracy acc_b of the relative accuracy acc, the reward r of this round, the reference reward r_b of this round, and the policy function value Power:
Power = log(ch_i)·θ
wherein η is a weight; acc_line is an array for storing the historical relative accuracy after pruning of each feature layer, and acc_line^i is the historical relative accuracy after pruning of the i-th feature layer; ch_i′ and ch_i are the numbers of channels of the i-th feature layer after and before this round of pruning, respectively; θ is a conversion factor;
updating acc_line^i by storing acc into it;
then comparing the reward r of this round with the reference reward r_b of this round; if r < r_b, updating quan_i and α_i according to quan_i, α_i and the Power value,
wherein Maxp is the maximum quantization level of the pruning rate;
then subjecting quan_i and α_i to validity processing: if α_i ≤ 1, let α_i = 1; if quan_i = 1, its value remains unchanged, otherwise let quan_i = quan_i − 1;
otherwise, updating quan_i and α_i according to quan_i, α_i and the Power value,
then subjecting quan_i and α_i to validity processing: if the pruning step α_i exceeds the number of channels ch_i′ of the i-th feature layer after this round of pruning, let α_i = ch_i′.
8. An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to implement the method of any of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-6.
CN202010298205.8A 2020-04-16 2020-04-16 Image recognition method and device based on lightweight deep neural network Active CN111626328B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010298205.8A CN111626328B (en) 2020-04-16 2020-04-16 Image recognition method and device based on lightweight deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010298205.8A CN111626328B (en) 2020-04-16 2020-04-16 Image recognition method and device based on lightweight deep neural network

Publications (2)

Publication Number Publication Date
CN111626328A CN111626328A (en) 2020-09-04
CN111626328B (en) 2023-12-15

Family

ID=72258873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010298205.8A Active CN111626328B (en) 2020-04-16 2020-04-16 Image recognition method and device based on lightweight deep neural network

Country Status (1)

Country Link
CN (1) CN111626328B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010929B (en) * 2021-02-20 2022-06-14 湘潭大学 Method for constructing physical model of storage unit and obtaining single event upset section
WO2022205685A1 (en) * 2021-03-29 2022-10-06 泉州装备制造研究所 Lightweight network-based traffic sign recognition method
CN114707532B (en) * 2022-01-11 2023-05-19 中铁隧道局集团有限公司 Improved Cascade R-CNN-based ground penetrating radar tunnel disease target detection method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3246875A2 (en) * 2016-05-18 2017-11-22 Siemens Healthcare GmbH Method and system for image registration using an intelligent artificial agent
CN109978135A (en) * 2019-03-04 2019-07-05 清华大学 Neural network compression method and system based on quantization
CN110222817A (en) * 2019-05-10 2019-09-10 上海交通大学 Convolutional neural networks compression method, system and medium based on learning automaton
CN110728361A (en) * 2019-10-15 2020-01-24 四川虹微技术有限公司 Deep neural network compression method based on reinforcement learning
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN110909667A (en) * 2019-11-20 2020-03-24 北京化工大学 Lightweight design method for multi-angle SAR target recognition network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10600171B2 (en) * 2018-03-07 2020-03-24 Adobe Inc. Image-blending via alignment or photometric adjustments computed by a neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3246875A2 (en) * 2016-05-18 2017-11-22 Siemens Healthcare GmbH Method and system for image registration using an intelligent artificial agent
CN109978135A (en) * 2019-03-04 2019-07-05 清华大学 Neural network compression method and system based on quantization
CN110222817A (en) * 2019-05-10 2019-09-10 上海交通大学 Convolutional neural networks compression method, system and medium based on learning automaton
CN110728361A (en) * 2019-10-15 2020-01-24 四川虹微技术有限公司 Deep neural network compression method based on reinforcement learning
CN110909667A (en) * 2019-11-20 2020-03-24 北京化工大学 Lightweight design method for multi-angle SAR target recognition network
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification

Also Published As

Publication number Publication date
CN111626328A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111626328B (en) Image recognition method and device based on lightweight deep neural network
US11663920B2 (en) Method and device of path optimization for UAV, and storage medium thereof
CN109978142B (en) Neural network model compression method and device
CN111275711B (en) Real-time image semantic segmentation method based on lightweight convolutional neural network model
WO2019223250A1 (en) Pruning threshold determination method and device, as well as model pruning method and device
CN109961098B (en) Training data selection method for machine learning
Shen et al. Fractional skipping: Towards finer-grained dynamic cnn inference
CN111738098A (en) Vehicle identification method, device, equipment and storage medium
CN115829024B (en) Model training method, device, equipment and storage medium
CN111860771B (en) Convolutional neural network computing method applied to edge computing
CN112052951A (en) Pruning neural network method, system, equipment and readable storage medium
CN112990420A (en) Pruning method for convolutional neural network model
US20230127001A1 (en) Method and electronic device for generating optimal neural network (nn) model
CN111401523A (en) Deep learning network model compression method based on network layer pruning
CN114511042A (en) Model training method and device, storage medium and electronic device
CN113722980A (en) Ocean wave height prediction method, system, computer equipment, storage medium and terminal
CN110826692B (en) Automatic model compression method, device, equipment and storage medium
CN112132261A (en) Convolutional neural network character recognition method running on ARM
CN116051961A (en) Target detection model training method, target detection method, device and medium
CN113516163B (en) Vehicle classification model compression method, device and storage medium based on network pruning
CN114723043A (en) Convolutional neural network convolutional kernel pruning method based on hypergraph model spectral clustering
CN113570036A (en) Hardware accelerator architecture supporting dynamic neural network sparse model
CN114139783A (en) Wind power short-term power prediction method and device based on nonlinear weighted combination
CN112613604A (en) Neural network quantification method and device
CN111210009A (en) Information entropy-based multi-model adaptive deep neural network filter grafting method, device and system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant