CN116796823A - Class-adaptive model pruning method, device, electronic equipment and storage medium - Google Patents
Class-adaptive model pruning method, device, electronic equipment and storage medium
- Publication number: CN116796823A
- Application number: CN202310723421.6A
- Authority: CN
- Prior art keywords: model, category, pruning, inference, energy consumption
- Legal status: Pending
Classifications
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections (G—Physics › G06—Computing; calculating or counting › G06N—Computing arrangements based on specific computational models › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/08—Learning methods)
- G06N3/0464 — Convolutional networks [CNN, ConvNet] (G—Physics › G06—Computing; calculating or counting › G06N—Computing arrangements based on specific computational models › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/04—Architecture, e.g. interconnection topology)
Abstract
The invention provides a class-adaptive model pruning method, a device, electronic equipment and a storage medium, belonging to the technical field of model pruning. The method comprises the following steps: establishing an optimization model for each category and determining a target pruning rate for each category based on the corresponding optimization model; performing sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each network layer in the backbone; for the sparsely trained model, determining the attention coefficient of each backbone layer for each target category based on a calibration data set; determining, from the scaling factors and the target-category attention coefficients, the importance coefficient of each channel of each backbone layer for each category; and pruning the original convolutional neural network model separately for each category based on the target pruning rates and the importance coefficients. The pruned models obtained by the method are easy to deploy and perform well.
Description
Technical Field
The present invention relates to the field of model pruning, and in particular to a class-adaptive model pruning method, device, electronic equipment, and storage medium.
Background
In recent years, deep learning has developed rapidly and is widely used across industries such as natural language processing and computer vision. However, as model sizes continue to grow, deploying these models on terminal devices faces significant challenges: because terminal devices have limited computing and other resources, deploying a large model may cause excessive inference delay and excessive inference energy consumption, which is especially prominent in application scenarios with strict end-user experience requirements. For this reason, model compression has become an active research area, aiming to reduce a model to a size deployable on terminal devices while maintaining its performance. One common method is pruning, which deletes unimportant network structures to reduce the model size while preserving the model's inference accuracy.
However, most pruning methods focus only on the inference accuracy of the model and ignore factors that matter in actual on-device deployment, including resource requirements (such as storage space and energy), latency requirements, and class requirements. For example, in an emergency disaster relief scenario, an unmanned aerial vehicle may be used to sense a disaster and provide information to rescue workers. Here, rescue tasks place higher priority and accuracy requirements on the vehicle's sensing results, tasks for sensing disaster victims place stricter latency requirements, and the vehicle's battery and storage resources may change during flight. Ignoring these factors when pruning may therefore produce a model that cannot be deployed on the terminal device or that cannot achieve optimal performance.
Disclosure of Invention
The invention provides a class-adaptive model pruning method, device, electronic equipment and storage medium to address the defect in the prior art that model pruning considers only inference accuracy, which leaves the pruned model hard to deploy or poorly performing. The pruning method comprehensively considers multiple factors, so that the pruned model can be successfully deployed on the device and the resulting class pruning model performs well.
The invention provides a class-adaptive model pruning method, which comprises the following steps:
establishing an optimization model for each category based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference accuracy requirement for that category, and determining a target pruning rate for each category based on the corresponding optimization model, wherein the categories are the categories recognized by the original convolutional neural network model;
performing sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each network layer in the backbone of the original convolutional neural network model;
for the original convolutional neural network model after sparse regularization training, determining the target-category attention coefficient of each backbone layer for each category based on a calibration data set;
determining, based on the scaling factors and the target-category attention coefficients, the importance coefficient of each channel of each backbone layer for each category;
and pruning the original convolutional neural network model separately for each category based on the target pruning rates and the importance coefficients, obtaining a category pruning model for each category.
According to the class-adaptive model pruning method provided by the invention, establishing an optimization model for each category based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference accuracy requirement for each category, and determining a target pruning rate for each category based on the optimization model, comprises the following steps:
establishing an optimization model for each category according to the storage requirement, inference energy consumption requirement, inference delay requirement and inference accuracy requirement for that category;
establishing, for the original convolutional neural network model, an inference energy consumption prediction model, an inference delay prediction model, a storage prediction model, and an inference accuracy prediction model for each category, wherein the inference energy consumption prediction model represents the relationship between the inference energy consumption required by the pruned model and the pruning rate, the inference delay prediction model represents the relationship between the inference delay of the pruned model and the pruning rate, the storage prediction model represents the relationship between the storage space required by the pruned model and the pruning rate, and the inference accuracy prediction model represents the relationship between the inference accuracy of the pruned model and the pruning rate;
and determining, based on the inference accuracy prediction model, the inference energy consumption prediction model, the inference delay prediction model and the storage prediction model, the pruning rate corresponding to the optimal solution of each category's optimization model, and taking that pruning rate as the target pruning rate for the category.
According to the class-adaptive model pruning method provided by the invention, the optimization model is:

minimize_p T_prune

subject to: S_prune ≤ S_budget
A^n_prune ≥ A^n_thresh
E_prune ≤ E_budget
0 ≤ p ≤ 1

wherein p represents the pruning rate, T_prune represents the inference delay of the pruned model, S_prune represents the storage space required by the pruned model, S_budget represents the storage space threshold, A^n_prune represents the inference accuracy of the pruned model for category n, A^n_thresh represents the inference accuracy threshold for category n, E_prune represents the inference energy consumption required by the pruned model, and E_budget represents the inference energy consumption threshold.

According to the class-adaptive model pruning method provided by the invention, establishing, for the original convolutional neural network model, an inference energy consumption prediction model, an inference delay prediction model, a storage prediction model and an inference accuracy prediction model for each category comprises the following steps:
establishing an initial inference energy consumption prediction model, and determining an inference energy consumption prediction model for the original convolutional neural network model based on the initial model;
establishing, for the original convolutional neural network model, an inference delay prediction model based on real sampled data pairs of inference delay and pruning rate;
establishing, for the original convolutional neural network model, a storage prediction model based on real sampled data pairs of storage space and pruning rate;
and establishing, for the original convolutional neural network model, an inference accuracy prediction model for each category based on real sampled data pairs of the category's inference accuracy and pruning rate.
According to the class-adaptive model pruning method provided by the invention, establishing the initial inference energy consumption prediction model comprises the following steps:
establishing an inference energy consumption data set, which contains the calculation amounts, parameter amounts and inference energy consumption of different convolutional neural networks;
dividing the inference energy consumption data set into an inference energy consumption training set and an inference energy consumption verification set;
training a plurality of different regression models on the inference energy consumption training set;
and verifying the performance of the regression models on the inference energy consumption verification set, and taking the regression model with the best performance as the initial inference energy consumption prediction model.
According to the class-adaptive model pruning method provided by the invention, determining, for the original convolutional neural network model after sparse regularization training, the target-category attention coefficient of each backbone layer for each category based on a calibration data set, comprises the following steps:
taking the calibration data set as the input of the original convolutional neural network model after sparse regularization training, and outputting the prediction box and prediction category corresponding to each calibration sample in the calibration data set;
back-propagating through the original convolutional neural network model after sparse regularization training, and determining the gradient map of each backbone layer for each prediction category;
determining the gradient class activation map of each backbone layer for each calibration sample based on the gradient map of each backbone layer for each prediction category;
determining the attention coefficient of each backbone layer for each calibration sample based on the prediction box and the gradient class activation map;
and determining the attention coefficient of each backbone layer for each category from the attention coefficients of each backbone layer for the calibration samples belonging to that category.
According to the class-adaptive model pruning method provided by the invention, determining, based on the scaling factors and the target-category attention coefficients, the importance coefficient of each channel of each backbone layer for each category, comprises the following steps:
for each backbone layer and each category, multiplying the scaling factor of each channel of the layer by the layer's target-category attention coefficient for that category, obtaining the importance coefficient of each channel of the layer for that category.
The invention also provides a class-adaptive model pruning device, comprising:
a construction module, configured to establish an optimization model for each category based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference accuracy requirement for that category, and to determine a target pruning rate for each category based on the optimization model, the categories being the categories recognized by the original convolutional neural network model;
a first determining module, configured to perform sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each network layer in the backbone of the original convolutional neural network model;
a second determining module, configured to determine, for the original convolutional neural network model after sparse regularization training, the target-category attention coefficient of each backbone layer for each category based on a calibration data set;
a third determining module, configured to determine, based on the scaling factors and the target-category attention coefficients, the importance coefficient of each channel of each backbone layer for each category;
and a pruning module, configured to prune the original convolutional neural network model separately for each category based on the target pruning rates and the importance coefficients, obtaining a category pruning model for each category.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the class-adaptive model pruning method according to any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a class-adaptive model pruning method as described in any one of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a class-adaptive model pruning method as defined in any one of the above.
According to the class-adaptive model pruning method, device, electronic equipment and storage medium, an optimization model is built from the storage, inference energy consumption, inference delay and inference accuracy requirements; solving the optimization model of each category yields a target pruning rate that satisfies all four requirements, and class-adaptive pruning is then performed at that rate. This balances the model's competing requirements in a dynamic environment, improves the accuracy of each category pruning model for its target category, and makes the pruned model easy to deploy on the device with high performance.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a pruning method of a class adaptive model provided by the invention;
FIG. 2 is a schematic diagram of a fitted curve of an inference delay prediction model provided by the invention;
FIG. 3 is a schematic representation of a fitted curve of a stored predictive model provided by the present invention;
FIG. 4 is a schematic diagram of a fitted curve of the inference accuracy prediction model provided by the present invention;
FIG. 5 is a schematic flow chart of collecting inferred energy consumption data in accordance with the present invention;
FIG. 6 is a schematic diagram of a fitted curve of the inferred energy consumption prediction model provided by the present invention;
FIG. 7 is a schematic diagram of an example pruning of an original convolutional neural network provided by the present invention;
FIG. 8 is a schematic structural diagram of a class-adaptive model pruning device provided by the invention;
fig. 9 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are some, not all, embodiments of the invention; all other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
A class-adaptive model pruning method of the present invention is described below with reference to figs. 1 to 7. As shown in fig. 1, the class-adaptive model pruning method includes:
S101: establishing an optimization model for each category based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference accuracy requirement for that category, and determining a target pruning rate for each category based on the corresponding optimization model, the categories being the categories recognized by the original convolutional neural network model;
S102: performing sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each network layer in the backbone of the original convolutional neural network model;
S103: for the original convolutional neural network model after sparse regularization training, determining the target-category attention coefficient of each backbone layer for each category based on a calibration data set;
S104: determining, based on the scaling factors and the target-category attention coefficients, the importance coefficient of each channel of each backbone layer for each category;
S105: pruning the original convolutional neural network model separately for each category based on the target pruning rates and the importance coefficients, obtaining a category pruning model for each category.
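Read end to end, steps S101–S105 compose as in the following Python sketch. The four injected callables stand in for the procedures detailed in the remainder of this description; all names here are illustrative placeholders, not an API defined by the patent:

```python
def class_adaptive_pruning(model, classes, calib_set,
                           solve_rate, sparse_train, attention_of, prune):
    """Hypothetical composition of steps S101-S105."""
    target_rate = {c: solve_rate(c) for c in classes}          # S101
    model, gammas = sparse_train(model)                        # S102: BN scaling factors
    pruned = {}
    for c in classes:
        attn = attention_of(model, calib_set, c)               # S103: per-layer attention
        importance = {k: gammas[k] * attn[k] for k in gammas}  # S104: importance coefficients
        pruned[c] = prune(model, importance, target_rate[c])   # S105: per-class pruning
    return pruned
```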
Specifically, the objective of class-adaptive pruning is to improve the accuracy of the pruned model for a specific class. The objective is modeled as follows:

minimize L(N_c,p(x; W_p; n), N(x; W; n))  (1)

From the above equation, the objective of class-adaptive pruning is to minimize the loss between the pruned model N_c,p(x; W_p; n) and the original model N(x; W; n) for a specific class n, where x represents the input of the model. A pruning algorithm P_c is applied to the original model N(x; W; n) with weights W to obtain the pruned model N_c,p(x; W_p; n), and thereby the pruned model weights W_p for the specific class n. The pruning algorithm P_c selects, based on preset criteria and taking the specific class n into account, the weights of the original model that are most important for class n, and removes the remaining weights.
It can be understood that the model is deployed on a terminal device whose power, storage and other resources vary; the model's parameters must be stored on the device, and inference consumes the device's power. Thus, the storage requirement and inference energy consumption requirement are set based on the device's power and storage resources, the inference delay requirement is set based on the urgency of the prediction task, and the inference accuracy requirement is set based on the accuracy the prediction task demands.

When building the optimization models, an optimization model that balances the requirements of each category can be established by comprehensively weighing the respective importance of the storage, inference energy consumption, inference delay and inference accuracy requirements for that category; for example, the optimization model for category A is built from the storage, energy, delay and accuracy requirements for category A. Solving each category's optimization model determines the target pruning rate at which the original convolutional neural network model is pruned for that category. For example, when device energy and storage are scarce, the optimization model should reduce the inference energy consumption and storage footprint of model inference; when device resources are sufficient, inference accuracy can be weighted most heavily to provide more accurate results.
Sparse regularization training is performed on the original convolutional neural network model to determine the scaling factor of each channel of each network layer layer_k in the backbone of the original convolutional neural network model. The backbone comprises several sequentially connected network layers layer_k, and each layer_k contains a BN (Batch Normalization) layer. After sparse regularization training, every channel of every BN layer has a corresponding scaling factor, and the scaling factors of a layer's BN channels are the scaling factors of that network layer.

In a convolutional neural network, the BN layer normalizes the data output by the preceding conv layer, adjusting the data fed into the next conv layer so that the data distribution is more stable. Specifically, the BN layer normalizes the input data of each batch to zero mean and unit standard deviation, which removes each conv layer's influence on the data distribution, stabilizes the network model, speeds up training, and eases convergence. The BN normalization is expressed as:

z_hat = (z_in − μ_B) / √(σ_B² + ε),  z_out = γ · z_hat + β  (2)

where z_in represents the output of the preceding conv layer, B denotes the batch of data, ε = 10⁻⁶, μ_B and σ_B² represent the mean and variance of z_in respectively, z_hat represents the normalized data, and z_out is the feature map of the corresponding channel output by the BN layer. γ and β are two learnable parameters with the same channel shape as z_in: γ is the scaling-factor set, introduced to adjust the magnitude of each layer's output so that data from different layers match better, and β is the offset; both are learned during sparse regularization training. The scaling-factor set contains one element per channel, namely the per-channel scaling factors γ_c of the BN layer.

For the original convolutional neural network model, when the scaling factor γ_c of a channel tends to 0, the feature map z_out that the BN layer outputs for that channel also tends to 0; channels whose scaling factors tend to 0 contribute little to subsequent inference, so removing them can be considered. To obtain a prunable model, an L1 regularization term L1 = λ Σ_{γ∈Γ} |γ| on the scaling-factor sets is added during sparse regularization training to constrain the sum of absolute values of these parameters; the original convolutional neural network model then automatically selects the important scaling factors while driving the others toward 0. The loss function L combining the task loss and the regularization term is:

L = Σ_{(x,t)} l(f(x, W), t) + λ Σ_{γ∈Γ} |γ|  (3)

where x and t represent the input of the original convolutional neural network model and the corresponding ground truth, W represents the trainable weights, and l(f(x, W), t) is the normal training loss of the original convolutional neural network. After sparse regularization training, the scaling-factor sets in the original convolutional neural network model generally become sparse; λ is a weight coefficient, and Γ denotes the set of all scaling factors.
When performing sparse regularization training on the original convolutional neural network model, training can be stopped once the number of training rounds reaches a preset number, yielding the trained model, after which the scaling factors of all channels of each network layer in the backbone of the original convolutional neural network model can be determined.
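A minimal PyTorch sketch of this sparse regularization step, applying the L1 penalty of equation (3) to the BN scaling factors; the penalty weight `lambda_l1` and the helper names are illustrative assumptions:

```python
import torch
import torch.nn as nn

def train_step(model, x, t, criterion, optimizer, lambda_l1=1e-4):
    """One sparse-regularization step: task loss + lambda * sum(|gamma|)."""
    optimizer.zero_grad()
    loss = criterion(model(x), t)
    # L1 term over the scaling factors gamma of every BN layer (eq. (3))
    l1 = sum(m.weight.abs().sum() for m in model.modules()
             if isinstance(m, nn.BatchNorm2d))
    (loss + lambda_l1 * l1).backward()
    optimizer.step()
    return loss.item()

def bn_scaling_factors(model):
    """Collect the per-channel scaling factors gamma_c of every BN layer."""
    return {name: m.weight.detach().abs().clone()
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```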
When tasks of different categories are executed, different layers matter to different degrees for different categories; pruning the model in the same way for every category ignores these per-category differences in layer importance. For the original convolutional neural network model after sparse regularization training, the target-category attention coefficient of each backbone layer layer_k for each category is therefore determined based on a calibration data set. Concretely, a data set can be split into a training set, used for the sparse regularization training of the original convolutional neural network model, and a verification set. M (M ≥ 1000) samples are extracted from the verification set as the calibration data set Dataset_cal = {img_i, i ∈ [1, M]}, where img_i denotes a calibration sample. The categories of this calibration data set are class = {class_j, j ∈ [1, S]}, where S is the number of categories covered by the calibration data set. For example, a calibration data set of 100 calibration samples may contain 10 samples of category rabbit, 25 of category dog, 5 of category monkey, 30 of category pig, 12 of category cat and 18 of category chicken, for a total of 6 categories, so S is 6. With 6 categories in the calibration data set, each backbone layer obtains 6 target-category attention coefficients, one per category.

After the target-category attention coefficients of each layer and the scaling factors of each layer's channels have been determined, the two are combined to determine the importance coefficient of each channel of each backbone layer for each category.

Once the importance coefficients are obtained, the original convolutional neural network model is pruned separately for each category, obtaining a category pruning model for each category. For a given category, pruning proceeds from the channel with the smallest importance coefficient for that category until the category's target pruning rate is reached, yielding the category pruning model for that category. Compared with pruning for overall accuracy, pruning for a specific category can sacrifice accuracy on other categories while retaining the parameters most important to the target category, so the pruned model attains higher inference accuracy for that category.
According to the class-adaptive model pruning method provided by the invention, an optimization model is built from the storage, inference energy consumption, inference delay and inference accuracy requirements; solving each category's optimization model yields a target pruning rate that satisfies all four requirements, and class-adaptive pruning is performed at that rate. This balances the model's competing requirements in a dynamic environment, improves the accuracy of each category pruning model for its target category, and makes the pruned model easy to deploy on the device with high performance.
In one embodiment, establishing an optimization model for each category based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference accuracy requirement for each category, and determining a target pruning rate for each category based on the optimization model, includes:
establishing an optimization model for each category according to the storage requirement, inference energy consumption requirement, inference delay requirement and inference accuracy requirement for that category;
establishing, for the original convolutional neural network model, an inference energy consumption prediction model, an inference delay prediction model, a storage prediction model and an inference accuracy prediction model for each category, wherein the inference energy consumption prediction model represents the relationship between the inference energy consumption required by the pruned model and the pruning rate, the inference delay prediction model represents the relationship between the inference delay of the pruned model and the pruning rate, the storage prediction model represents the relationship between the storage space required by the pruned model and the pruning rate, and the inference accuracy prediction model represents the relationship between the inference accuracy of the pruned model and the pruning rate;
and determining, based on the four prediction models, the pruning rate corresponding to the optimal solution of each category's optimization model, and taking that pruning rate as the target pruning rate for the category.
Specifically, an optimization model that balances the requirements for each category is established by comprehensively weighing the respective importance of the storage, inference energy consumption, inference delay and inference accuracy requirements for that category.

An inference energy consumption prediction model, an inference delay prediction model, a storage prediction model and a per-category inference accuracy prediction model are established for the original convolutional neural network model: the inference accuracy prediction model represents the relationship between the pruned model's inference accuracy and the pruning rate, the inference energy consumption prediction model the relationship between the required inference energy consumption and the pruning rate, the inference delay prediction model the relationship between the inference delay and the pruning rate, and the storage prediction model the relationship between the required storage space and the pruning rate. The optimization model built from the storage, energy, delay and accuracy requirements for each category can therefore be converted into an optimization model over the pruning rate, and solving each optimization model determines the target pruning rate for each category.
In one embodiment, the optimization model is:

minimize_p T_prune  (4)

subject to: S_prune ≤ S_budget
A^n_prune ≥ A^n_thresh
E_prune ≤ E_budget
0 ≤ p ≤ 1

where p represents the pruning rate, T_prune represents the inference delay of the pruned model, S_prune represents the storage space required by the pruned model, S_budget represents the storage space threshold, A^n_prune represents the inference accuracy of the pruned model for category n, A^n_thresh represents the inference accuracy threshold for category n, E_prune represents the inference energy consumption required by the pruned model, and E_budget represents the inference energy consumption threshold.
Specifically, the optimization model may be established according to the terminal's resource situation and the user's requirements on inference delay and accuracy; forms other than equation (4) are also possible.

The optimization model comprehensively weighs the respective importance of the storage, inference energy consumption, inference delay and inference accuracy requirements, so the pruned model is easy to deploy while performing well.
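Because all four prediction models reduce to scalar functions of the pruning rate p, the optimization model (4) can be solved with a simple feasibility scan over p. A sketch under that assumption; the grid resolution and function signatures are illustrative:

```python
import numpy as np

def solve_target_pruning_rate(T, S, A_n, E, S_budget, A_thresh, E_budget,
                              grid=1001):
    """Scan p in [0,1]; among feasible rates, return the one minimizing the
    predicted inference delay T(p). T, S, A_n, E are the fitted prediction
    models (delay, storage, class-n accuracy, energy) as functions of p."""
    best_p, best_T = None, float("inf")
    for p in np.linspace(0.0, 1.0, grid):
        feasible = (S(p) <= S_budget and A_n(p) >= A_thresh
                    and E(p) <= E_budget)
        if feasible and T(p) < best_T:
            best_p, best_T = p, T(p)
    return best_p  # None if no pruning rate satisfies all constraints
```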
In one embodiment, the establishing an inference energy consumption prediction model, an inference delay prediction model, a storage prediction model and an inference precision prediction model for each category respectively for the original convolutional neural network model includes:
Establishing an initial reasoning energy consumption prediction model, and determining a reasoning energy consumption prediction model aiming at the original convolutional neural network model based on the initial reasoning energy consumption prediction model;
establishing an inference delay prediction model based on the inference delay and real sampling data corresponding to the pruning rate aiming at the original convolutional neural network model;
aiming at the original convolutional neural network model, establishing a storage prediction model based on a storage space and real sampling data corresponding to pruning rate;
and respectively establishing the inference precision prediction model for each category based on the inference precision corresponding to each category and the real sampling data corresponding to the pruning rate aiming at the original convolutional neural network model.
Specifically, an initial inference energy consumption prediction model is established; it represents the relationship between a model's parameter amount and calculation amount and its inference energy consumption. Since a model's parameter amount and calculation amount are functions of the pruning rate, expressing them via the pruning rate yields an inference energy consumption prediction model that relates inference energy consumption to the pruning rate.

Considering the influence of different pruning rates on inference delay, in this embodiment the relationship between pruning rate and inference delay of the original convolutional neural network model is fitted by linear least squares, giving the inference delay prediction model:
T_prune = w_0 + w_1 × p  (5)

where T_prune represents the inference delay, p represents the pruning rate, and the parameters w_0 and w_1 are learned by linear least squares from real data pairs of inference delay and pruning rate. The fitted curve of the inference delay prediction model is shown in fig. 2; the axis label "100 − pruning rate %" in fig. 2 means that a pruning rate of 0% corresponds to 100 on the abscissa.
Considering the influence of different pruning rates on storage space, in this embodiment the relationship between pruning rate and storage space of the original convolutional neural network model is likewise fitted by linear least squares, giving the storage prediction model:

S_prune = w_2 + w_3 × p  (6)

where S_prune represents the storage space, p represents the pruning rate, and the parameters w_2 and w_3 are learned by linear least squares from real data pairs of storage space and pruning rate. The fitted curve of the storage prediction model is shown in fig. 3; the axis label "100 − pruning rate %" in fig. 3 again means that a pruning rate of 0% corresponds to 100 on the abscissa.
Considering the influence of different per-category pruning rates on inference accuracy, in this embodiment an inference accuracy prediction model is established by sampling real data pairs of inference accuracy and pruning rate under category n, with a sigmoid chosen as the fitting function. The fitted inference accuracy prediction model can be written as:

A^n_prune = a_n / (1 + e^(p − b_n))  (7)

where A^n_prune is the predicted accuracy of category n, p is the pruning rate, e is the natural constant, and the parameters a_n and b_n control the shape of the sigmoid: a_n controls the maximum of the sigmoid and b_n determines the midpoint of the curve. The values of a_n and b_n are learned from the data pairs of inference accuracy and pruning rate by nonlinear least squares regression. The fitted curve of the inference accuracy prediction model is shown in fig. 4; "100 − pruning rate %" in fig. 4 means that a pruning rate of 0% corresponds to 100 on the abscissa.
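The fits of equations (5)–(7) can be reproduced with off-the-shelf least squares. A sketch with made-up sample pairs; every number below is illustrative, not a measurement from the patent:

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_linear(p, y):
    """Linear fits for equations (5) and (6): y = w_a + w_b * p."""
    A = np.vstack([np.ones_like(p), p]).T
    (w_a, w_b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return w_a, w_b

def accuracy_model(p, a_n, b_n):
    """Sigmoid of equation (7): a_n is the maximum, b_n the curve midpoint."""
    return a_n / (1.0 + np.exp(p - b_n))

p = np.array([0.0, 0.25, 0.5, 0.75, 1.0])        # sampled pruning rates
delay = np.array([41.0, 35.2, 29.8, 24.1, 18.9]) # measured delays (ms), made up
acc_n = np.array([0.84, 0.81, 0.78, 0.74, 0.70]) # class-n accuracies, made up

w0, w1 = fit_linear(p, delay)                    # -> T_prune = w0 + w1 * p
(a_n, b_n), _ = curve_fit(accuracy_model, p, acc_n, p0=[1.0, 2.0])
```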
In one embodiment, establishing the initial inference energy consumption prediction model includes:
establishing an inference energy consumption data set containing the calculation amounts, parameter amounts and inference energy consumption of different convolutional neural networks;
dividing the inference energy consumption data set into an inference energy consumption training set and an inference energy consumption verification set;
training a plurality of different regression models on the inference energy consumption training set;
and verifying the performance of the regression models on the inference energy consumption verification set, and taking the regression model with the best performance as the initial inference energy consumption prediction model.
Specifically, terminal devices include mobile devices with limited battery capacity, so on-device model inference must account for power consumption: excessive inference energy consumption may degrade device performance or even shut the device down. However, because of model complexity and the influence of factors such as the hardware platform, input data and environmental conditions, estimating inference energy consumption is very difficult, and it is hard to describe accurately through mathematical modeling. To address this, this embodiment collects inference energy consumption data of different models under different conditions to build an inference energy consumption data set, and uses a data-driven method to establish the initial inference energy consumption prediction model.
Model inference energy consumption is typically related to the computational complexity and storage complexity of the model. Computational complexity can be measured in floating point operations (FLOPs), which reflect the number of mathematical operations the model performs at inference. Storage complexity can be measured by the parameter amount, which reflects the number of parameters the model must store and read at inference. In general, the higher the computational and storage complexity, the greater the model's inference energy consumption, because more arithmetic and memory-access operations consume more power. The initial inference energy consumption prediction model is therefore built on the relationship between inference energy consumption, calculation amount and parameter amount.
Illustratively, the initial inference energy consumption prediction model is established as follows:

To simulate convolutional neural network models with different calculation amounts, models of different sizes are generated by randomly combining three kinds of modules: standard convolution (Conv), depthwise convolution (DWConv) and BottleNeck modules. Illustratively, 5550 different convolutional neural network models are generated, with depths ranging from 10 to 50 layers and input image sizes from a maximum of 100×100×3 down to a minimum of 20×20×3 (pixels); each model runs inference on 10 groups of randomly chosen, differently sized input images. To reduce data collection error, the final inference energy consumption is the average of N (e.g., N = 10) inference runs, and data points with inference energy consumption below 0.1 joule are discarded.
As shown in fig. 5, before each randomly generated convolutional neural network model starts inference, a sub-thread is started to monitor the GPU power of the terminal device, recording millisecond-level timestamps and the corresponding power. The GPU power can be obtained through the GPU bus interface provided officially by NVIDIA; for example, on an NVIDIA Jetson TX2 the current GPU power can be read in real time from the node /sys/bus/i2c/drivers/ina3221x/0-0040/iio:device0/in_power0_input. After a single model's inference finishes, the power data from start to end of inference is analyzed: the energy of one inference over one input sample is computed by summing the products of power and time interval over the inference process,

E = Σ_i P_i × t_i

where t_i represents the i-th time interval of the inference process and P_i the power during that interval; the inference energy consumption for one input sample is the average energy over the N inference runs.
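A minimal sketch of this measurement loop. The sysfs node is the one named in the text; whether it reports milliwatts, the sampling period, and the helper names are assumptions to verify on the actual device:

```python
import threading
import time

POWER_NODE = ("/sys/bus/i2c/drivers/ina3221x/0-0040/"
              "iio:device0/in_power0_input")  # TX2 GPU rail, per the text

def sample_power(samples, stop):
    """Sub-thread: record (timestamp, power) pairs until inference finishes."""
    while not stop.is_set():
        with open(POWER_NODE) as f:
            samples.append((time.time(), int(f.read()) / 1000.0))  # assume mW -> W
        time.sleep(0.001)  # millisecond-level sampling

def measure_inference_energy(run_inference, n_runs=10):
    """E = sum_i P_i * t_i over the sampling intervals, averaged over N runs."""
    energies = []
    for _ in range(n_runs):
        samples, stop = [], threading.Event()
        t = threading.Thread(target=sample_power, args=(samples, stop))
        t.start()
        run_inference()
        stop.set()
        t.join()
        e = sum(p_i * (samples[i + 1][0] - samples[i][0])
                for i, (_, p_i) in enumerate(samples[:-1]))
        energies.append(e)
    return sum(energies) / len(energies)
```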
By collecting the inference energy consumption data of the different convolutional neural network models, an inference energy consumption data set is built containing the calculation amounts, parameter amounts and inference energy consumption of the 5550 differently sized models. Illustratively, the complete data set is randomly divided into an inference energy consumption training set and verification set at a 9:1 ratio. The training set is used to train several types of regression models, and the verification set is used to compare their accuracy and generalization in predicting inference energy consumption. The best-performing regression model is taken as the initial inference energy consumption prediction model; experiments determined the best performer to be a linear regression model. Linear regression is a simple and reliable regression method that predicts the target variable by fitting data points; it learns the patterns in the training data with high sample efficiency, needs no additional models to fit the data (reducing computational cost), and can be written directly as an expression, which eases subsequent optimization and solving.
Table 1 gives the statistics of the complete inference energy consumption data set, the training set and the verification set, including the number of data points and the maximum, minimum and mean of three quantities: calculation amount, parameter amount and inference energy consumption.
Table 1. Inference energy consumption data set statistics
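The model-selection step can be sketched with scikit-learn. The candidate regressor families and the error metric below are illustrative assumptions; the patent states only that multiple regression types were compared and that linear regression performed best:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

def select_energy_model(X, y):
    """X: (n_models, 2) array of [FLOPs, parameter count]; y: energy (J)."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1,
                                                random_state=0)  # 9:1 split
    candidates = {"linear": LinearRegression(),
                  "forest": RandomForestRegressor(random_state=0),
                  "svr": SVR()}
    scores = {}
    for name, reg in candidates.items():
        reg.fit(X_tr, y_tr)
        scores[name] = mean_absolute_error(y_val, reg.predict(X_val))
    best = min(scores, key=scores.get)  # the patent reports linear wins
    return candidates[best], scores
```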
The initial inference energy consumption prediction model characterizes the relationship between inference energy consumption and the parameter amount and calculation amount; the linear regression model is taken as this initial model. Expressing the parameter amount and calculation amount via the pruning rate then yields the inference energy consumption prediction model relating inference energy consumption to the pruning rate:

E_prune = (w_2 + w_3 × p) × a_e + (a_f + b_f × p) × b_e + c_e  (8)

where E_prune represents the inference energy consumption, (w_2 + w_3 × p) is the parameter amount expressed via the pruning rate, (a_f + b_f × p) is the calculation amount expressed via the pruning rate, p is the pruning rate, a_f and b_f are fitting coefficients relating the model's calculation amount to the pruning rate p, and a_e, b_e and c_e are obtained by fitting on the inference energy consumption training set. The fitted curve of the inference energy consumption prediction model is shown in fig. 6.
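For completeness, equation (8) as a small function; the coefficient names follow the equation, and their values would come from the fits described above:

```python
def predict_energy(p, w2, w3, a_f, b_f, a_e, b_e, c_e):
    """Equation (8): energy as a function of pruning rate p."""
    params = w2 + w3 * p   # parameter amount expressed via the pruning rate
    flops = a_f + b_f * p  # calculation amount expressed via the pruning rate
    return params * a_e + flops * b_e + c_e
```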
In one embodiment, determining, for the original convolutional neural network model after sparse regularization training, the target-category attention coefficient of each backbone layer for each category based on a calibration data set, includes:
taking the calibration data set as the input of the original convolutional neural network model after sparse regularization training, and outputting the prediction box and prediction category corresponding to each calibration sample in the calibration data set;
back-propagating through the original convolutional neural network model after sparse regularization training, and determining the gradient map of each backbone layer for each prediction category;
determining the gradient class activation map of each backbone layer for each calibration sample based on the gradient map of each backbone layer for each prediction category;
determining the attention coefficient of each backbone layer for each calibration sample based on the prediction box and the gradient class activation map;
and determining the attention coefficient of each backbone layer for each category from the attention coefficients of each backbone layer for the calibration samples belonging to that category.
Specifically, denote the original convolutional neural network model after sparse regularization training as Model = {layer_k, k ∈ [1, Y]}, where Y is the number of network layers layer_k in the model's backbone. Each layer_k comprises a Conv_k layer and a BN_k layer, so the backbone includes the Conv = {Conv_k, k ∈ [1, Y]} layers and the corresponding BN = {BN_k, k ∈ [1, Y]} layers; in general, the output of the Conv_k layer is the input of BN_k.
The calibration data set Dataset_cal = {img_i, i ∈ [1, M]} is used as the input of the original convolutional neural network model after sparse regularization training, and the prediction box bbox and prediction category class_pred of each calibration sample are output:

bbox, class_pred = Model(img_i)  (9)
After the prediction frame bbox and the prediction category class_pred are output, back propagation is performed on the original convolutional neural network model after sparse regularization training, and the gradient map $G_k^{class_{pred}}$ of each backbone network layer $layer_k$ for each prediction category class_pred is determined:

$$G_k^{class_{pred}} = Backpropagation(class_{pred},\; feat_k)$$

wherein the shape of $G_k^{class_{pred}}$ is C × H × W, C being the number of channels of the original convolutional neural network model, H the number of pixels in height and W the number of pixels in width; Backpropagation denotes back propagation; and $feat_k$ denotes the output feature of $layer_k$, whose shape is likewise C × H × W.
$G_k^{class_{pred}}$ is summed over the spatial dimensions and averaged to obtain the average gradient $\bar{G}_k^{class_{pred}}$:

$$\bar{G}_k^{class_{pred}}(c) = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} G_k^{class_{pred}}(c, h, w)$$

wherein $G_k^{class_{pred}}(c, h, w)$ denotes the gradient of the pixel at position (h, w) in channel c of the gradient map.
Based on the average gradient $\bar{G}_k^{class_{pred}}$, the gradient class activation map $A_k^{class_{pred}}$ of $layer_k$ for class_pred can be obtained by weighting the output feature channel-wise with the average gradient:

$$A_k^{class_{pred}}(c, h, w) = \bar{G}_k^{class_{pred}}(c) \times feat_k(c, h, w)$$

wherein the shape of $A_k^{class_{pred}}$ is C × H × W.
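A minimal sketch of this step, assuming a PyTorch setting in which the output feature feat_k of a backbone layer and its gradient with respect to the predicted class score have already been captured (for example with forward and backward hooks); the function name and hook mechanics are illustrative, not part of the method as stated.

```python
import torch

def gradient_class_activation_map(feat_k: torch.Tensor,
                                  grad_k: torch.Tensor) -> torch.Tensor:
    """feat_k, grad_k: C x H x W tensors for one backbone layer."""
    # Average the gradient over the spatial dimensions: one weight per channel.
    avg_grad = grad_k.mean(dim=(1, 2), keepdim=True)  # shape C x 1 x 1
    # Weight the feature map channel-wise; the result keeps the C x H x W
    # shape, as stated in the text. (Standard Grad-CAM would additionally
    # apply a ReLU and sum over channels; the per-channel form is kept here.)
    return avg_grad * feat_k
```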
For the target detection task, class_pred is the category of the prediction frame bbox. By retaining the gradient class activation values of all pixel points inside bbox and summing them, the category attention coefficient $a_k^{class_{pred}}$ of $layer_k$ for class_pred is obtained:

$$a_k^{class_{pred}} = \frac{1}{H_{bbox} \times W_{bbox}} \sum_{(h, w) \in bbox} A_k^{class_{pred}}(h, w)$$

wherein $A_k^{class_{pred}}(h, w)$ denotes the gradient class activation value of the pixel at (h, w) on the activation map, (h, w) ∈ bbox indicates that the pixel lies inside the prediction frame, $H_{bbox}$ denotes the number of pixels along the height of the prediction frame, and $W_{bbox}$ denotes the number of pixels along its width.
After the attention coefficients of $layer_k$ for all calibration samples in the calibration data set are obtained, the coefficients of $layer_k$ belonging to calibration samples with the same prediction category are averaged, yielding the attention coefficient of each layer of network of the backbone of the original convolutional neural network for each category. For example, when there are 100 calibration samples and the predicted category of 5 of them is monkey, the attention coefficients of $layer_k$ for those five calibration samples are averaged to obtain the target category attention coefficient of $layer_k$ for the category monkey.
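Continuing the sketch, the pooling inside the prediction box and the per-category averaging could look as follows; the box format (x1, y1, x2, y2) and the summation over channels inside the box are assumptions where the text is silent.

```python
from collections import defaultdict

import torch

def bbox_attention_coefficient(cam: torch.Tensor, bbox) -> float:
    """cam: C x H x W activation map; bbox: (x1, y1, x2, y2) pixel indices."""
    x1, y1, x2, y2 = bbox
    region = cam[:, y1:y2, x1:x2]  # keep only pixels inside the prediction box
    # Sum of the activation values, normalized by the box's pixel count.
    return (region.sum() / ((y2 - y1) * (x2 - x1))).item()

def per_class_attention(samples):
    """samples: iterable of (predicted_class, coefficient) for one layer_k."""
    buckets = defaultdict(list)
    for cls, coeff in samples:
        buckets[cls].append(coeff)
    # e.g. 5 of 100 samples predicted as monkey -> average their 5 coefficients
    return {cls: sum(v) / len(v) for cls, v in buckets.items()}
```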
In one embodiment, the determining, based on the scaling factors and the target category attention coefficients, of the importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone includes:

for each layer of network of the backbone and for each category, multiplying the scaling factor of each channel of that layer by the layer's target category attention coefficient for the category, to obtain the importance coefficient of each channel of each layer of network of the backbone for each category.
Specifically, as shown in FIG. 7, after the target category attention coefficient $a_k^n$ of each network layer $layer_k$ for each category n is obtained, $a_k^n$ and the scaling factor $\gamma_{kc}$ of each channel of each network layer (k indexing the network layer and c the channel) are used to complete pruning for a specific category: the product of the two serves as the importance coefficient $\alpha_{kc}^n$ that measures whether each channel of each layer is important:

$$\alpha_{kc}^n = a_k^n \times \gamma_{kc}$$
The importance coefficients $\alpha_{kc}^n$ are sorted to obtain the sorted set $\alpha_{sort}$. For a determined target pruning rate $p^{target}$, the scaling factor threshold $T_\alpha$ can then be calculated, and the channel index set of each layer follows from it:

$$T_\alpha = \alpha_{sort}\left[\,\lfloor len(\alpha_{sort}) \times p^{target} \rfloor\,\right], \qquad mask_k(c) = \begin{cases} 1, & \alpha_{kc}^n > T_\alpha \\ 0, & \alpha_{kc}^n \le T_\alpha \end{cases}$$

wherein len denotes the length of the set, and $mask_k$ denotes the index set of the channels of $layer_k$; each index in the set is either 0 or 1, with 0 indicating that the channel is to be removed and 1 that it is to be retained.
After $mask_k$ is obtained, each layer of the pruned convolutional neural network model is expressed as:

$$layer_k = \{Conv'_k,\; BN'_k\}$$

wherein $Conv'_k$ is obtained by pruning the channels of $Conv_k$ according to the index set, and $BN'_k$ is obtained by pruning the channels of $BN_k$ according to the index set. When $mask_k$ indicates that a channel of $layer_k$ is to be removed, the corresponding channels of both the $Conv_k$ layer and the $BN_k$ layer in $layer_k$ are removed.
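A sketch of the thresholding and masking step under the reconstruction above; flattening the per-category coefficients of all layers into one sorted set before reading off the threshold is how network-slimming-style pruning is usually done and is assumed here.

```python
import torch

def build_channel_masks(importance_per_layer, target_pruning_rate):
    """importance_per_layer: list of 1-D tensors, one alpha vector per layer_k,
    all for the same category n. Returns one 0/1 mask per layer."""
    # Sort all importance coefficients of this category across layers.
    alpha_sort, _ = torch.sort(
        torch.cat([a.flatten() for a in importance_per_layer]))
    # The scaling factor threshold sits at the target pruning rate's quantile.
    idx = min(int(len(alpha_sort) * target_pruning_rate), len(alpha_sort) - 1)
    t_alpha = alpha_sort[idx]
    # 1 = keep the channel, 0 = remove it (from both Conv_k and BN_k).
    return [(a > t_alpha).int() for a in importance_per_layer]
```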
The class-adaptive model pruning device provided by the invention is described below; the device described below and the class-adaptive model pruning method described above may be referred to in correspondence with each other.
As shown in fig. 8, the class-adaptive model pruning device 800 includes: a construction module 801, a first determination module 802, a second determination module 803, a third determination module 804 and a pruning module 805.
The construction module 801 is configured to establish, for each category, an optimization model based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category, and to determine a target pruning rate for each category based on the optimization model, the categories being those the original convolutional neural network model classifies and identifies;
a first determining module 802, configured to perform sparse regularization training on the original convolutional neural network model, so as to determine the scaling factor of each channel of each layer of network in the backbone of the original convolutional neural network model;
a second determining module 803, configured to determine, for the original convolutional neural network model after sparse regularization training, the target category attention coefficients of each layer of network of the backbone for each category based on a calibration data set;
a third determining module 804, configured to determine, based on the scaling factors and the target category attention coefficients, the importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone;
and a pruning module 805, configured to prune the original convolutional neural network model for each category based on the target pruning rate and the importance coefficients, to obtain a category pruning model for each category.
With the class-adaptive model pruning device described above, an optimization model is established from the storage, inference energy consumption, inference delay and inference precision requirements; solving the optimization model corresponding to each category yields a target pruning rate that satisfies those requirements, and category-wise adaptive pruning is then performed at that rate. This balances the competing requirements of the model in a dynamic environment, improves the accuracy of each category pruning model for its target category, and makes the pruned model easy to deploy on devices with high performance.
In one embodiment, the construction module 801 is specifically configured to:
respectively establishing an optimization model for each category according to the storage requirement, the reasoning energy consumption requirement, the reasoning time delay requirement and the reasoning precision requirement for each category;
establishing, for the original convolutional neural network model, an inference energy consumption prediction model, an inference delay prediction model, a storage prediction model and, for each category, an inference precision prediction model, wherein the inference energy consumption prediction model represents the relationship between the inference energy consumption required by the pruned model and the pruning rate, the inference delay prediction model represents the relationship between the inference delay of the pruned model and the pruning rate, the storage prediction model represents the relationship between the storage space required by the pruned model and the pruning rate, and the inference precision prediction model represents the relationship between the inference precision of the pruned model and the pruning rate;

and determining, based on the inference precision prediction model, the inference energy consumption prediction model, the inference delay prediction model and the storage prediction model, the pruning rate corresponding to the optimal solution of the optimization model for each category, and taking the pruning rate corresponding to the optimal solution of each optimization model as the target pruning rate for that category.
In one embodiment, the optimization model is:

$$\min_{p}\; T_{prune}$$

subject to:

$$S_{prune} \le S_{budget}$$

$$E_{prune} \le E_{budget}$$

$$Acc_{prune}^{n} \ge Acc_{budget}^{n}$$

$$0 \le p \le 1$$

wherein p denotes the pruning rate, min denotes minimization, $T_{prune}$ denotes the inference delay of the pruned model, $S_{prune}$ denotes the storage space required by the pruned model, $S_{budget}$ denotes the storage space threshold, $Acc_{prune}^{n}$ denotes the inference precision of the pruned model for category n, $Acc_{budget}^{n}$ denotes the inference precision threshold for category n, $E_{prune}$ denotes the inference energy consumption required by the pruned model, and $E_{budget}$ denotes the inference energy consumption threshold.
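The text does not spell out how the optimal solution is found; since the decision variable is the single scalar p and all four predictors are cheap to evaluate, a plain grid search is one workable approach, sketched below with assumed predictor callables and threshold values.

```python
import numpy as np

def solve_target_pruning_rate(pred_latency, pred_storage, pred_energy,
                              pred_accuracy_n, s_budget, e_budget,
                              acc_budget_n, steps=1001):
    """Grid search over p in [0, 1]: minimize predicted latency subject to
    the storage, energy and per-category accuracy constraints."""
    best_p, best_t = None, float("inf")
    for p in np.linspace(0.0, 1.0, steps):
        feasible = (pred_storage(p) <= s_budget
                    and pred_energy(p) <= e_budget
                    and pred_accuracy_n(p) >= acc_budget_n)
        if feasible and pred_latency(p) < best_t:
            best_p, best_t = p, pred_latency(p)
    return best_p  # None if no pruning rate satisfies all requirements
```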
In one embodiment, the construction module 801 is specifically configured to:
establishing an initial inference energy consumption prediction model, and determining the inference energy consumption prediction model for the original convolutional neural network model based on the initial inference energy consumption prediction model;

establishing, for the original convolutional neural network model, the inference delay prediction model based on real sampled data of inference delay versus pruning rate;

establishing, for the original convolutional neural network model, the storage prediction model based on real sampled data of storage space versus pruning rate;

and establishing, for the original convolutional neural network model, the inference precision prediction model for each category based on real sampled data of the inference precision corresponding to each category versus pruning rate.
In one embodiment, the construction module 801 is specifically configured to:
establishing an inference energy consumption data set, wherein the inference energy consumption data set comprises the computation costs, parameter counts and inference energy consumption corresponding to different convolutional neural networks;

dividing the inference energy consumption data set into an inference energy consumption training set and an inference energy consumption verification set;

training on the inference energy consumption training set to obtain a plurality of different regression models;
and verifying the performance of each of the plurality of different regression models on the inference energy consumption verification set, and determining the regression model with the best performance as the initial inference energy consumption prediction model.
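A sketch of this selection step with scikit-learn, assuming the data set has been assembled as a feature matrix X (computation cost and parameter count per network) and a target vector y (measured inference energy); the candidate list and the mean-squared-error metric are assumptions, since the text only says several different regression models are compared.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def select_energy_predictor(X, y):
    # Split into the inference energy consumption training and verification sets.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2)
    candidates = [LinearRegression(), RandomForestRegressor(), SVR()]
    scored = []
    for model in candidates:
        model.fit(X_tr, y_tr)  # train on the training set
        scored.append((mean_squared_error(y_val, model.predict(X_val)), model))
    # Keep the regressor with the lowest verification error.
    return min(scored, key=lambda s: s[0])[1]
```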
In one embodiment, the second determining module 803 is specifically configured to:
taking the calibration data set as the input of the original convolutional neural network model after sparse regularization training, and outputting the prediction frame and prediction category corresponding to each calibration sample in the calibration data set;

performing back propagation on the original convolutional neural network model after sparse regularization training, and determining the gradient map of each layer of network of the backbone for each prediction category;

determining the gradient class activation map of each layer of network of the backbone for each calibration sample based on the gradient map of each layer of network of the backbone for each prediction category;

determining the attention coefficient of each layer of network of the backbone for each calibration sample based on the prediction frame and the gradient class activation map;

and determining, from the attention coefficients of each layer of network of the backbone for the individual calibration samples, the attention coefficient of each layer of network of the backbone for each category, based on the calibration samples having the same category.
In one embodiment, the third determining module 804 is specifically configured to:
for each layer of network of the backbone and for each category, multiplying the scaling factor of each channel of that layer by the layer's target category attention coefficient for the category, to obtain the importance coefficient of each channel of each layer of network of the backbone for each category.
Fig. 9 illustrates a physical schematic diagram of an electronic device. As shown in fig. 9, the electronic device may include: a processor 910, a communication interface (Communications Interface) 920, a memory 930, and a communication bus 940, wherein the processor 910, the communication interface 920 and the memory 930 communicate with each other via the communication bus 940. The processor 910 may invoke logic instructions in the memory 930 to perform the class-adaptive model pruning method, which includes:
establishing, for each category, an optimization model based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category, and determining a target pruning rate for each category based on the optimization model, the categories being those the original convolutional neural network model classifies and identifies;

performing sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each layer of network in the backbone of the original convolutional neural network model;

determining, for the original convolutional neural network model after sparse regularization training, target category attention coefficients of each layer of network of the backbone for each category based on a calibration data set;

determining, based on the scaling factors and the target category attention coefficients, importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone;

and pruning the original convolutional neural network model for each category based on the target pruning rate and the importance coefficients, to obtain a category pruning model for each category.
Further, the logic instructions in the memory 930 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program which may be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the processor executes the class-adaptive model pruning method provided by the methods above, the method comprising:
establishing, for each category, an optimization model based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category, and determining a target pruning rate for each category based on the optimization model, the categories being those the original convolutional neural network model classifies and identifies;

performing sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each layer of network in the backbone of the original convolutional neural network model;

determining, for the original convolutional neural network model after sparse regularization training, target category attention coefficients of each layer of network of the backbone for each category based on a calibration data set;

determining, based on the scaling factors and the target category attention coefficients, importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone;

and pruning the original convolutional neural network model for each category based on the target pruning rate and the importance coefficients, to obtain a category pruning model for each category.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the class-adaptive model pruning method provided by the methods above, the method comprising:
establishing, for each category, an optimization model based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category, and determining a target pruning rate for each category based on the optimization model, the categories being those the original convolutional neural network model classifies and identifies;

performing sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each layer of network in the backbone of the original convolutional neural network model;

determining, for the original convolutional neural network model after sparse regularization training, target category attention coefficients of each layer of network of the backbone for each category based on a calibration data set;

determining, based on the scaling factors and the target category attention coefficients, importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone;

and pruning the original convolutional neural network model for each category based on the target pruning rate and the importance coefficients, to obtain a category pruning model for each category.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on this understanding, the technical solution above, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A class-adaptive model pruning method, comprising:
establishing, for each category, an optimization model based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category, and determining a target pruning rate for each category based on the optimization model, the categories being those the original convolutional neural network model classifies and identifies;

performing sparse regularization training on the original convolutional neural network model to determine the scaling factor of each channel of each layer of network in the backbone of the original convolutional neural network model;

determining, for the original convolutional neural network model after sparse regularization training, target category attention coefficients of each layer of network of the backbone for each category based on a calibration data set;

determining, based on the scaling factors and the target category attention coefficients, importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone;

and pruning the original convolutional neural network model for each category based on the target pruning rate and the importance coefficients, to obtain a category pruning model for each category.
2. The class-adaptive model pruning method according to claim 1, wherein the establishing, for each category, an optimization model based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category, and determining a target pruning rate for each category based on the optimization model, comprises:

establishing, for each category, an optimization model according to the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category;

establishing, for the original convolutional neural network model, an inference energy consumption prediction model, an inference delay prediction model, a storage prediction model and, for each category, an inference precision prediction model, wherein the inference energy consumption prediction model represents the relationship between the inference energy consumption required by the pruned model and the pruning rate, the inference delay prediction model represents the relationship between the inference delay of the pruned model and the pruning rate, the storage prediction model represents the relationship between the storage space required by the pruned model and the pruning rate, and the inference precision prediction model represents the relationship between the inference precision of the pruned model and the pruning rate;

and determining, based on the inference precision prediction model, the inference energy consumption prediction model, the inference delay prediction model and the storage prediction model, the pruning rate corresponding to the optimal solution of the optimization model for each category, and taking the pruning rate corresponding to the optimal solution of each optimization model as the target pruning rate for that category.
3. The class-adaptive model pruning method according to claim 2, wherein the optimization model is:

$$\min_{p}\; T_{prune}$$

subject to:

$$S_{prune} \le S_{budget}$$

$$E_{prune} \le E_{budget}$$

$$Acc_{prune}^{n} \ge Acc_{budget}^{n}$$

$$0 \le p \le 1$$

wherein p denotes the pruning rate, min denotes minimization, $T_{prune}$ denotes the inference delay of the pruned model, $S_{prune}$ denotes the storage space required by the pruned model, $S_{budget}$ denotes the storage space threshold, $Acc_{prune}^{n}$ denotes the inference precision of the pruned model for category n, $Acc_{budget}^{n}$ denotes the inference precision threshold for category n, $E_{prune}$ denotes the inference energy consumption required by the pruned model, and $E_{budget}$ denotes the inference energy consumption threshold.
4. The class-adaptive model pruning method according to claim 2, wherein the establishing, for the original convolutional neural network model, an inference energy consumption prediction model, an inference delay prediction model, a storage prediction model and an inference precision prediction model for each category comprises:
establishing an initial inference energy consumption prediction model, and determining the inference energy consumption prediction model for the original convolutional neural network model based on the initial inference energy consumption prediction model;

establishing, for the original convolutional neural network model, the inference delay prediction model based on real sampled data of inference delay versus pruning rate;

establishing, for the original convolutional neural network model, the storage prediction model based on real sampled data of storage space versus pruning rate;

and establishing, for the original convolutional neural network model, the inference precision prediction model for each category based on real sampled data of the inference precision corresponding to each category versus pruning rate.
5. The class-adaptive model pruning method according to claim 4, wherein the establishing an initial inference energy consumption prediction model comprises:
establishing an inference energy consumption data set, wherein the inference energy consumption data set comprises the computation costs, parameter counts and inference energy consumption corresponding to different convolutional neural networks;

dividing the inference energy consumption data set into an inference energy consumption training set and an inference energy consumption verification set;

training on the inference energy consumption training set to obtain a plurality of different regression models;
and verifying the performance of each of the plurality of different regression models on the inference energy consumption verification set, and determining the regression model with the best performance as the initial inference energy consumption prediction model.
6. The class-adaptive model pruning method according to any one of claims 1 to 5, wherein the determining, for the original convolutional neural network model after sparse regularization training, target category attention coefficients of each layer of network of the backbone for each category based on a calibration data set comprises:

taking the calibration data set as the input of the original convolutional neural network model after sparse regularization training, and outputting the prediction frame and prediction category corresponding to each calibration sample in the calibration data set;

performing back propagation on the original convolutional neural network model after sparse regularization training, and determining the gradient map of each layer of network of the backbone for each prediction category;

determining the gradient class activation map of each layer of network of the backbone for each calibration sample based on the gradient map of each layer of network of the backbone for each prediction category;

determining the attention coefficient of each layer of network of the backbone for each calibration sample based on the prediction frame and the gradient class activation map;

and determining, from the attention coefficients of each layer of network of the backbone for the individual calibration samples, the attention coefficient of each layer of network of the backbone for each category, based on the calibration samples having the same category.
7. The class-adaptive model pruning method according to any one of claims 1 to 5, wherein the determining, based on the scaling factors and the target category attention coefficients, importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone comprises:

for each layer of network of the backbone and for each category, multiplying the scaling factor of each channel of that layer by the layer's target category attention coefficient for the category, to obtain the importance coefficient of each channel of each layer of network of the backbone for each category.
8. A class-adaptive model pruning device, comprising:
the construction module is used for establishing, for each category, an optimization model based on the storage requirement, inference energy consumption requirement, inference delay requirement and inference precision requirement for that category, and for determining a target pruning rate for each category based on the optimization model, the categories being those the original convolutional neural network model classifies and identifies;

the first determining module is used for performing sparse regularization training on the original convolutional neural network model, so as to determine the scaling factor of each channel of each layer of network in the backbone of the original convolutional neural network model;

the second determining module is used for determining, for the original convolutional neural network model after sparse regularization training, the target category attention coefficients of each layer of network of the backbone for each category based on a calibration data set;

the third determining module is used for determining, based on the scaling factors and the target category attention coefficients, the importance coefficients, for each category, of the channels corresponding to each layer of network of the backbone;

and the pruning module is used for pruning the original convolutional neural network model for each category based on the target pruning rate and the importance coefficients, to obtain a category pruning model for each category.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the class-adaptive model pruning method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the class-adaptive model pruning method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310723421.6A CN116796823A (en) | 2023-06-16 | 2023-06-16 | Class adaptive model pruning method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116796823A true CN116796823A (en) | 2023-09-22 |
Family
ID=88033890
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |