CN112686382A - Convolution model lightweight method and system

Info

Publication number: CN112686382A (application CN202011615978.0A)
Authority: CN (China)
Prior art keywords: pruning, candidate, convolution, strategy, model
Legal status: Granted; currently Active
Other languages: Chinese (zh)
Other versions: CN112686382B
Inventors: 吴博文, 梁小丹, 聂琳, 林倞
Assignee (current and original): Sun Yat Sen University
Application filed by Sun Yat Sen University; priority to CN202011615978.0A; application granted and published as CN112686382B

Abstract

The invention provides a convolution model lightweight method and system, wherein the method comprises the following steps: obtaining candidate pruning strategies, where a candidate pruning strategy is a set of pruning rates corresponding to the layers of the original convolution model; evaluating the candidate pruning strategies to obtain evaluation results; generating a target pruning strategy according to the evaluation results; evaluating the performance of the target pruning strategy on target hardware and outputting an optimal pruning strategy; and pruning the original convolution model according to the optimal pruning strategy to obtain a lightweight convolution model. The invention greatly reduces the parameter count and computation of existing convolution models and provides a lightweight framework for convolution models driven by real hardware latency feedback.

Description

Convolution model lightweight method and system
Technical Field
The invention relates to the technical field of neural network pruning, in particular to a convolution model lightweight method and system.
Background
With the rapid development of deep learning theory and related technologies, many previously intractable problems, such as speech synthesis and speech recognition, have seen great progress. In particular, the advent of deep convolutional neural networks ("ImageNet Classification with Deep Convolutional Neural Networks") has advanced computer vision research, bringing great progress in image classification, object detection, and semantic segmentation. Meanwhile, with the development of GPU chips, the available computing power keeps growing, networks are built ever deeper with ever more parameters, and neural networks can be applied to more and more tasks. However, because of these huge parameter counts, deep convolution models are difficult to deploy on real devices in some scenarios. In a smart-home scenario, for example, privacy protection and latency requirements force the various modules to be deployed on the end device, yet the computing power and storage of end devices are generally very limited, which poses unique technical challenges for existing deep learning technology.
Disclosure of Invention
The invention provides a convolution model lightweight method and system, addressing the problem that existing convolution models, with their large parameter counts and heavy computation, are difficult to deploy on end devices.
One embodiment of the present invention provides a convolution model lightweight method, including:
obtaining candidate pruning strategies; a candidate pruning strategy is a set of pruning rates corresponding to the layers of the original convolution model;
evaluating the candidate pruning strategy to obtain an evaluation result of the candidate pruning strategy;
generating a target pruning strategy according to the evaluation result;
evaluating the performance of a target pruning strategy on target hardware and outputting an optimal pruning strategy;
and carrying out pruning operation on the convolution original model according to the optimal pruning strategy to obtain a lightweight convolution model.
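The five steps above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the callbacks `get_candidates`, `evaluate`, `hw_score`, and `prune` are hypothetical stand-ins for the strategy search, strategy evaluation, hardware performance feedback, and pruning operation modules described later.

```python
# Hypothetical callbacks stand in for the modules described in the text:
# get_candidates (strategy search), evaluate (strategy evaluation),
# hw_score (hardware performance feedback), prune (pruning operation).
def lightweight(model, get_candidates, evaluate, hw_score, prune):
    candidates = get_candidates()                        # step 1: candidate strategies
    results = [(s, evaluate(s)) for s in candidates]     # step 2: evaluate each candidate
    target = max(results, key=lambda p: p[1])[0]         # step 3: pick target strategy
    if not hw_score(target):                             # step 4: check on target hardware
        return None
    return prune(model, target)                          # step 5: lightweight model
```

For example, with toy callbacks where the "score" is simply the negated total pruning rate, the strategy with the lowest total rate wins and is handed to the pruning callback.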
Further, the evaluating the candidate pruning strategy comprises: evaluation based on short-term fine tuning, in particular:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a convolution pruning model;
evaluating the convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
Further, the evaluating the candidate pruning strategy comprises: evaluation based on updating the batch normalization layer, specifically:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a first convolution pruning model;
performing inference with the first convolution pruning model on a training set to obtain feature values;
updating the statistics of the batch normalization layer according to the feature values to obtain a moving mean and a moving variance;
updating the first convolution pruning model with the moving mean and moving variance to generate a second convolution pruning model;
evaluating the second convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
Further, the obtaining of the candidate pruning strategies includes: randomly searching candidate pruning strategies, specifically:
generating a pruning strategy to be candidate according to the search space;
evaluating the pruning strategy to be candidate through a hardware performance feedback module;
if the pruning strategy to be candidate does not meet the preset hardware performance constraint condition, generating another pruning strategy to be candidate again according to the search space;
and if the pruning strategy to be candidate meets the preset hardware performance constraint condition, marking the pruning strategy to be candidate as a candidate pruning strategy.
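The generate-and-reject loop above can be sketched as follows; this is a hedged illustration in which the hardware-constraint callback `satisfies_hw` and the retry budget are assumptions, standing in for the hardware performance feedback module.

```python
import random

# The satisfies_hw callback and max_tries budget are illustrative; the real
# framework queries a hardware performance feedback module instead.
def generate_valid_strategy(num_layers, R, satisfies_hw, rng, max_tries=1000):
    """Sample pruning strategies (one rate per layer) from the search space
    [0, R] until one meets the hardware performance constraint."""
    for _ in range(max_tries):
        strategy = [rng.uniform(0.0, R) for _ in range(num_layers)]
        if satisfies_hw(strategy):      # constraint check; reject and retry on failure
            return strategy
    raise RuntimeError("no strategy met the hardware constraint")
```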
Further, the obtaining of the candidate pruning strategies includes: searching a candidate pruning strategy based on a genetic algorithm; the candidate pruning strategy search based on the genetic algorithm comprises the following steps:
mutation operation: replacing part of pruning rates in the pruning strategies to be candidate according to the search space to obtain the candidate pruning strategies;
hybridization operation: crossing the pruning strategy to be candidate with the optimal pruning strategies of the previous generation to obtain a candidate pruning strategy.
Further, pruning the original convolution model according to the optimal pruning strategy includes:
obtaining the optimal pruning rate r_i of the i-th layer of the original convolution model according to the optimal pruning strategy;
obtaining the c_i filters of the i-th layer of the original convolution model and calculating the norm of each filter;
arranging the c_i filters in order of increasing filter norm;
selecting the r_i * c_i filters with the smallest norms and pruning them from the original model.
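The per-layer filter selection described above can be sketched as follows. This is a simplified, framework-free illustration assuming each filter is given as a flat list of its weights; a real implementation would read the convolution kernels from the model.

```python
# Assumes each filter is a flat list of weights (an illustrative simplification).
def select_filters_to_prune(filters, rate):
    """Return the indices of the round(rate * c_i) filters with the smallest
    L1 norm, i.e. the filters removed from this layer."""
    norms = [sum(abs(w) for w in f) for f in filters]        # L1 norm per filter
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    k = int(round(rate * len(filters)))                      # number of filters to prune
    return sorted(order[:k])
```

For instance, with four filters of L1 norms 2.0, 0.2, 10.0, and 0.4 and a pruning rate of 0.5, the two smallest-norm filters (indices 1 and 3) are selected.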
An embodiment of the present invention provides a convolution model lightweight system, including:
the candidate pruning strategy acquisition module is used for acquiring a candidate pruning strategy; the candidate pruning strategy is a set of pruning rates corresponding to a plurality of levels in the convolution original model;
the candidate pruning strategy evaluation module is used for evaluating the candidate pruning strategy to obtain an evaluation result of the candidate pruning strategy;
the target pruning strategy generating module is used for generating a target pruning strategy according to the evaluation result;
the optimal pruning strategy generation module is used for evaluating the performance of the target pruning strategy on target hardware and outputting the optimal pruning strategy;
and the pruning operation module is used for carrying out pruning operation on the convolution original model according to the optimal pruning strategy to obtain a lightweight convolution model.
Further, the candidate pruning strategy evaluation module comprises:
an evaluation submodule based on short-term fine tuning for:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a convolution pruning model;
evaluating the convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; wherein, the accuracy rate and the loss function value of the verification set are used as the evaluation result of the candidate pruning strategy;
an evaluation sub-module based on updating the batch normalization layer, for:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a first convolution pruning model;
performing inference with the first convolution pruning model on a training set to obtain feature values;
updating the statistics of the batch normalization layer according to the feature values to obtain a moving mean and a moving variance;
updating the first convolution pruning model with the moving mean and moving variance to generate a second convolution pruning model;
evaluating the second convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
Further, the candidate pruning strategy obtaining module includes:
a candidate pruning strategy random search sub-module for:
generating a pruning strategy to be candidate according to the search space;
evaluating the pruning strategy to be candidate through a hardware performance feedback module;
if the pruning strategy to be candidate does not meet the preset hardware performance constraint condition, generating another pruning strategy to be candidate again according to the search space;
if the pruning strategy to be candidate meets the preset hardware performance constraint condition, marking the pruning strategy to be candidate as a candidate pruning strategy;
the genetic-algorithm-based candidate pruning strategy search sub-module is used for:
mutation operation: replacing part of pruning rates in the pruning strategies to be candidate according to the search space to obtain the candidate pruning strategies;
hybridization operation: crossing the pruning strategy to be candidate with the optimal pruning strategies of the previous generation to obtain a candidate pruning strategy.
Further, the pruning operation module is further configured to:
obtain the optimal pruning rate r_i of the i-th layer of the original convolution model according to the optimal pruning strategy;
obtain the c_i filters of the i-th layer of the original convolution model and calculate the norm of each filter;
arrange the c_i filters in order of increasing filter norm;
select the r_i * c_i filters with the smallest norms and prune them from the original model.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
one embodiment of the present invention provides a convolution model lightweight method, including: obtaining candidate evaluation candidate pruning strategies; the candidate pruning strategy is a set of pruning rates corresponding to a plurality of levels in the convolution original model; evaluating the candidate pruning strategy to obtain an evaluation result of the candidate pruning strategy; generating a target pruning strategy according to the evaluation result; evaluating the performance of a target pruning strategy on target hardware and outputting an optimal pruning strategy; and carrying out pruning operation on the convolution original model according to the optimal pruning strategy to obtain a lightweight convolution model. The invention greatly reduces the parameter and the calculated amount of the existing convolution model and provides a light-weight framework aiming at real hardware time delay feedback for the convolution model.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a convolution model lightweight method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a convolution model lightweight method according to another embodiment of the present invention;
FIG. 3 is a flowchart of a convolution model lightweight method according to another embodiment of the present invention;
FIG. 4 is a flowchart of a convolution model lightweight method according to another embodiment of the present invention;
FIG. 5 is a flowchart of a convolution model lightweight method according to another embodiment of the present invention;
FIG. 6 is a flowchart of a convolution model lightweight method according to another embodiment of the present invention;
FIG. 7 is a block diagram of an automated convolution model lightweight framework based on hardware platform performance feedback according to an embodiment of the present invention;
FIG. 8 is a flow diagram of a method for random search in an automated convolution model lightweight framework based on hardware platform performance feedback according to an embodiment of the present invention;
FIG. 9 is a flowchart of a genetic algorithm-based search method in an automated convolution model lightweight framework based on hardware platform performance feedback according to an embodiment of the present invention;
FIG. 10 is a diagram of an apparatus for a convolution model lightweight system according to an embodiment of the present invention;
FIG. 11 is a diagram of an apparatus for a convolution model lightweight system according to another embodiment of the present invention;
FIG. 12 is a diagram of an apparatus for a convolution model lightweight system according to another embodiment of the present invention;
fig. 13 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.
The problems of large parameter counts and heavy computation in deep models are widespread across fields, including natural language processing. The article "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" proposes a complete model compression pipeline for convolution models, comprising model pruning, quantization, and coding. Model pruning removes redundant parameters, greatly reducing the model's parameter count and computation; quantization represents the model's parameters with low-precision floating-point numbers or even integers (such as FLOAT16 and INT8), greatly reducing the storage the model occupies while lowering arithmetic cost through low-precision operations; coding further compresses the stored model file with conventional file compression techniques, reducing the cost of model distribution.
The present invention focuses on neural network pruning techniques. Neural network pruning is widely used in computer vision and is divided into structured pruning (e.g., "Pruning Filters for Efficient ConvNets") and unstructured pruning. The difference lies in whether the pruned weights form a regular pattern: in a convolutional network, structured pruning typically removes whole convolution kernels, whereas unstructured pruning does not. The advantage of structured pruning is that it introduces no irregular sparse operations, so speedups are easily obtained on general hardware and general frameworks; unstructured pruning offers finer granularity and smaller accuracy loss, but it introduces irregular sparse operations and requires special hardware and compute-framework support to achieve acceleration.
To remain broadly applicable, the present invention focuses on structured pruning, and more specifically on filter pruning (proposed in "Pruning Filters for Efficient ConvNets"), which uses the L1 norm to measure the importance of filters in convolutional networks. That work, however, does not say how to set the pruning strategy, i.e., what proportion of the filters should be pruned in each layer. AMC ("AMC: AutoML for Model Compression and Acceleration on Mobile Devices") proposes using reinforcement learning to find the optimal pruning strategy, and MetaPruning finds it by training an auxiliary network that predicts the accuracy of the network after pruning.
A first aspect.
Referring to fig. 1, an embodiment of the invention provides a convolution model lightweight method, including:
S10, obtaining candidate pruning strategies; a candidate pruning strategy is a set of pruning rates corresponding to the layers of the original convolution model.
In a specific embodiment, the obtaining of the candidate pruning strategies includes: s11, randomly searching candidate pruning strategies, specifically:
and S111, generating a pruning strategy to be candidate according to the search space.
And S112, evaluating the pruning strategies to be candidate through a hardware performance feedback module.
And S113, if the pruning strategy to be candidate does not meet the preset hardware performance constraint condition, generating another pruning strategy to be candidate according to the search space again.
S114, if the pruning strategy to be candidate meets the preset hardware performance constraint condition, marking the pruning strategy to be candidate as a candidate pruning strategy.
In a specific embodiment, the obtaining of the candidate pruning strategies includes: s12, searching candidate pruning strategies based on the genetic algorithm; the candidate pruning strategy search based on the genetic algorithm comprises the following steps:
s121, mutation operation: and replacing part of pruning rates in the pruning strategies to be candidate according to the search space to obtain the candidate pruning strategies.
S122, hybridization operation: and mixing the pruning strategy to be candidate with the optimal pruning strategy of the previous generation to obtain the candidate pruning strategy.
And S20, evaluating the candidate pruning strategies to obtain evaluation results of the candidate pruning strategies.
In a specific embodiment, the evaluating the candidate pruning strategies includes: s21, evaluation based on short-term fine tuning, specifically:
s211, pruning the convolution original model according to the candidate pruning strategy to obtain a convolution pruning model.
S212, evaluating the convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
In a specific embodiment, the evaluating the candidate pruning strategies includes: S22, evaluation based on updating the batch normalization layer, specifically:
S221, pruning the original convolution model according to the candidate pruning strategy to obtain a first convolution pruning model.
S222, performing inference with the first convolution pruning model on a training set to obtain feature values.
S223, updating the statistics of the batch normalization layer according to the feature values to obtain a moving mean and a moving variance.
S224, updating the first convolution pruning model with the moving mean and moving variance to generate a second convolution pruning model.
S225, evaluating the second convolution pruning model through a verification set to obtain the accuracy rate of the verification set and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
And S30, generating a target pruning strategy according to the evaluation result.
And S40, evaluating the performance of the target pruning strategy on target hardware and outputting the optimal pruning strategy.
And S50, pruning the convolution original model according to the optimal pruning strategy to obtain a lightweight convolution model.
In a specific embodiment, pruning the original convolution model according to the optimal pruning strategy includes:
S51, obtaining the optimal pruning rate r_i of the i-th layer of the original convolution model according to the optimal pruning strategy.
S52, obtaining the c_i filters of the i-th layer of the original convolution model and calculating the norm of each filter.
S53, arranging the c_i filters in order of increasing filter norm.
S54, selecting the r_i * c_i filters with the smallest norms and pruning them from the original model.
In a specific embodiment, the pruning process includes:
the Pruning method adopted by the invention is similar to that in filter Pruning, and is put forward in Pruning filters for effective communications.
At this pointAnd (3) pruning each model layer by layer, and defining the corresponding pruning rate of the ith layer of the model as riLet the layer have ciFor each filter, we first compute the L1 norm of all filters in the layer, where we use the L1 norm corresponding to each filter to measure the importance of each filter. We rank these filters according to their corresponding L1 norms and choose r with the smaller L1 normi*ciThe filters perform pruning. The above steps are repeated for each layer of the model.
Assuming that the model has L layers, the corresponding set of pruning rates for each layer { r }1,r2,…,rLWe call pruning strategies.
A reasonable pruning strategy largely determines the pruning effect: too high a pruning rate sharply degrades the model, while too low a rate yields little lightweighting. Finding a suitable strategy that balances computation, storage, and model quality is therefore essential. To this end, the invention provides an automated convolution model lightweight framework based on hardware platform performance feedback, which decides the pruning strategy automatically.
In a specific embodiment, as shown in fig. 7, the automated convolution model lightweight framework based on hardware platform performance feedback provided by the present invention includes the following modules:
1. a pruning strategy evaluation module: the module is responsible for evaluating the advantages and disadvantages of the candidate pruning strategies, acquiring the candidate pruning strategies from the pruning strategy searching module, and returning the evaluation results to the pruning strategy searching module.
The module's input and output can be abstracted as follows: the input is a candidate pruning strategy {r_1, r_2, …, r_L}, and the output is a score s for that strategy. The score output by the module must faithfully reflect the real performance of the pruned model corresponding to the pruning strategy.
Here, we propose two alternatives:
(1) Evaluation based on short-term fine-tuning: the input candidate pruning strategy is first applied to the original model M to obtain the corresponding pruned model M_p. We then perform short-term fine-tuning on M_p (i.e., train for a few iterations) and quickly evaluate its performance on the validation set; the evaluation metric can be the validation accuracy or the loss value, and it is returned to the search module as the score of the candidate pruning strategy.
(2) Evaluation based on updating the batch normalization layer: the input candidate pruning strategy is first applied to the original model M to obtain the corresponding pruned model M_p. The batch normalization layer maintains two statistics, a moving mean and a moving variance, which must be re-estimated: the pruned model still carries the statistics of the original model, and these are no longer accurate for it. Re-estimation is performed by running several inference passes over the training set and using the resulting feature values to update the moving mean and moving variance of the batch normalization layer. After the update, the pruned model is evaluated on the validation set; the metric can be the validation accuracy or the loss value, and it is returned to the search module as the score of the candidate pruning strategy.
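The statistics re-estimation can be sketched for a single channel as follows. This is a hedged, single-channel illustration: the momentum-style exponential update and its default value are assumptions mirroring common batch normalization implementations, and the stale initial statistics stand in for those inherited from the original model.

```python
# Single-channel sketch; the momentum update rule and its value are
# illustrative assumptions, not taken from the patent text.
def update_bn_stats(batches, momentum=0.1):
    """Re-estimate the moving mean and moving variance of one batch
    normalization channel by running over training batches."""
    moving_mean, moving_var = 0.0, 1.0   # stale statistics inherited from the original model
    for batch in batches:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        moving_mean = (1 - momentum) * moving_mean + momentum * mean
        moving_var = (1 - momentum) * moving_var + momentum * var
    return moving_mean, moving_var
```

With `momentum=1.0` the stale statistics are discarded entirely and replaced by those of the last batch, which makes the update easy to verify by hand.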
2. A pruning strategy search module: and the system is responsible for realizing a search algorithm, distributing candidate pruning strategies to the pruning strategy evaluation module, counting evaluation results and generating a new pruning strategy according to the evaluation results.
This module externally provides two interfaces:
(1) Get pruning strategy (getPruningStrategy): returns a pruning strategy together with a pruning strategy id that serves as its unique identifier.
(2) Report evaluation score (reportEvalScore): used by the evaluation module to report an evaluation result to the search module; its parameters are the pruning strategy id and the evaluation score, and it returns no value.
The interfaces of the pruning strategy search module are implemented with the gRPC protocol, and the pruning strategy evaluation module communicates with the search module over the same protocol.
In the present invention, we propose two search methods:
(1) Random search: first we define a search space [0, R], where R is a real number smaller than 1. For a neural network with L layers, each pruning strategy contains L pruning rates and can be written as {r_1, r_2, …, r_L}, with every r_i ∈ [0, R]. When the getPruningStrategy interface is called, the module randomly generates a pruning strategy from the search space and hands it to the hardware performance feedback module for performance evaluation; if the strategy exceeds the given hardware performance constraint, another is generated, until one satisfies the constraint. The legal pruning strategy is stored in the module and assigned a pruning strategy id. Finally, the strategy and its id are returned to the pruning strategy evaluation module for evaluation. When reportEvalScore is called, the module saves the evaluation result. At the end, the optimal pruning strategy is selected according to the evaluation results and handed to the next stage. The flow is shown in fig. 8.
(2) Search based on genetic algorithm: in the present invention, there are two differences from the conventional genetic algorithm:
1) the genetic algorithm in the invention is constrained, and the generated pruning strategy needs to meet specific hardware constraint;
2) the evaluation of pruning strategies is relatively time-consuming and at the same time is a distributed evaluation, which requires special logic to be designed for this part. As shown in fig. 9.
The gene in the genetic algorithm corresponds to the pruning strategy {r_1, r_2, …, r_L}. The genetic algorithm in the invention comprises three operations:
1) Mutation operation: randomly change some entries of the pruning strategy. In our implementation, we first randomly select 30% of the entries of the pruning strategy to mutate, and then re-assign each selected entry a pruning rate drawn from the given search space.
2) Hybridization operation: generate a new pruning strategy from the high-quality strategies that survived the previous generation. Specifically, two pruning strategies are randomly selected from the previous generation as parent strategy 1 and parent strategy 2, and each entry of the new strategy is randomly taken from the corresponding position in one of the two parents.
3) The random generation operation randomly generates a pruning strategy according to the search space.
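A minimal sketch of the three operations, assuming a pruning strategy is a plain Python list of per-layer rates. The 30% mutation fraction follows the description above; the function names are illustrative assumptions.

```python
import random

def mutate(strategy, r_max, fraction=0.3):
    """Mutation: re-draw the pruning rate of a randomly chosen ~30% of entries."""
    child = list(strategy)
    for i in random.sample(range(len(child)), max(1, int(len(child) * fraction))):
        child[i] = random.uniform(0.0, r_max)
    return child

def hybridize(parent1, parent2):
    """Hybridization: each entry is taken from the corresponding position of
    one of the two parent strategies, chosen at random."""
    return [random.choice(pair) for pair in zip(parent1, parent2)]

def random_strategy(num_layers, r_max):
    """Random generation: a fresh strategy drawn uniformly from the search space."""
    return [random.uniform(0.0, r_max) for _ in range(num_layers)]

random.seed(1)
p1, p2 = random_strategy(8, 0.7), random_strategy(8, 0.7)
child = hybridize(p1, p2)
print(all(c in (a, b) for c, a, b in zip(child, p1, p2)))  # True
```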
Defining parameters: as shown in Table 1
TABLE 1
Defining variables: as shown in Table 2
TABLE 2
When the GetPruningStrategy interface is called, the module generates a new pruning strategy with the genetic algorithm following the process shown in fig. 9; the detailed process of the genetic algorithm is described below.
First, the module checks whether the mutation attempt count exceeds the maximum mutation count, or whether the mutation population count has reached the mutation target count. If neither condition holds, a mutation operation is performed to generate a new pruning strategy {r1, r2, …, rL}, and the strategy is checked against the hardware constraints. If it violates them, "mutation count += 1" is executed and the module is re-entered; if it satisfies them, "mutation count += 1" and "mutation population count += 1" are executed and the pruning strategy is returned. If either condition holds, the hybridization stage is entered.
In the hybridization stage, the module checks whether the hybridization attempt count exceeds the maximum hybridization count, or whether the hybridization population count has reached the hybridization target count. If neither condition holds, a hybridization operation is performed to generate a new pruning strategy {r1, r2, …, rL}, and the strategy is checked against the hardware constraints. If it violates them, "hybridization count += 1" is executed and the module is re-entered; if it satisfies them, "hybridization count += 1" and "hybridization population count += 1" are executed and the pruning strategy is returned. If either condition holds, the random generation stage is entered.
In the random generation stage, the module checks whether the population count exceeds the maximum population size. If it does not, a new pruning strategy is generated at random and checked against the hardware constraints; if the strategy violates them, another is generated at random until the constraints are satisfied. If the population count does exceed the maximum, a generation-switch operation is executed: the lowest-scoring pruning strategy individuals of the current generation are eliminated, with the number of eliminated individuals determined by the per-generation elimination count, and the survivors serve as parents for the next round of the genetic algorithm.
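The staged flow above can be sketched as one generation of the constrained genetic algorithm. The stage targets, attempt budgets, and elimination count are illustrative stand-ins for the parameters of Tables 1 and 2 (whose exact names are not recoverable here); note that a child violating the hardware constraint still consumes an attempt, matching the counters described above.

```python
import random

def run_stage(make_child, target, max_attempts, meets_constraint):
    """One generation stage: draw children until the stage target is reached
    or the attempt budget is exhausted; illegal children consume attempts."""
    children, attempts = [], 0
    while len(children) < target and attempts < max_attempts:
        child = make_child()
        attempts += 1
        if meets_constraint(child):
            children.append(child)
    return children

def next_generation(parents, scores, r_max, meets_constraint,
                    mut_target=4, cross_target=4, max_attempts=1000, n_eliminate=2):
    """One round of the staged flow: mutation stage, hybridization stage,
    random-fill stage, then elimination of the lowest-scoring individuals."""
    L = len(parents[0])

    def mutate():
        base = list(random.choice(parents))
        for i in random.sample(range(L), max(1, int(L * 0.3))):
            base[i] = random.uniform(0.0, r_max)
        return base

    def cross():
        p1, p2 = random.sample(parents, 2)
        return [random.choice(pair) for pair in zip(p1, p2)]

    def rand():
        return [random.uniform(0.0, r_max) for _ in range(L)]

    offspring = run_stage(mutate, mut_target, max_attempts, meets_constraint)
    offspring += run_stage(cross, cross_target, max_attempts, meets_constraint)
    pop_max = len(parents) + mut_target + cross_target
    offspring += run_stage(rand, pop_max - len(parents) - len(offspring),
                           max_attempts, meets_constraint)

    # Generation switch: drop the lowest-scoring parents; survivors plus the
    # new offspring form the next generation's parent pool.
    ranked = sorted(zip(scores, parents), key=lambda t: t[0], reverse=True)
    survivors = [p for _, p in ranked[: len(parents) - n_eliminate]]
    return survivors + offspring

random.seed(2)
parents = [[random.uniform(0.0, 0.5) for _ in range(6)] for _ in range(4)]
population = next_generation(parents, [0.9, 0.8, 0.2, 0.1], 0.8,
                             lambda s: sum(s) / len(s) < 0.6)
print(len(population))  # 10: 2 surviving parents + 8 offspring
```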
3. A hardware performance feedback module: this module is responsible for evaluating the performance of a given candidate pruning policy on the target hardware.
We provide hardware performance metrics in three dimensions:
(1) Model computation (GFLOPs): for this metric, the computation of the model is counted directly.
(2) Model parameter count: for this metric, the number of parameters of the model is counted directly.
(3) Inference latency on the target device: for this metric, we propose a model latency estimation method based on a target-hardware latency lookup table. The lookup table is defined as D ∈ R^(L×ic×oc), where L is the number of model layers, ic is the maximum number of input feature-map channels over all layers, and oc is the maximum number of output feature-map channels over all layers. In this lookup table, D(i, j, k) denotes the runtime of the i-th layer of the model on the target hardware when the layer has j input channels and k output channels.
We first traverse all possible input/output dimensions of every layer of the model on the target hardware (e.g., mobile phone, ARM CPU, Intel CPU, etc.), measure each layer's runtime, and record it in lookup table D. The lookup table is built once and reused many times.
The original model is pruned according to the generated pruning strategy; for each layer of the pruned model, the lookup table is queried with the layer's input and output feature-map channel counts to obtain its runtime on the target hardware, and the per-layer runtimes are summed to obtain the predicted runtime of the pruned model.
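As a sketch, the lookup table D can be held as a dictionary keyed by (layer, input channels, output channels) rather than a dense tensor; in practice each entry would come from a one-time measurement on the target device, and the millisecond values below are invented purely for illustration.

```python
def estimate_latency(channel_counts, time_table):
    """Predict pruned-model latency: look up each layer's measured runtime
    for its (input channels, output channels) pair and sum over all layers."""
    return sum(time_table[(layer, ic, oc)]
               for layer, (ic, oc) in enumerate(channel_counts))

# Toy table for a 3-layer model (times in milliseconds, made up).
table = {
    (0, 16, 32): 1.0, (0, 16, 24): 0.8,
    (1, 32, 32): 2.0, (1, 24, 24): 1.4,
    (2, 32, 10): 0.5, (2, 24, 10): 0.4,
}
print(estimate_latency([(16, 32), (32, 32), (32, 10)], table))            # unpruned: 3.5
print(round(estimate_latency([(16, 24), (24, 24), (24, 10)], table), 3))  # pruned: 2.6
```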
4. A pruning model performance recovery module: this module is responsible for recovering the performance of the pruning model corresponding to the optimal pruning strategy found by the search.
The module first reads in the optimal pruning strategy obtained by the search module and applies it to the original model. We then perform long-term fine-tuning on the pruning model to further restore its performance, finally obtaining the lightweight model.
In another embodiment, we conducted experiments on lightweighting QuartzNet 15x5 on the public English speech recognition dataset LibriSpeech to verify the effectiveness of the present invention. Here we use the model's computation as the lightweighting target and define two levels of computation constraint, 331M FLOPs and 159M FLOPs. We construct two naive model lightweighting methods as baselines: the first directly scales the original model by a fixed ratio and trains it; the second prunes the original model with a uniform pruning rate, i.e., every layer of the model has the same pruning rate.
"dev-clean, dev-other, test-clean, test-other" in Table 3 represents different data sets in LibriSpeech. As can be seen from the table below, the framework presented herein achieves better word error rates than the baseline, both under two levels of computational constraints.
TABLE 3
A second aspect.
Referring to fig. 10-12, an embodiment of the invention provides a convolution model lightweight system, including:
the candidate pruning strategy obtaining module 10 is configured to obtain a candidate pruning strategy; the candidate pruning strategy is a set of pruning rates corresponding to a plurality of levels in the convolution original model.
In a specific embodiment, the candidate pruning strategy obtaining module 10 includes:
the candidate pruning strategy random search submodule 11 is configured to: generating a pruning strategy to be candidate according to the search space; evaluating the pruning strategy to be candidate through a hardware performance feedback module; if the pruning strategy to be candidate does not meet the preset hardware performance constraint condition, generating another pruning strategy to be candidate again according to the search space; and if the pruning strategy to be candidate meets the preset hardware performance constraint condition, marking the pruning strategy to be candidate as a candidate pruning strategy.
The pruning strategy to be candidate search submodule 12 based on genetic algorithm is used for: mutation operation: replacing part of pruning rates in the pruning strategies to be candidate according to the search space to obtain the candidate pruning strategies; and (3) hybridization operation: and mixing the pruning strategy to be candidate with the optimal pruning strategy of the previous generation to obtain the candidate pruning strategy.
The candidate pruning strategy evaluation module 20 is configured to evaluate the candidate pruning strategy to obtain an evaluation result of the candidate pruning strategy.
In a specific embodiment, the candidate pruning policy evaluation module 20 includes:
an evaluation sub-module 21 based on short-term fine tuning for: carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a convolution pruning model; evaluating the convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
An evaluation sub-module 22 based on updating the batch normalization layer, used for: carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a first convolution pruning model; carrying out reasoning on the first convolution pruning model through a training set to obtain characteristic values; updating the statistics of the batch normalization layer according to the characteristic values to obtain a sliding average value and a sliding variance; updating the first convolution pruning model through the sliding mean and sliding variance to generate a second convolution pruning model; evaluating the second convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
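The batch-normalization statistic update in sub-module 22 can be illustrated, reduced to a single scalar feature channel: forward passes over training batches move only the running mean and running variance, with no gradient step. The momentum value and function names here are assumptions for illustration.

```python
import statistics

def recalibrate_bn(batches, momentum=0.1):
    """Update batch-norm running statistics by forward passes only: for each
    training batch, fold the batch mean/variance into the moving averages."""
    running_mean, running_var = 0.0, 1.0  # common BN initial values
    for batch in batches:
        batch_mean = statistics.fmean(batch)
        batch_var = statistics.pvariance(batch)
        running_mean = (1 - momentum) * running_mean + momentum * batch_mean
        running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var

# Feed 50 identical constant batches: the statistics converge toward the
# true mean (1.0) and variance (0.0) of the post-pruning features.
mean, var = recalibrate_bn([[1.0, 1.0, 1.0, 1.0]] * 50)
print(round(mean, 3), round(var, 3))  # 0.995 0.005
```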
The target pruning strategy generating module 30 is configured to generate a target pruning strategy according to the evaluation result.
The optimal pruning strategy generating module 40 is configured to evaluate performance of the target pruning strategy on the target hardware, and output the optimal pruning strategy.
And the pruning operation module 50 is used for carrying out pruning operation on the convolution original model according to the optimal pruning strategy to obtain a lightweight convolution model.
In a specific embodiment, the pruning operation module 50 is further configured to: obtain the optimal pruning rate ri of the i-th layer of the convolution original model according to the optimal pruning strategy; obtain the ci filters of the i-th layer of the convolution original model and calculate the norm of each filter; arrange the ci filters in order of increasing filter norm; and select the ri*ci filters with the smallest norms and carry out the pruning operation on the original model.
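The norm-based filter selection can be sketched as follows. The patent does not specify which norm is used, so the L1 norm here is an illustrative assumption, as are the function names.

```python
def filters_to_prune(filters, rate):
    """Norm-based selection: compute each filter's norm (L1 here, as an
    illustrative choice), sort ascending, and return the indices of the
    rate * c_i smallest-norm filters to remove."""
    norms = [sum(abs(w) for w in f) for f in filters]
    order = sorted(range(len(filters)), key=norms.__getitem__)  # ascending norm
    return sorted(order[: int(rate * len(filters))])

# Four toy filters (flattened weights); a 0.5 pruning rate removes the two
# filters with the smallest L1 norms (indices 2 and 0).
filters = [[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [0.5, -0.5]]
print(filters_to_prune(filters, 0.5))  # [0, 2]
```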
In a third aspect.
The present invention provides an electronic device, including:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to invoke the operation instruction, and the executable instruction enables the processor to perform an operation corresponding to the convolution model lightweight method shown in the first aspect of the present application.
In an alternative embodiment, there is provided an electronic device, as shown in fig. 13, the electronic device 5000 shown in fig. 13 including: a processor 5001 and a memory 5003. The processor 5001 and the memory 5003 are coupled, such as via a bus 5002. Optionally, the electronic device 5000 may also include a transceiver 5004. It should be noted that the transceiver 5004 is not limited to one in practical application, and the structure of the electronic device 5000 is not limited to the embodiment of the present application.
The processor 5001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 5001 may also be a combination of processors implementing computing functionality, e.g., a combination comprising one or more microprocessors, a combination of DSPs and microprocessors, or the like.
Bus 5002 can include a path that conveys information between the aforementioned components. The bus 5002 may be a PCI bus or EISA bus, etc. The bus 5002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 13, but this is not intended to represent only one bus or type of bus.
The memory 5003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 5003 is used for storing application program codes for executing the present solution, and the execution is controlled by the processor 5001. The processor 5001 is configured to execute application program code stored in the memory 5003 to implement the teachings of any of the foregoing method embodiments.
Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like.
A fourth aspect.
The present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a convolution model lightweight method as set forth in the first aspect of the present application.
Yet another embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which, when run on a computer, enables the computer to perform the corresponding content in the aforementioned method embodiments.

Claims (10)

1. A convolution model weight reduction method is characterized by comprising the following steps:
obtaining candidate pruning strategies; the candidate pruning strategy is a set of pruning rates corresponding to a plurality of levels in the convolution original model;
evaluating the candidate pruning strategy to obtain an evaluation result of the candidate pruning strategy;
generating a target pruning strategy according to the evaluation result;
evaluating the performance of a target pruning strategy on target hardware and outputting an optimal pruning strategy;
and carrying out pruning operation on the convolution original model according to the optimal pruning strategy to obtain a lightweight convolution model.
2. The convolution model lightweight method of claim 1, wherein the evaluating the candidate pruning strategies comprises: evaluation based on short-term fine tuning, in particular:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a convolution pruning model;
evaluating the convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
3. The convolution model lightweight method of claim 1, wherein the evaluating the candidate pruning strategies comprises: evaluation based on updating the batch normalization layer, specifically:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a first convolution pruning model;
reasoning is carried out on the convolution pruning model through a training set to obtain a characteristic value;
updating the statistic of the batch normalization layer according to the characteristic value to obtain a sliding average value and a sliding variance;
updating the first convolution pruning model through a sliding mean and a sliding variance to generate a second convolution pruning model;
evaluating the second convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
4. The convolution model weight-reducing method of claim 1, wherein the obtaining of the candidate pruning strategies comprises: randomly searching candidate pruning strategies, specifically:
generating a pruning strategy to be candidate according to the search space;
evaluating the pruning strategy to be candidate through a hardware performance feedback module;
if the pruning strategy to be candidate does not meet the preset hardware performance constraint condition, generating another pruning strategy to be candidate again according to the search space;
and if the pruning strategy to be candidate meets the preset hardware performance constraint condition, marking the pruning strategy to be candidate as a candidate pruning strategy.
5. The convolution model weight-reducing method of claim 1, wherein the obtaining of the candidate pruning strategies comprises: searching a candidate pruning strategy based on a genetic algorithm; the candidate pruning strategy search based on the genetic algorithm comprises the following steps:
mutation operation: replacing part of pruning rates in the pruning strategies to be candidate according to the search space to obtain the candidate pruning strategies;
and (3) hybridization operation: and mixing the pruning strategy to be candidate with the optimal pruning strategy of the previous generation to obtain the candidate pruning strategy.
6. The convolution model lightweight method according to claim 1, wherein the pruning operation on the convolution original model according to the optimal pruning strategy comprises:
obtaining the optimal pruning rate ri of the i-th layer of the convolution original model according to the optimal pruning strategy;
obtaining the ci filters of the i-th layer of the convolution original model, and calculating the norm of each filter;
arranging the ci filters in order of increasing filter norm;
and selecting the ri*ci filters with the smallest norms and carrying out the pruning operation on the original model.
7. A convolution model lightweight system, comprising:
the candidate pruning strategy acquisition module is used for acquiring a candidate pruning strategy; the candidate pruning strategy is a set of pruning rates corresponding to a plurality of levels in the convolution original model;
the candidate pruning strategy evaluation module is used for evaluating the candidate pruning strategy to obtain an evaluation result of the candidate pruning strategy;
the target pruning strategy generating module is used for generating a target pruning strategy according to the evaluation result;
the optimal pruning strategy generation module is used for evaluating the performance of the target pruning strategy on target hardware and outputting the optimal pruning strategy;
and the pruning operation module is used for carrying out pruning operation on the convolution original model according to the optimal pruning strategy to obtain a lightweight convolution model.
8. The convolution model lightweight system of claim 7, wherein the candidate pruning strategy evaluation module comprises:
an evaluation submodule based on short-term fine tuning for:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a convolution pruning model;
evaluating the convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; wherein, the accuracy rate and the loss function value of the verification set are used as the evaluation result of the candidate pruning strategy;
an evaluation sub-module based on updating the batch normalization layer, used for:
carrying out pruning operation on the convolution original model according to the candidate pruning strategy to obtain a first convolution pruning model;
reasoning is carried out on the convolution pruning model through a training set to obtain a characteristic value;
updating the statistic of the batch normalization layer according to the characteristic value to obtain a sliding average value and a sliding variance;
updating the first convolution pruning model through a sliding mean and a sliding variance to generate a second convolution pruning model;
evaluating the second convolution pruning model through a verification set to obtain the verification set accuracy and a loss function value; and taking the verification set accuracy and the loss function value as the evaluation result of the candidate pruning strategy.
9. The convolution model lightweight system of claim 7, wherein the candidate pruning strategy acquisition module comprises:
a candidate pruning strategy random search sub-module for:
generating a pruning strategy to be candidate according to the search space;
evaluating the pruning strategy to be candidate through a hardware performance feedback module;
if the pruning strategy to be candidate does not meet the preset hardware performance constraint condition, generating another pruning strategy to be candidate again according to the search space;
if the pruning strategy to be candidate meets the preset hardware performance constraint condition, marking the pruning strategy to be candidate as a candidate pruning strategy;
the pruning strategy to be candidate searching submodule based on the genetic algorithm is used for:
mutation operation: replacing part of pruning rates in the pruning strategies to be candidate according to the search space to obtain the candidate pruning strategies;
and (3) hybridization operation: and mixing the pruning strategy to be candidate with the optimal pruning strategy of the previous generation to obtain the candidate pruning strategy.
10. The convolution model lightweight system of claim 7, wherein the pruning operation module is further configured to:
obtaining the optimal pruning rate ri of the i-th layer of the convolution original model according to the optimal pruning strategy;
obtaining the ci filters of the i-th layer of the convolution original model, and calculating the norm of each filter;
arranging the ci filters in order of increasing filter norm;
and selecting the ri*ci filters with the smallest norms and carrying out the pruning operation on the original model.
CN202011615978.0A 2020-12-30 2020-12-30 Convolution model lightweight method and system Active CN112686382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011615978.0A CN112686382B (en) 2020-12-30 2020-12-30 Convolution model lightweight method and system


Publications (2)

Publication Number Publication Date
CN112686382A true CN112686382A (en) 2021-04-20
CN112686382B CN112686382B (en) 2022-05-17

Family

ID=75453518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011615978.0A Active CN112686382B (en) 2020-12-30 2020-12-30 Convolution model lightweight method and system

Country Status (1)

Country Link
CN (1) CN112686382B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240085A (en) * 2021-05-12 2021-08-10 平安科技(深圳)有限公司 Model pruning method, device, equipment and storage medium
CN113269312A (en) * 2021-06-03 2021-08-17 华南理工大学 Model compression method and system combining quantization and pruning search
CN113947185A (en) * 2021-09-30 2022-01-18 北京达佳互联信息技术有限公司 Task processing network generation method, task processing device, electronic equipment and storage medium
CN114239792A (en) * 2021-11-01 2022-03-25 荣耀终端有限公司 Model quantization method, device and storage medium
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977703A (en) * 2016-10-21 2018-05-01 辉达公司 For trimming neutral net to realize the system and method for the effective reasoning of resource
WO2020148482A1 (en) * 2019-01-18 2020-07-23 Nokia Technologies Oy Apparatus and a method for neural network compression
CN111667054A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and device for generating neural network model, electronic equipment and storage medium


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240085A (en) * 2021-05-12 2021-08-10 平安科技(深圳)有限公司 Model pruning method, device, equipment and storage medium
CN113240085B (en) * 2021-05-12 2023-12-22 平安科技(深圳)有限公司 Model pruning method, device, equipment and storage medium
CN113269312A (en) * 2021-06-03 2021-08-17 华南理工大学 Model compression method and system combining quantization and pruning search
CN113269312B (en) * 2021-06-03 2021-11-09 华南理工大学 Model compression method and system combining quantization and pruning search
CN113947185A (en) * 2021-09-30 2022-01-18 北京达佳互联信息技术有限公司 Task processing network generation method, task processing device, electronic equipment and storage medium
CN113947185B (en) * 2021-09-30 2022-11-18 北京达佳互联信息技术有限公司 Task processing network generation method, task processing device, electronic equipment and storage medium
CN114239792A (en) * 2021-11-01 2022-03-25 荣耀终端有限公司 Model quantization method, device and storage medium
CN114239792B (en) * 2021-11-01 2023-10-24 荣耀终端有限公司 System, apparatus and storage medium for image processing using quantization model
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning

Also Published As

Publication number Publication date
CN112686382B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN112686382B (en) Convolution model lightweight method and system
CN109948036B (en) Method and device for calculating weight of participle term
CN112434188B (en) Data integration method, device and storage medium of heterogeneous database
CN111950715A (en) 8-bit integer full-quantization inference method and device based on self-adaptive dynamic shift
US20220309101A1 (en) Accelerated large-scale similarity calculation
CN114064929A (en) Search sorting method and device
CN111144098B (en) Recall method and device for extended question
CN115878094B (en) Code searching method, device, equipment and storage medium
CN111738356A (en) Object feature generation method, device, equipment and storage medium for specific data
CN110413750A (en) The method and apparatus for recalling standard question sentence according to user's question sentence
CN113554097B (en) Model quantization method and device, electronic equipment and storage medium
WO2022134946A1 (en) Model training method, apparatus, storage medium, and device
CN112783747B (en) Execution time prediction method and device for application program
CN115600926A (en) Post-project evaluation method and device, electronic device and storage medium
CN111783453B (en) Text emotion information processing method and device
CN114595627A (en) Model quantization method, device, equipment and storage medium
CN117348837A (en) Quantization method and device for floating point precision model, electronic equipment and storage medium
CN111666770A (en) Semantic matching method and device
CN114386469A (en) Method and device for quantizing convolutional neural network model and electronic equipment
CN111783843A (en) Feature selection method and device and computer system
CN113298248B (en) Processing method and device for neural network model and electronic equipment
CN111475985B (en) Method, device and equipment for controlling size of ball mill load parameter integrated model
CN111444327B (en) Hot spot knowledge determination method, device and system
Thonglek et al. Automated quantization and retraining for neural network models without labeled data
CN114020423A (en) Block storage scheduling method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant