CN110232436A - Pruning method, apparatus, and storage medium for a convolutional neural network
- Publication number
- CN110232436A (application number CN201910380839.5A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- convolutional
- layer
- nth
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
This application discloses a pruning method, apparatus, and storage medium for a convolutional neural network, belonging to the field of neural network technology within computer vision and artificial intelligence. The convolutional neural network includes a plurality of layers, among them one or more convolutional layers. The method comprises: obtaining a feature map set output by an nth convolutional layer of the convolutional neural network, the feature map set including a plurality of feature maps, the nth convolutional layer being any one of the one or more convolutional layers, and n being a positive integer; determining the important feature maps and unimportant feature maps in the feature map set; and pruning the convolutional neural network based on the unimportant feature maps to obtain the input of the layer following the nth convolutional layer, this input including the important feature maps. By distinguishing important feature maps from unimportant ones and passing only the important feature maps to the next layer, resource consumption is reduced when the convolutional neural network executes a deep learning task.
Description
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a method and an apparatus for pruning a convolutional neural network, and a storage medium.
Background
Neural networks have become the leading technology for deep learning tasks such as computer vision, speech recognition, and natural language processing. However, neural network algorithms are computationally and memory intensive, which makes them difficult to deploy on devices with limited hardware resources.
To address this limitation, deep compression may be used to significantly reduce the computation and memory that a neural network requires. For example, for a convolutional neural network with fully connected layers, deep compression may reduce the model size several-fold, or even by several tens of times. Deep compression of a convolutional neural network (CNN) encompasses several different techniques: quantization, pruning, lightweight network design, and knowledge distillation.
Pruning refers to removing neurons or synapses from a convolutional neural network after its training is complete, thereby compressing the neural network model. However, the information of the neurons or synapses cut out by this method is permanently lost, and the resulting model is fixed.
Disclosure of Invention
The application provides a pruning method, a pruning apparatus, and a storage medium for a convolutional neural network, which can reduce the computation of the convolutional neural network and avoid loss of information, while achieving pruning that is differentiated across samples.
In a first aspect, at least one embodiment of the present application provides a pruning method for a convolutional neural network, the convolutional neural network comprising a plurality of layers, the plurality of layers comprising one or more convolutional layers, the method comprising:
obtaining a feature map set output by an nth convolutional layer of the convolutional neural network, wherein the feature map set comprises a plurality of feature maps, the nth convolutional layer is any one of the one or more convolutional layers, and n is a positive integer;
determining an important feature map and an unimportant feature map in the feature map set, wherein the influence of the important feature map on an output result of the convolutional neural network is greater than that of the unimportant feature map;
pruning the convolutional neural network based on the unimportant feature map to obtain the input of the next layer of the nth convolutional layer, wherein the input of the next layer of the nth convolutional layer comprises the important feature map.
In the embodiments of the application, during the operation of the convolutional neural network, the important and unimportant feature maps among those output by a convolutional layer are distinguished, and only the important feature maps are passed to the next layer of the network. Because the unimportant feature maps have little influence on the output result of the convolutional neural network, discarding them affects the output little or not at all, while greatly saving computation. Meanwhile, since the scheme operates while the neural network is in use, the structure of the neural network model is not affected and no information is lost. Moreover, because the application judges whether each feature map is important, and the feature maps generated by the network differ across samples, the important and unimportant feature maps determined may differ across samples; the feature maps pruned by this scheme are therefore sample-specific, achieving differentiated pruning for different samples.
Optionally, the determining the important feature map and the non-important feature map in the feature map set includes:
calculating an L2 norm of each feature map of the nth convolutional layer;
if the L2 norm of the first feature map is smaller than the first threshold value of the nth convolutional layer, determining that the first feature map is the insignificant feature map; and if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is the important feature map, wherein the first feature map is any feature map in the feature map set.
In this implementation, the importance of a feature map can be determined by comparing its L2 norm against the threshold. Specifically, in the convolution operation, the larger the values of the features in a feature map, the greater their influence on the output result of the convolutional neural network, and the more important they are; since the L2 norm is the square root of the sum of squares of the features in the feature map, the importance of each feature map can be determined by the magnitude of its L2 norm.
Optionally, the first threshold of the nth convolutional layer is a product of a norm mean of the nth convolutional layer and a threshold coefficient, and the norm mean is a mean of L2 norms of all feature maps of the nth convolutional layer.
In this implementation, the norm mean is used as a basis for the first threshold, and the importance of the feature map is determined by comparing the norm of each feature map with the magnitude of the mean. Meanwhile, the threshold coefficient is set, so that the size of the threshold can be adjusted as required, and the trimming proportion of the characteristic diagram can be adjusted.
Optionally, before the determining of the important feature map and the non-important feature map in the feature map set, the method further includes:
determining whether the nth convolutional layer is a pruneable convolutional layer;
and if the nth convolutional layer is a pruneable convolutional layer, determining an important feature map and a non-important feature map in the feature map set.
In this implementation, before judging whether each feature map is important, it is first judged whether the feature maps of the layer need to be pruned at all. This avoids unreasonable pruning when the feature maps of a layer are of comparable importance, and improves the accuracy of the pruned convolutional neural network.
Optionally, the determining whether the nth convolutional layer is a pruneable convolutional layer comprises:
calculating the coefficient of variation of the L2 norm of all feature maps of the nth convolutional layer;
determining the nth convolutional layer as the pruneable convolutional layer if the coefficient of variation of the L2 norms of all the feature maps of the nth convolutional layer is greater than a second threshold.
Here, the determination of whether a layer's feature maps need to be pruned is implemented via the coefficient of variation. The coefficient of variation represents the degree of dispersion of a group of data; the coefficient of variation of the L2 norms of all feature maps of the nth convolutional layer therefore represents how dispersed those norms are. By computing it, the dispersion of the importance of the feature maps in a layer can be determined, and hence whether pruning is warranted.
Optionally, the pruning the convolutional neural network based on the insignificant feature map to obtain an input of a next layer of the nth convolutional layer includes:
setting all the features of the unimportant feature map to 0, and taking the zeroed unimportant feature map together with the important feature map as the input of the next layer of the nth convolutional layer;
or, the pruning the convolutional neural network based on the insignificant feature map to obtain an input of a next layer of the nth convolutional layer includes:
and screening out non-important feature maps from the feature map set, and using the important feature maps as the input of the next layer of the nth convolutional layer.
In this implementation, two ways of pruning the insignificant feature maps are provided, one is to set the insignificant feature maps to 0 and then output to the next layer; the other is to output only the important feature map to the next layer. The two methods can realize the pruning of the non-important characteristic diagram and reduce the calculation amount.
Optionally, the method further comprises:
training the convolutional neural network, wherein the loss function used in the training of the convolutional neural network comprises an L2,1-norm regularization term, and the regularization term is used for learning sparse features of samples.
In this implementation, the convolutional neural network is trained with an L2,1-norm regularization term added to the loss function. Since the L2,1-norm regularization term is used to learn the sparse features of the samples (i.e., to learn the importance of features), training with a loss function that includes this term produces a sparse neural network model, which can then be used to select the important features in subsequent use.
Optionally, the L2,1-norm regularization term is the product of an L2,1 norm and a regularization coefficient; the L2,1 norm is the sum of the L2,1 norms of the respective samples, where the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of all the samples.
In this implementation, the regularization term of L2,1 norm is the sum of L2,1 norms of the various samples, so that the convolutional neural network trained by the loss function including the regularization term can learn the sparse features of the various samples. When the convolutional neural network is used subsequently, the selection of the important features can be realized for different samples.
In a second aspect, at least one embodiment of the present application provides a pruning apparatus for a convolutional neural network, the convolutional neural network comprising a plurality of layers, the plurality of layers comprising one or more convolutional layers, the apparatus comprising:
a computing unit configured to obtain a feature map set output by an nth convolutional layer of the convolutional neural network, the feature map set including a plurality of feature maps, the nth convolutional layer being any one of the one or more convolutional layers, n being a positive integer;
a determining unit configured to determine an important feature map and an unimportant feature map in the feature map set, wherein the influence of the important feature map on an output result of the convolutional neural network is greater than that of the unimportant feature map;
a processing unit configured to prune the convolutional neural network based on the insignificant feature map, resulting in an input of a next layer of the nth convolutional layer, the input of the next layer of the nth convolutional layer including the significant feature map.
Optionally, the determining unit includes:
a calculation subunit configured to calculate an L2 norm of each feature map of the nth convolutional layer;
a determining subunit configured to determine that a first feature map is the insignificant feature map if the L2 norm of the first feature map is smaller than a first threshold of the nth convolutional layer; and if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is the important feature map, wherein the first feature map is any feature map in the feature map set.
Optionally, the first threshold of the nth convolutional layer is a product of a norm mean of the nth convolutional layer and a threshold coefficient, and the norm mean is a mean of L2 norms of all feature maps of the nth convolutional layer.
Optionally, the determining subunit is further configured to determine whether the nth convolutional layer is a pruneable convolutional layer before the determining of the significant feature map and the non-significant feature map in the feature map set; and if the nth convolutional layer is a pruneable convolutional layer, determining an important feature map and a non-important feature map in the feature map set.
Optionally, the calculating subunit is further configured to calculate a coefficient of variation of the L2 norm of all feature maps of the nth convolution layer;
the determining subunit is configured to determine that the nth convolutional layer is the pruneable convolutional layer if the coefficient of variation is greater than a second threshold.
Optionally, the processing unit is configured to set all the features of the unimportant feature map to 0, and take the zeroed unimportant feature map together with the important feature map as the input of the next layer of the nth convolutional layer;
or, the processing unit is configured to filter out non-significant feature maps from the feature map set, and use the significant feature maps as an input of a layer next to the nth convolutional layer.
Optionally, the apparatus further comprises:
a training unit configured to train the convolutional neural network, wherein the loss function used in training the convolutional neural network comprises an L2,1-norm regularization term, and the regularization term is used for learning sparse features of samples.
Optionally, the L2,1-norm regularization term is the product of an L2,1 norm and a regularization coefficient; the L2,1 norm is the sum of the L2,1 norms of the respective samples, where the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of all the samples.
In a third aspect, at least one embodiment of the present application provides a pruning device for a convolutional neural network, including a processor and a memory; the memory is used for storing software programs and modules, and the processor implements the method in any one of the possible embodiments of the first aspect by running or executing the software programs and/or modules stored in the memory.
Optionally, the number of the processors is one or more, and the number of the memories is one or more.
Alternatively, the memory may be integral to the processor or provided separately from the processor.
In a specific implementation process, the memory may be a non-transient memory, such as a Read Only Memory (ROM), which may be integrated on the same chip as the processor, or may be separately disposed on different chips.
In a fourth aspect, at least one embodiment of the present application provides a computer program (product) comprising: computer program code which, when run by a computer, causes the computer to perform the method of any of the possible embodiments of the first aspect described above.
In a fifth aspect, at least one embodiment of the present application provides a computer-readable storage medium for storing program code executed by a processor, the program code including instructions for implementing the method in any one of the possible implementations of the first aspect.
In a sixth aspect, a chip is provided, which includes a processor, and the processor is configured to invoke and execute instructions stored in a memory, so that a communication device in which the chip is installed executes the method in any one of the possible implementation manners of the first aspect.
In a seventh aspect, another chip is provided, including: an input interface, an output interface, a processor and a memory, wherein the input interface, the output interface, the processor and the memory are connected by an internal connection path, the processor is configured to execute code in the memory, and when the code is executed, the processor is configured to perform the method in any possible implementation manner of the first aspect.
Drawings
Fig. 1 is a schematic structural diagram of a convolutional neural network provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a processing apparatus provided in an embodiment of the present application;
fig. 3 is a flowchart of a pruning method for a convolutional neural network according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of another convolutional neural network pruning method provided by the present application;
FIG. 5 is a schematic diagram of a feature map pruning process provided by an embodiment of the present application;
FIG. 6 is a flow chart of another convolutional neural network pruning method provided by the present application;
FIG. 7 is a flowchart of a convolutional neural network training method provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a training process of a convolutional neural network provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a pruning device of a convolutional neural network according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
To facilitate understanding of the technical solutions provided in the embodiments of the present application, an application scenario of the present application is first introduced.
In this scenario, a processing device is included, and in an application scenario of the present application, a convolutional neural network may be stored and executed in the processing device.
Processing devices to which the present application relates may include handheld devices, in-vehicle devices, wearable devices, computing devices or other devices connected to a wireless modem, as well as cloud devices, terminals (terminal), terminal devices (terminal equipment), monitoring devices, servers, and the like. For convenience of description, in the present application, it is simply referred to as a processing apparatus.
The convolutional neural network comprises a plurality of layers including one or more convolutional layers; besides the convolutional layers, these include but are not limited to a ReLU (rectified linear unit) layer, a pooling layer, a fully connected layer, and the like. Each convolutional layer includes a plurality of filters (or convolution kernels); each filter is an array, and the numbers in it are called weights or parameters. The convolutional layer of the convolutional neural network performs a convolution operation on its input. For example, a filter in the first convolutional layer slides over a sample with a set stride; at each position, the filter's array is multiplied element-wise with the sample data and summed to obtain one value, and all the values obtained during the sliding are assembled into a new array called a feature map. Each value in the feature map is a feature of that feature map. The plurality of filters in each convolutional layer thus produce a plurality of feature maps, which together form that layer's feature map set.
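To make the terminology concrete, here is a small illustrative sketch (our example, not part of the patent; layer sizes are arbitrary) showing a convolutional layer producing a feature map set:

```python
# A minimal PyTorch sketch: one convolutional layer producing feature maps.
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)

sample = torch.randn(1, 3, 32, 32)  # one RGB sample of size 32x32
feature_maps = conv1(sample)        # 16 filters -> 16 feature maps

print(feature_maps.shape)           # torch.Size([1, 16, 32, 32])
# Each slice feature_maps[0, c] is one feature map; each value in a slice
# is one feature of that feature map.
```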
Fig. 1 is a schematic diagram of a convolutional neural network. Referring to fig. 1, the convolutional neural network comprises alternately arranged convolutional and pooling layers, such as convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, and so on, followed by fully connected layer 1, fully connected layer 2, and a softmax layer connected to the last pooling layer; the output of the softmax layer is the output of the convolutional neural network.
Fig. 2 is a schematic diagram of a possible hardware structure of the processing device. As shown in fig. 2, the processing device includes a processor 10, a memory 20, and a communication interface 30. Those skilled in the art will appreciate that the configuration shown in fig. 2 does not constitute a limitation of the processing apparatus and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 10 is a control center of the processing apparatus, connects various parts of the entire processing apparatus using various interfaces and lines, and performs various functions of the processing apparatus and processes data by running or executing software programs and/or modules stored in the memory 20 and calling data stored in the memory 20, thereby performing overall control of the processing apparatus. The processor 10 may be a CPU, other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor or any conventional processor or the like. It is noted that the processor may be an advanced reduced instruction set machine (ARM) architecture supported processor.
The memory 20 may be used to store software programs and modules. The processor 10 executes various functional applications and data processing by running the software programs and modules stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area: the program storage area may store an operating system 21, an obtaining module 22, a determining module 23, a processing module 24, and one or more application programs 25 required by functions (such as determining the importance of a feature map); the data storage area may store data created according to the use of the UE or the target server (such as feature maps output by a convolutional layer). The memory 20 may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). The memory 20 may also include a memory controller to provide the processor 10 with access to the memory 20.
Wherein, the processor 10 performs the following functions by running the obtaining module 22: obtaining a feature map set output by an nth convolutional layer of the convolutional neural network, wherein the feature map set comprises a plurality of feature maps, the nth convolutional layer is any one of the one or more convolutional layers, and n is a positive integer. The processor 10 performs the following functions by running the determining module 23: determining an important feature map and an unimportant feature map in the feature map set, wherein the influence of the important feature map on an output result of the convolutional neural network is greater than that of the unimportant feature map. The processor 10 performs the following functions by running the processing module 24: pruning the convolutional neural network based on the unimportant feature map to obtain the input of the next layer of the nth convolutional layer, wherein the input of the next layer of the nth convolutional layer comprises the important feature map.
The embodiment of the present application further provides a chip, which includes a processor, and the processor is configured to call and execute the instructions stored in the memory from the memory, so that the communication device in which the chip is installed executes any one of the methods for pruning a convolutional neural network provided in the present application.
An embodiment of the present application further provides a chip, including: the convolutional neural network pruning device comprises an input interface, an output interface, a processor and a memory, wherein the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing any one of the convolutional neural network pruning methods.
It should be understood that the processor may be a CPU, but may also be other general purpose processors, DSPs, ASICs, FPGAs, or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or any conventional processor or the like. It is worth noting that the processor may be a processor supporting an ARM architecture.
Further, in an optional embodiment, the number of the processors is one or more, and the number of the memories is one or more. Alternatively, the memory may be integrated with the processor, or provided separately from the processor. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. The memory may also include non-volatile random access memory. For example, the memory may also store device type information.
The memory may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a ROM, PROM, EPROM, EEPROM, or flash memory, among others. Volatile memory can be RAM, which acts as external cache memory. By way of example, and not limitation, many forms of RAM are available. Such as SRAM, DRAM, SDRAM, DDR SDRAM, ESDRAM, SLDRAM, and DR RAM.
The present application provides a computer program which, when executed by a computer, may cause the processor or the computer to perform the respective steps and/or procedures corresponding to the method embodiments of pruning of a convolutional neural network of any one of the applications provided herein.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the present application are generated, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid state disk), among others.
Fig. 3 is a flowchart of a pruning method for a convolutional neural network according to an embodiment of the present disclosure. The method may be performed by a processing device in the aforementioned application scenario, as shown in fig. 3, and includes the following steps.
Step S31: obtaining a feature map set output by an nth convolutional layer of the convolutional neural network, wherein the feature map set comprises a plurality of feature maps, the nth convolutional layer is any one of the one or more convolutional layers, and n is a positive integer.
As described above, a convolutional layer of the convolutional neural network obtains feature maps through the convolution operation. The convolutional neural network includes one or more convolutional layers; when it includes a plurality of convolutional layers, the nth convolutional layer may be any one of them. In addition, the pruning method provided by the embodiments of the application may be executed on some of the convolutional layers of the convolutional neural network, or on all of them. When executed on some of the layers, those layers may be any subset of the convolutional layers in the network, whether adjacent or spaced apart.
Step S32: and determining an important feature map and a non-important feature map in the feature map set.
In the embodiments of the application, the feature maps are divided into important feature maps and unimportant feature maps to distinguish their degrees of importance. "Important" and "unimportant" are relative: an important feature map matters more than an unimportant one. An important feature map is one that plays a large role when the convolutional neural network executes a deep learning task, i.e., it strongly influences the accuracy of the output result; an unimportant feature map plays a small role, i.e., it has little influence on the output result. Thus an important feature map contributes more than an unimportant one when the deep learning task is executed, and its influence on the output of the convolutional neural network is greater.
The deep learning task is to process data (samples) with a set target by using a convolutional neural network, where the target includes, but is not limited to, classification, target detection, and the like, and the data includes, but is not limited to, text, image, voice, and the like. The convolutional neural network provided by the present application can perform deep learning tasks including, but not limited to, various tasks in a computing scenario using a depth model, such as a computer vision task, a speech recognition task, a natural language processing task, and the like, which specifically may include image or text classification, object detection, semantic segmentation, and the like. When the above deep learning task is executed, the convolutional neural network outputs a result, that is, a result of processing data (sample) with a set target, for example, a target position in the target detection task, an image type in the image classification task, and the like.
Illustratively, when the convolutional neural network is used for target detection, the important characteristic diagram refers to a characteristic diagram which has a large effect on target detection results output by the convolutional neural network when the convolutional neural network is used for target detection, and the non-important characteristic diagram refers to a characteristic diagram which has a small effect on target detection results output by the convolutional neural network when the convolutional neural network is used for target detection.
The importance of the feature map depends on the input sample and the problem to be solved by the deep learning task. Therefore, the important feature map may be a feature map capable of presenting the features of the sample data itself, or a feature map having features highly correlated with the target of the deep learning task.
In an embodiment of the application, when a user takes a picture with a mobile phone, target detection can automatically capture targets such as human faces and animals, helping the phone to focus automatically, apply beautification, and so on. Since the processing capability of a mobile phone is relatively weak, executing the target detection process with the scheme of the application greatly reduces the computation of the convolutional neural network during target detection, which can improve the quality of the phone as a product and bring users a better experience.
Step S33: pruning the convolutional neural network based on the unimportant feature map to obtain the input of the next layer of the nth convolutional layer, wherein the input of the next layer of the nth convolutional layer comprises the important feature map.
Because the unimportant feature maps play a small role when the convolutional neural network executes the deep learning task, discarding them has little or even no influence on the output result of the subsequent computation; and since the discarded feature maps no longer need to be processed, computational resources are greatly saved.
For example, if the next layer of the nth convolutional layer is a pooling layer, the important feature map in the feature map of the nth convolutional layer is used as the input of the pooling layer, and the pooling layer completes the pooling operation only according to the important feature map of the nth convolutional layer.
In addition, the pruning method provided by the application can be applied both to convolutional neural networks that have undergone deep compression and to those that have not; for example, it can be applied to a convolutional neural network that has already been pruned by a conventional pruning operation, further reducing the computation.
In the embodiment of the present application, steps S31 to S33 may be performed by the processing device, and the processing device controls the feature map output by the convolutional layer to the next layer in the work process of the convolutional neural network, so as to control the calculation amount of the convolutional neural network. The convolutional neural network trimmed by the steps of the method finally outputs the result of executing the deep learning task.
In the embodiments of the application, during the operation of the convolutional neural network, the important and unimportant feature maps among those output by a convolutional layer are distinguished, and only the important feature maps are passed to the next layer of the network. Because the unimportant feature maps have little influence on the output result of the convolutional neural network, discarding them affects the output little or not at all, while greatly saving computation. Meanwhile, since the scheme operates while the neural network is in use, the structure of the neural network model is not affected and no information is lost. Moreover, because the application judges whether each feature map is important, and the feature maps generated by the network differ across samples, the important and unimportant feature maps determined may differ across samples; the feature maps pruned by this scheme are therefore sample-specific, achieving differentiated pruning for different samples.
Fig. 4 is a flowchart of a pruning method for a convolutional neural network according to an embodiment of the present disclosure. The method may be performed by a processing device in the aforementioned application scenario, as shown in fig. 4, and includes the following steps.
Step S41: obtaining a feature map set output by an nth convolutional layer of the convolutional neural network, wherein the feature map set comprises a plurality of feature maps, the nth convolutional layer is any one of the one or more convolutional layers, and n is a positive integer.
Step S42: calculating an L2 norm for each feature map of the nth convolutional layer.
The L2 norm of a feature map is the square root of the sum of the squares of the individual features in the feature map. The application uses the L2 norm of each feature map as a statistic to measure the importance of that feature map. In the convolution operation, the larger the values of the features in a feature map, the greater their influence on the output result of the convolutional neural network; and since the L2 norm is the square root of the sum of the squares of the features, the importance of each feature map can be judged by the size of its L2 norm.
In this step, the L2 norm is calculated as follows:
in formula (1), m represents the m-th sample, n represents the n-th convolutional layer, and cn represents the cn-th feature map of the n-th convolutional layer.The L2 norm of the cn feature map of the nth convolutional layer is shown when the mth sample is input into the convolutional neural network. Hn and Wn are eachHeight and width of the nth convolutional layer feature map, (hn, ω n) denotes the location of the feature in the feature map, F is the feature,when the mth sample is input to the convolutional neural network, the position of the nth feature map of the nth convolutional layer is (hn, ω n).
The L2 norm of each feature map in the nth convolutional layer is calculated one by one using this formula.
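For illustration, formula (1) might be implemented as follows (a sketch assuming a PyTorch tensor of shape (Cn, Hn, Wn) holding one sample's feature maps; the function name is ours, not the patent's):

```python
import torch

def feature_map_l2_norms(feature_maps: torch.Tensor) -> torch.Tensor:
    """Compute the L2 norm of each feature map, following formula (1).

    feature_maps: tensor of shape (C_n, H_n, W_n) output by the nth
    convolutional layer for one sample.
    Returns a tensor of shape (C_n,) with one L2 norm per feature map.
    """
    # Square each feature, sum over spatial positions (h_n, w_n), take sqrt.
    return feature_maps.pow(2).sum(dim=(1, 2)).sqrt()
```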
Step S43: if the L2 norm of the first feature map is smaller than the first threshold value of the nth convolutional layer, determining that the first feature map is the insignificant feature map; and if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is the important feature map, wherein the first feature map is any feature map in the feature map set. Wherein the influence of the important feature map on the output result of the convolutional neural network is greater than that of the non-important feature map.
That is, the magnitude of the L2 norm and the first threshold value of each feature map is determined, and it is determined whether each feature map is an insignificant feature map.
Since the feature scale differs between convolutional layers of the convolutional neural network, the first threshold differs between layers. The first threshold may be designed based on the mean of the L2 norms of each convolutional layer. The feature scale refers to the magnitude of the features; for example, if the feature values of one convolutional layer lie between 100 and 200 while those of another lie between 0.1 and 0.2, the two layers have different feature scales.
Illustratively, the first threshold of the nth convolutional layer is a product of a norm mean of the nth convolutional layer, which is a mean of L2 norms of all feature maps of the nth convolutional layer, and a threshold coefficient.
The threshold coefficient is greater than or equal to 0 and less than 2, i.e., it takes a value in [0, 2). In the first threshold, the part multiplied by the threshold coefficient is the mean of the L2 norms, and twice that mean is close to the maximum L2 norm. Choosing the threshold coefficient within [0, 2) therefore covers all cases, from classifying every feature map as important, to classifying only part of them as important, to classifying all of them as unimportant.
Specifically, the threshold coefficient can be valued in the value range as required, and the larger the value of the threshold coefficient is, the more the non-important feature maps obtained by division are, the lower the accuracy is, and the smaller the calculation amount is; the smaller the value of the threshold coefficient is, the less the non-important feature maps obtained by division are, the higher the accuracy is, and the larger the calculation amount is.
The first threshold may be calculated using the following formula:

$$\theta_n = \beta\,\mu_n, \qquad \mu_n = \frac{1}{C_n}\sum_{c_n=1}^{C_n}\left\|F_{m,n,c_n}\right\|_2 \qquad (2)$$

In formula (2), θn denotes the first threshold of the nth convolutional layer, β denotes the threshold coefficient, μn denotes the mean of the L2 norms of the nth convolutional layer, and Cn denotes the total number of feature maps of the nth convolutional layer.
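Continuing the sketch above, the first threshold and the important/unimportant split of step S43 could look like this (again an illustrative assumption, not code from the patent):

```python
import torch

def split_by_threshold(norms: torch.Tensor, beta: float) -> torch.Tensor:
    """Classify feature maps as important/unimportant per formula (2).

    norms: per-feature-map L2 norms of the nth layer, shape (C_n,).
    beta: threshold coefficient, chosen in [0, 2).
    Returns a boolean mask where True marks an important feature map.
    """
    threshold = beta * norms.mean()  # first threshold: beta x norm mean
    return norms >= threshold        # norm >= threshold -> important
```

With beta = 1.0, exactly those feature maps whose norm is at least the layer mean are kept; values of beta near 0 keep everything, while values near 2 classify nearly all feature maps as unimportant, consistent with the range discussed above.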
Step S44: pruning the convolutional neural network based on the unimportant feature map to obtain the input of the next layer of the nth convolutional layer, wherein the input of the next layer of the nth convolutional layer comprises the important feature map.
Here, pruning the convolutional neural network includes structured pruning and unstructured pruning. The application adopts structured pruning, which means directly removing a feature map, a layer, a convolution kernel, or the like, and which is favourable for acceleration on hardware.
The structured pruning method provided by the application comprises the following two methods, wherein the two methods are used for removing the non-important characteristic diagram:
first, the steps may include: setting all the features of the non-important feature map to be 0; and taking the non-important characteristic diagram and the important characteristic diagram after being set to 0 as the input of the next layer of the nth convolution layer.
Second, the step may include: and screening out non-important feature maps from the feature map set, and using the important feature maps as the input of the next layer of the nth convolutional layer.
The second mode likewise controls which feature maps are passed to the next layer and which are not: the application passes the important feature maps to the next layer and withholds the unimportant ones. For example, the weights of the connections corresponding to an unimportant feature map may be set to 0, so that the unimportant feature map is not output to the next layer of the convolutional neural network.
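The two modes might be sketched as follows (hypothetical helper names; `important` is the boolean mask from the earlier sketch):

```python
import torch

# Mode 1: zero out the unimportant feature maps, keeping the tensor shape.
def prune_by_zeroing(feature_maps: torch.Tensor, important: torch.Tensor) -> torch.Tensor:
    # feature_maps: (C_n, H_n, W_n); important: boolean mask of shape (C_n,)
    return feature_maps * important.view(-1, 1, 1).float()

# Mode 2: filter the unimportant feature maps out entirely.
def prune_by_filtering(feature_maps: torch.Tensor, important: torch.Tensor) -> torch.Tensor:
    return feature_maps[important]  # keeps only channels where the mask is True
```

Note that mode 2 changes the number of channels seen by the next layer, which is what the computation analysis later in the description assumes; mode 1 preserves shapes and simply carries zeros forward.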
The next layer of the convolutional neural network is the next layer of the nth convolutional layer in the neural network, and the layer may be a ReLu layer, a pooling layer, or the like, or may be a convolutional layer.
Fig. 5 is a schematic diagram of a feature map pruning process according to an embodiment of the application. Fn denotes a feature map of the nth convolutional layer, and the nth convolutional layer includes a plurality of feature maps Fn. After the L2 norm (Norm) is calculated, the important and unimportant feature maps are determined. The unimportant feature maps are then pruned, for example set to 0 (the white squares after pruning in the figure), and the resulting feature maps are output to the next layer Lx, where Lx may be a convolutional layer, a ReLU layer, a pooling layer, etc.
In the present embodiment, steps S41-S44 may be performed for each convolutional layer of the convolutional neural network. This implementation is briefly described below in conjunction with fig. 1: after a sample is input, convolutional layer 1 performs the convolution operation on it to obtain the feature map set of convolutional layer 1, and steps S41-S44 are performed on this feature map set; pooling layer 1 then performs the pooling operation on the important feature maps of convolutional layer 1 output by steps S41-S44 and passes the result to convolutional layer 2; convolutional layer 2 performs the convolution operation on the output of pooling layer 1 to obtain the feature map set of convolutional layer 2, and steps S41-S44 are performed on this feature map set; pooling layer 2 then performs the pooling operation on the important feature maps of convolutional layer 2 and passes the result to convolutional layer 3; and so on, until the feature maps output by all convolutional layers have been pruned in this way. After all convolutional and pooling layers have been processed, the result is passed to fully connected layer 1, fully connected layer 2, and the softmax layer, and the softmax layer finally produces the output of the convolutional neural network. For example, when the convolutional neural network performs face detection, the output is the position of the face in the image; when it performs image classification, the output is the classification result.
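As an illustration, this per-layer procedure could be sketched as follows (a minimal PyTorch sketch under our own assumptions about tensor shapes and module lists; `pruned_forward`, `convs`, `pools`, `head`, and `beta` are hypothetical names, not from the patent). The mask is computed per sample, matching the sample-specific pruning described above:

```python
import torch
import torch.nn as nn

def pruned_forward(convs, pools, head, x, beta: float):
    """Run a conv/pool stack, pruning each conv output as in steps S41-S44.

    convs, pools: matching lists of nn.Conv2d and pooling layers.
    head: the fully connected / softmax part of the network.
    x: input batch of shape (batch, channels, H, W).
    """
    for conv, pool in zip(convs, pools):
        x = conv(x)                                    # feature map set (S41)
        norms = x.pow(2).sum(dim=(2, 3)).sqrt()        # L2 norms (S42): (batch, C)
        threshold = beta * norms.mean(dim=1, keepdim=True)
        important = (norms >= threshold).float()       # per-sample masks (S43)
        x = x * important.unsqueeze(-1).unsqueeze(-1)  # zero-out pruning (S44)
        x = pool(x)
    return head(x.flatten(1))
```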
The following describes the amount of computation of the method provided in the present application, taking the pruning method of the convolutional neural network shown in fig. 4 as an example.
The computation of the convolution of the (n+1)th convolutional layer in a standard convolution is: Cn × Cn+1 × Hn × Wn × k × k, where Cn is the number of feature maps output by the nth convolutional layer, i.e., the number of inputs of the (n+1)th convolutional layer; Hn × Wn is the size of the feature maps output by the nth convolutional layer; Cn+1 is the number of feature maps output by the (n+1)th convolutional layer; and k × k is the size of the convolution kernel (filter).
While the amount of computation of the convolution calculation after the method shown in fig. 4 includes two parts:
calculation amount of L2 norm of nth convolution layer: cn is multiplied by Hn is multiplied by Wn;
the calculated amount of the convolution calculation of the (n + 1) th convolution layer after the C unimportant feature maps are trimmed off is as follows:
wherein Cn-C is the nth convolution layerThe number of output feature maps, i.e., the input number of the (n + 1) th convolutional layers;
the total calculated amount after the method shown in fig. 4 is adopted is: cn Hn Wn + (Cn-C) Cn +1 Hn Wn K, the amount of computation is greatly reduced compared to standard convolution calculations.
Fig. 6 is a flowchart of a pruning method for a convolutional neural network according to an embodiment of the present application. As shown in fig. 6, the method differs from that of fig. 4 in how it determines the important and unimportant feature maps, and includes the following steps:
step S51: obtaining a feature map set output by an nth convolutional layer of the convolutional neural network, wherein the feature map set comprises a plurality of feature maps, the nth convolutional layer is any one of the one or more convolutional layers, and n is a positive integer.
Specifically, step S51 may be the same as step S41.
Step S52: calculating an L2 norm for each feature map of the nth convolutional layer.
Specifically, step S52 may be the same as step S42.
Step S53: determining whether the nth convolutional layer is a pruneable convolutional layer.
Specifically, the step may include:
first, the coefficients of variation of the L2 norm of all feature maps of the nth convolutional layer are calculated.
The coefficient of variation is the ratio of the standard deviation of a group of data to its mean, and measures the degree of dispersion of the data. For the L2 norms of all feature maps of a convolutional layer, the degree of dispersion of those norms can be analyzed by calculating their coefficient of variation. If the coefficient of variation is large, the L2 norms of the feature maps are widely dispersed, the importance of the feature maps is spread out, and different feature maps play roles of different sizes in the subsequent tasks of the convolutional neural network; in this case the computation can be reduced by pruning away the unimportant feature maps. If the coefficient of variation is small, the L2 norms are concentrated, the feature maps are of similar importance, and different feature maps play roles of similar size in subsequent tasks; in this case it is difficult to prune while maintaining accuracy.
The coefficient of variation of the L2 norms of all feature maps of the nth convolutional layer is calculated as follows:

$$CV_n = \frac{\sigma_n}{\mu_n} = \frac{1}{\mu_n}\sqrt{\frac{1}{C_n}\sum_{c_n=1}^{C_n}\left(\left\|F_{m,n,c_n}\right\|_2 - \mu_n\right)^2} \qquad (3)$$

In formula (3), CVn denotes the coefficient of variation of the L2 norms of all feature maps of the nth convolutional layer, and σn denotes their standard deviation.
Second, if the coefficient of variation is greater than a second threshold, determining the nth convolutional layer to be the pruneable convolutional layer.
Since the coefficient of variation characterizes the degree of dispersion of the data independently of the feature scale of each convolutional layer, the same second threshold α may be used for different convolutional layers.
The value range of the second threshold α may be [0, 2). Like the first threshold, the second threshold α can be chosen as needed, and its value changes the number of unimportant feature maps that are finally pruned: the smaller the value of α, the more convolutional layers are determined to be pruneable and the more unimportant feature maps are likely to be pruned in the end; the larger the value of α, the fewer convolutional layers are determined to be pruneable and the fewer unimportant feature maps are likely to be pruned in the end.
Determining that the nth convolutional layer is not the pruneable convolutional layer if the coefficient of variation is less than or equal to the second threshold.
If the nth convolutional layer is not a pruneable convolutional layer, the feature maps of this convolutional layer are not pruned. In this case, step S54 is not executed, and all feature maps of the nth convolutional layer are directly output to the next layer of the convolutional neural network as the input of that next layer.
Step S54: if the L2 norm of the first feature map is smaller than the first threshold value of the nth convolutional layer, determining that the first feature map is the insignificant feature map; and if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is the important feature map, wherein the first feature map is any feature map in the feature map set.
Specifically, step S54 may be the same as step S43.
Step S55: pruning the convolutional neural network based on the unimportant feature map to obtain the input of the next layer of the nth convolutional layer, wherein the input of the next layer of the nth convolutional layer comprises the important feature map.
Specifically, step S55 may be the same as step S44.
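Putting steps S52 to S55 together, a minimal sketch under the same assumptions as above, where the first threshold is taken as the product of the norm mean and a threshold coefficient β as described later in this application; the names and defaults are illustrative:

```python
import numpy as np

def prune_layer_output(feature_maps: np.ndarray, alpha: float, beta: float,
                       zero_out: bool = True) -> np.ndarray:
    """Steps S52-S55 for one convolutional layer. feature_maps has shape
    (Cn, Hn, Wn); returns the input of the next layer."""
    norms = np.sqrt((feature_maps ** 2).sum(axis=(1, 2)))
    if norms.std() / norms.mean() <= alpha:      # S53: not a pruneable layer
        return feature_maps                      # pass all feature maps through
    important = norms >= beta * norms.mean()     # S54: compare with first threshold
    if zero_out:                                 # S55, first manner: set to 0
        pruned = feature_maps.copy()
        pruned[~important] = 0.0
        return pruned
    return feature_maps[important]               # S55, second manner: screen out
```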
Fig. 7 is a flowchart of a convolutional neural network training method provided in an embodiment of the present application, and as shown in fig. 7, the method may be performed before the pruning method of the convolutional neural network provided in any one of fig. 3 to 6, and the method includes the following steps:
step S61: training samples are obtained.
The training samples comprise a plurality of samples.
Step S62: training the convolutional neural network by using the samples, wherein a loss function used in the process of training the convolutional neural network comprises a regularization term of the L2,1 norm, and the regularization term is used for learning sparse features of the samples.
The training method of the convolutional neural network provided by the application is the same as a conventional training method (for example, a back-propagation algorithm), with only the loss function changed.
The regularization term is a penalty term of the loss function, and the weight of the neural network can be constrained by adding the regularization term.
The regularization term of the L2,1 norm comprises the product of an L2,1 norm and a regularization coefficient. The L2,1 norm is the sum of the L2,1 norms of the respective samples, wherein the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of all samples.
The L2,1 norm of the first sample is thus the L1 norm of the L2 norms of all feature maps of the convolutional neural network. L1-norm regularization is used to generate a sparse weight matrix, i.e., a sparse model that can be used for feature selection. That is, the convolutional neural network becomes a sparse model by adding this regularization term of an L1 norm to the loss function of the convolutional neural network. When the convolutional neural network is subsequently used, feature selection may be performed by executing the method illustrated in any one of fig. 3 to 6.
In the embodiment of the present application, the formula of the loss function can be expressed as follows:

Loss = L_task + γ · R(ω) + λ · Σ_{m=1}^{M} N_m   (4)

In formula (4), L_task is the task loss function, which takes different forms for different tasks; for example, for image classification, L_task is a cross-entropy loss function. γ · R(ω) is a conventional regularization term, where γ is a custom parameter and R(ω) is typically an L2 norm, which may prevent overfitting. λ · Σ_{m=1}^{M} N_m is the regularization term of the L2,1 norm newly added by the present application, where λ is the regularization coefficient, Σ_{m=1}^{M} N_m is the L2,1 norm, N_m is the L2,1 norm of the mth sample, and M is the total number of samples.

Here N_m = Σ_{n=1}^{N} Σ_{c=1}^{C_n} ||F_{n,c}||_2, where N is the total number of convolutional layers, C_n is the number of feature maps of the nth convolutional layer, and ||F_{n,c}||_2 is the L2 norm of a feature map, whose specific calculation manner is described in step S42.
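A minimal PyTorch-style sketch of formula (4) and of N_m, assuming the feature maps of each convolutional layer have been collected during the forward pass; the function names and the example values of γ and λ are illustrative (the tables below report λ values such as 1e-7):

```python
import torch

def l21_norm_of_sample(per_layer_maps: list[torch.Tensor]) -> torch.Tensor:
    """N_m: the sum, over every feature map of every convolutional layer,
    of that map's L2 norm, i.e., an L1 norm of per-map L2 norms.
    Each tensor is one layer's output for one sample, shape (C, H, W)."""
    return sum(m.flatten(1).norm(dim=1).sum() for m in per_layer_maps)

def total_loss(task_loss: torch.Tensor, r_omega: torch.Tensor,
               l21_sum: torch.Tensor, gamma: float = 1e-4,
               lam: float = 1e-7) -> torch.Tensor:
    # Formula (4): Loss = L_task + gamma * R(omega) + lambda * sum_m N_m
    return task_loss + gamma * r_omega + lam * l21_sum
```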
Fig. 8 is a schematic diagram of a training process of a convolutional neural network according to an embodiment of the present disclosure. Referring to fig. 8, during training, the L2 norm of each feature map of each convolutional layer is calculated, where F_n and F_{n+1} represent the feature maps of the nth and (n+1)th convolutional layers, respectively, and each convolutional layer includes a plurality of feature maps F_n or F_{n+1}. An L1 norm of these per-layer L2 norms (i.e., the L2,1 norm) is then calculated, and the regularization term of the L2,1 norm is added to the loss function Loss to complete the training of the convolutional neural network. In fig. 8, L_x may be a convolutional layer, a ReLU layer, a pooling layer, or the like.
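One way to obtain these per-layer feature maps during training is with forward hooks; the sketch below assumes the network is a torch.nn.Module whose convolutional layers are nn.Conv2d instances (all names are illustrative):

```python
import torch
import torch.nn as nn

def attach_conv_output_hooks(model: nn.Module) -> list[torch.Tensor]:
    """Register forward hooks that record every Conv2d output, so the
    L2,1 penalty can be computed from each batch's feature maps."""
    collected: list[torch.Tensor] = []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            module.register_forward_hook(
                lambda mod, inp, out: collected.append(out))
    return collected

def l21_of_batch(per_layer_maps: list[torch.Tensor]) -> torch.Tensor:
    """Sum of N_m over a batch; each tensor has shape (B, C, H, W)."""
    return sum(m.flatten(2).norm(dim=2).sum() for m in per_layer_maps)

# Per training step (sketch): clear the list, run the forward pass,
# then add the penalty to the task loss:
#   collected.clear(); logits = model(x)
#   loss = task_loss(logits, y) + lam * l21_of_batch(collected)
```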
The accuracy of the method provided by the present application is illustrated below with reference to tables 1 to 4. As shown in tables 1 to 4, the pruning method of the convolutional neural network provided in the present application can prune a relatively high proportion of feature maps on the CIFAR-10 dataset while ensuring that the accuracy is not substantially reduced. The method is effective even on already-compressed networks and on lightweight networks.
TABLE 1 accuracy when using random pruning (rand) and pruning at minimum (min) on a trained convolutional neural network VGG16
In Table 1, pr (10%, 20%, 30%) represents the proportion pruned at random or according to the minimum value, respectively. λ (0, 1e-6, 1e-7, 1e-8) is the regularization coefficient of the regularization term of the L2,1 norm newly added to the loss function; λ = 0 corresponds to a convolutional neural network obtained by conventional training. The three-decimal values in the table (e.g., 0.673) represent accuracy. It can be seen that, under random pruning or pruning according to the minimum value, the accuracy of the convolutional neural network model trained with the method of the present application is superior to that of the convolutional neural network obtained by conventional training.
Table 2 pruning rate and accuracy for pruning on VGG16 using the method of the present application
In table 2, thresh (0.5, 1.0, 1.0) denotes values taken for the second threshold α and the threshold coefficient β in the method provided by the present application; λ (0, 1e-6, 1e-7, 1e-8) is the regularization coefficient of the regularization term of the L2,1 norm newly added to the loss function; the three-decimal values in the pr rows (e.g., 0.469) represent the pruning rate, and the three-decimal values in the acc rows (e.g., 0.934) represent the accuracy.
TABLE 3 pruning Rate and accuracy for pruning on compressed VGG16 using the method of the present application
TABLE 4 pruning Rate and accuracy for pruning on a MobileNet-V2 using the method of the present application, MobileNet-V2 being a lightweight convolutional neural network
Fig. 9 is a block diagram of a pruning apparatus for a convolutional neural network provided in an embodiment of the present application. The pruning apparatus of the convolutional neural network may be implemented as all or part of a processing device by software, hardware, or a combination of both. The pruning apparatus of the convolutional neural network may include: a calculation unit 701, a determination unit 702, and a processing unit 703.
Wherein the computing unit 701 is configured to obtain a feature map set output by an nth convolutional layer of the convolutional neural network, the feature map set includes a plurality of feature maps, the nth convolutional layer is any one of the one or more convolutional layers, and n is a positive integer; the determining unit 702 is configured to determine an important feature map and an unimportant feature map in the feature map set, wherein the important feature map has a greater influence on an output result of the convolutional neural network than the unimportant feature map; the processing unit 703 is configured to prune the convolutional neural network based on the insignificant feature map, resulting in an input of a next layer of the nth convolutional layer, which includes the significant feature map.
Optionally, the determining unit 702 includes:
a calculation subunit 721 configured to calculate an L2 norm of each feature map of the nth convolution layer;
a determining subunit 722 configured to determine that the first feature map is the insignificant feature map if the L2 norm of the first feature map is smaller than the first threshold of the nth convolutional layer; and if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is the important feature map, wherein the first feature map is any feature map in the feature map set.
Optionally, the first threshold of the nth convolutional layer is a product of a norm mean of the nth convolutional layer and a threshold coefficient, and the norm mean is a mean of L2 norms of all feature maps of the nth convolutional layer.
Optionally, the determining subunit 722 is further configured to determine whether the nth convolutional layer is a pruneable convolutional layer before the determining of the significant feature map and the non-significant feature map in the feature map set; and if the nth convolutional layer is a pruneable convolutional layer, determining an important feature map and a non-important feature map in the feature map set.
Optionally, the calculating subunit 721 is further configured to calculate a coefficient of variation of L2 norms of all feature maps of the nth convolutional layer;
the determining subunit 722 is configured to determine that the nth convolutional layer is the pruneable convolutional layer if the coefficient of variation is greater than a second threshold.
Optionally, the processing unit 703 is configured to set all the features of the unimportant feature maps to 0, and take the zeroed unimportant feature maps together with the important feature maps as the input of the next layer of the nth convolutional layer;
alternatively, the processing unit 703 is configured to screen out the unimportant feature maps from the feature map set and use the important feature maps as the input of the next layer of the nth convolutional layer.
Optionally, the apparatus further comprises:
a training unit 704 configured to train the convolutional neural network, wherein a loss function used in training the convolutional neural network includes a regularization term of the L2,1 norm, and the regularization term is used for learning sparse features of samples.
Optionally, the regularization term of the L2,1 norm includes a product of an L2,1 norm and a regularization coefficient, the L2,1 norm is a sum of L2,1 norms of respective samples, where the L2,1 norm of a first sample is a sum of L2 norms of feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of all samples.
It should be noted that the pruning apparatus of the convolutional neural network provided in the above embodiments is illustrated only by the division into the above functional units when performing pruning; in practical applications, the above functions may be allocated to different functional units as needed, that is, the internal structure of the apparatus may be divided into different functional units to complete all or part of the functions described above. In addition, the pruning apparatus of the convolutional neural network provided in the above embodiments belongs to the same concept as the embodiments of the pruning method of the convolutional neural network; its specific implementation process is described in detail in the method embodiments and is not repeated here.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (20)
1. A pruning method for a convolutional neural network, the convolutional neural network comprising a plurality of layers, the plurality of layers comprising one or more convolutional layers, the method comprising:
obtaining a feature map set output by an nth convolutional layer of the convolutional neural network, wherein the feature map set comprises a plurality of feature maps, the nth convolutional layer is any one of the one or more convolutional layers, and n is a positive integer;
determining an important feature map and an unimportant feature map in the feature map set, wherein the influence of the important feature map on an output result of the convolutional neural network is greater than that of the unimportant feature map;
pruning the convolutional neural network based on the unimportant feature map to obtain the input of the next layer of the nth convolutional layer, wherein the input of the next layer of the nth convolutional layer comprises the important feature map.
2. The method of claim 1, wherein the determining the significant feature map and the non-significant feature map in the feature map set comprises:
calculating an L2 norm of each feature map of the nth convolutional layer;
if the L2 norm of the first feature map is smaller than the first threshold value of the nth convolutional layer, determining that the first feature map is the insignificant feature map; and if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is the important feature map, wherein the first feature map is any feature map in the feature map set.
3. The method of claim 2, wherein the first threshold value for the nth convolutional layer is a product of a norm mean of the nth convolutional layer, which is a mean of L2 norms of all feature maps of the nth convolutional layer, and a threshold coefficient.
4. The method according to claim 2 or 3, wherein before said determining the significant feature map and the non-significant feature map in the feature map set, the method further comprises:
determining whether the nth convolutional layer is a pruneable convolutional layer;
and if the nth convolutional layer is a pruneable convolutional layer, determining an important feature map and a non-important feature map in the feature map set.
5. The method of claim 4, wherein the determining whether the nth convolutional layer is a pruneable convolutional layer comprises:
calculating the coefficient of variation of the L2 norm of all feature maps of the nth convolutional layer;
determining the nth convolutional layer as the pruneable convolutional layer if the coefficient of variation of the L2 norms of all the feature maps of the nth convolutional layer is greater than a second threshold.
6. The method of any of claims 1 to 5, wherein said pruning said convolutional neural network based on said insignificant feature map to obtain an input for a next layer of said nth convolutional layer comprises:
setting all the features of the unimportant feature map to 0; and taking the unimportant feature map after being set to 0, together with the important feature map, as the input of the next layer of the nth convolutional layer.
7. The method of any of claims 1 to 5, wherein said pruning said convolutional neural network based on said insignificant feature map to obtain an input for a next layer of said nth convolutional layer comprises:
and screening out non-important feature maps from the feature map set, and using the important feature maps as the input of the next layer of the nth convolutional layer.
8. The method according to any one of claims 1 to 7, further comprising:
training the convolutional neural network, wherein a loss function used in the training of the convolutional neural network comprises a regularization term of the L2,1 norm, and the regularization term is used for learning sparse features of samples.
9. The method of claim 8, wherein the regularization term of the L2,1 norm comprises a product of an L2,1 norm and a regularization coefficient, and the L2,1 norm is a sum of the L2,1 norms of respective samples, wherein the L2,1 norm of a first sample is a sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of all samples.
10. A pruning apparatus for a convolutional neural network, the convolutional neural network comprising a plurality of layers, the plurality of layers comprising one or more convolutional layers, the apparatus comprising:
a computing unit configured to obtain a feature map set output by an nth convolutional layer of the convolutional neural network, the feature map set including a plurality of feature maps, the nth convolutional layer being any one of the one or more convolutional layers, n being a positive integer;
a determining unit configured to determine an important feature map and an unimportant feature map in the feature map set, wherein the influence of the important feature map on an output result of the convolutional neural network is greater than that of the unimportant feature map;
a processing unit configured to prune the convolutional neural network based on the insignificant feature map, resulting in an input of a next layer of the nth convolutional layer, the input of the next layer of the nth convolutional layer including the significant feature map.
11. The apparatus of claim 10, wherein the determining unit comprises:
a calculation subunit configured to calculate an L2 norm of each feature map of the nth convolutional layer;
a determining subunit configured to determine that a first feature map is the insignificant feature map if the L2 norm of the first feature map is smaller than a first threshold of the nth convolutional layer; and if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is the important feature map, wherein the first feature map is any feature map in the feature map set.
12. The apparatus of claim 11, wherein the first threshold value for the nth convolutional layer is a product of a norm mean of the nth convolutional layer, which is a mean of L2 norms of all feature maps of the nth convolutional layer, and a threshold coefficient.
13. The apparatus according to claim 11 or 12, wherein the determining subunit is further configured to determine whether the nth convolutional layer is a pruneable convolutional layer before the determining of the significant feature map and the non-significant feature map in the feature map set; and if the nth convolutional layer is a pruneable convolutional layer, determining an important feature map and a non-important feature map in the feature map set.
14. The apparatus of claim 13, wherein the computing subunit is further configured to compute the coefficients of variation of the L2 norms of all feature maps of the nth convolutional layer;
the determining subunit is configured to determine that the nth convolutional layer is the pruneable convolutional layer if the coefficient of variation is greater than a second threshold.
15. The apparatus according to any one of claims 10 to 14, wherein the processing unit is configured to set all features of the unimportant feature map to 0; and take the unimportant feature map after being set to 0, together with the important feature map, as the input of the next layer of the nth convolutional layer.
16. The apparatus according to any of claims 10 to 14, wherein the processing unit is configured to filter out non-significant feature maps from the feature map set, and use the significant feature maps as input for a layer next to the nth convolutional layer.
17. The apparatus of any one of claims 10 to 16, further comprising:
a training unit configured to train the convolutional neural network, wherein a loss function used in training the convolutional neural network comprises a regularization term of the L2,1 norm, and the regularization term is used for learning sparse features of samples.
18. The apparatus of claim 17, wherein the regularization term of the L2,1 norm comprises a product of an L2,1 norm and a regularization coefficient, and the L2,1 norm is a sum of the L2,1 norms of each sample, wherein the L2,1 norm of a first sample is a sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of all samples.
19. A pruning device of a convolutional neural network is characterized by comprising a processor and a memory; the memory is used for storing software programs and modules, and the processor realizes the method according to any one of claims 1 to 9 by running or executing the software programs and/or modules stored in the memory.
20. A computer-readable storage medium for storing program code for execution by a processor, the program code comprising instructions for implementing the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910380839.5A CN110232436A (en) | 2019-05-08 | 2019-05-08 | Pruning method, device and the storage medium of convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110232436A true CN110232436A (en) | 2019-09-13 |
Family
ID=67861206
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528638A (en) * | 2016-01-22 | 2016-04-27 | 沈阳工业大学 | Method for grey correlation analysis method to determine number of hidden layer characteristic graphs of convolutional neural network |
CN106548234A (en) * | 2016-11-17 | 2017-03-29 | 北京图森互联科技有限责任公司 | A kind of neural networks pruning method and device |
CN107944555A (en) * | 2017-12-07 | 2018-04-20 | 广州华多网络科技有限公司 | Method, storage device and the terminal that neutral net is compressed and accelerated |
CN109389043A (en) * | 2018-09-10 | 2019-02-26 | 中国人民解放军陆军工程大学 | Crowd density estimation method for aerial picture of unmanned aerial vehicle |
CN109472352A (en) * | 2018-11-29 | 2019-03-15 | 湘潭大学 | A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature |
CN109522949A (en) * | 2018-11-07 | 2019-03-26 | 北京交通大学 | Model of Target Recognition method for building up and device |
CN109598340A (en) * | 2018-11-15 | 2019-04-09 | 北京知道创宇信息技术有限公司 | Method of cutting out, device and the storage medium of convolutional neural networks |
CN109657595A (en) * | 2018-12-12 | 2019-04-19 | 中山大学 | Based on the key feature Region Matching face identification method for stacking hourglass network |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11003959B1 (en) * | 2019-06-13 | 2021-05-11 | Amazon Technologies, Inc. | Vector norm algorithmic subsystems for improving clustering solutions |
CN111222629A (en) * | 2019-12-31 | 2020-06-02 | 暗物智能科技(广州)有限公司 | Neural network model pruning method and system based on adaptive batch normalization |
CN111967579A (en) * | 2020-02-24 | 2020-11-20 | 北京爱芯科技有限公司 | Method and apparatus for performing convolution calculation on image using convolution neural network |
CN111930982A (en) * | 2020-07-20 | 2020-11-13 | 南京南瑞信息通信科技有限公司 | Intelligent labeling method for power grid images |
CN112102183A (en) * | 2020-09-02 | 2020-12-18 | 杭州海康威视数字技术股份有限公司 | Sparse processing method, device and equipment |
CN112734036A (en) * | 2021-01-14 | 2021-04-30 | 西安电子科技大学 | Target detection method based on pruning convolutional neural network |
CN115146775A (en) * | 2022-07-04 | 2022-10-04 | 同方威视技术股份有限公司 | Edge device reasoning acceleration method and device and data processing system |
CN116188878A (en) * | 2023-04-25 | 2023-05-30 | 之江实验室 | Image classification method, device and storage medium based on neural network structure fine adjustment |
CN117829241A (en) * | 2024-03-04 | 2024-04-05 | 西北工业大学 | Pruning method of convolutional neural network |
CN117829241B (en) * | 2024-03-04 | 2024-06-07 | 西北工业大学 | Pruning method of convolutional neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110232436A (en) | Pruning method, device and the storage medium of convolutional neural networks | |
CN109754066B (en) | Method and apparatus for generating a fixed-point neural network | |
CN110853663B (en) | Speech enhancement method based on artificial intelligence, server and storage medium | |
CN110929865B (en) | Network quantification method, service processing method and related product | |
WO2022042123A1 (en) | Image recognition model generation method and apparatus, computer device and storage medium | |
US11586903B2 (en) | Method and system of controlling computing operations based on early-stop in deep neural network | |
CN111144561A (en) | Neural network model determining method and device | |
CN110197107B (en) | Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium | |
CN106855952B (en) | Neural network-based computing method and device | |
JP6950756B2 (en) | Neural network rank optimizer and optimization method | |
CN111598213B (en) | Network training method, data identification method, device, equipment and medium | |
WO2020001401A1 (en) | Operation method and apparatus for network layer in deep neural network | |
CN113011532B (en) | Classification model training method, device, computing equipment and storage medium | |
WO2022206729A1 (en) | Method and apparatus for selecting cover of video, computer device, and storage medium | |
CN111428854A (en) | Structure searching method and structure searching device | |
CN110874627B (en) | Data processing method, data processing device and computer readable medium | |
CN114078195A (en) | Training method of classification model, search method and device of hyper-parameters | |
CN111160516A (en) | Convolutional layer sparsization method and device of deep neural network | |
CN110619391B (en) | Detection model compression method and device and computer readable storage medium | |
CN115759192A (en) | Neural network acceleration method, device, equipment, chip and storage medium | |
CN113128664A (en) | Neural network compression method, device, electronic equipment and storage medium | |
CN113822414A (en) | Mask detection model training method, mask detection method and related equipment | |
CN112232505A (en) | Model training method, model processing method, model training device, electronic equipment and storage medium | |
CN111160517A (en) | Convolutional layer quantization method and device of deep neural network | |
CN113807413B (en) | Object identification method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |