CN109858611B - Neural network compression method based on channel attention mechanism and related equipment - Google Patents

Neural network compression method based on channel attention mechanism and related equipment

Info

Publication number
CN109858611B
CN109858611B
Authority
CN
China
Prior art keywords
neural network
network model
channel
training
channel attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910026547.1A
Other languages
Chinese (zh)
Other versions
CN109858611A (en)
Inventor
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910026547.1A priority Critical patent/CN109858611B/en
Publication of CN109858611A publication Critical patent/CN109858611A/en
Application granted granted Critical
Publication of CN109858611B publication Critical patent/CN109858611B/en

Landscapes

  • Image Analysis (AREA)
  • Feedback Control In General (AREA)

Abstract

The application relates to the field of artificial intelligence, and discloses a neural network compression method and related equipment based on a channel attention mechanism, wherein the method comprises the following steps: constructing a neural network model, and establishing a channel attention mechanism in the neural network model, wherein the neural network model comprises a plurality of channels; training the neural network model; channels in the neural network model are pruned according to the channel attention mechanism during training of the neural network model. According to the method, the channel attention layer is added before the full connection layer of the neural network model, the channel weight is calculated, and channels with low weight are deleted, so that the neural network is compressed.

Description

Neural network compression method based on channel attention mechanism and related equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to a neural network compression method based on a channel attention mechanism and related equipment.
Background
A convolutional neural network (CNN) consists of an input layer, convolutional layers, activation functions, pooling layers and a fully connected layer, i.e. INPUT (input layer) - CONV (convolutional layer) - RELU (activation function) - POOL (pooling layer) - FC (fully connected layer). Each node of the fully connected layer is connected to all nodes of the preceding layer and integrates the features extracted by the earlier layers. Because of this full connectivity, the fully connected layer generally holds the most parameters. The fully connected layer (FC) acts as the classifier of the whole convolutional neural network, and at present its parameters are highly redundant (the fully connected layer alone can account for about 80% of the parameters of the whole network), which leads to long running time, heavy resource usage and low efficiency. It is therefore necessary to compress the convolutional neural network.
Disclosure of Invention
Aiming at the defects of the prior art, the application aims to provide a neural network compression method and related equipment based on a channel attention mechanism, which realize the compression of the neural network by adding a channel attention layer before a full connection layer of a neural network model, calculating channel weight and deleting channels with low weight.
In order to achieve the above purpose, the technical scheme of the application provides a neural network compression method and related equipment based on a channel attention mechanism.
The application discloses a neural network compression method based on a channel attention mechanism, which comprises the following steps:
constructing a neural network model, and establishing a channel attention mechanism in the neural network model, wherein the neural network model comprises a plurality of channels;
training the neural network model;
channels in the neural network model are pruned according to the channel attention mechanism during training of the neural network model.
Preferably, the constructing a neural network model and establishing a channel attention mechanism in the neural network model, wherein the neural network model comprises a plurality of channels, includes the following steps:
constructing a neural network model, constructing a channel attention layer between a full connection layer and a convolution layer in the neural network model, and setting a softmax function in the channel attention layer, wherein each channel in the channel attention layer corresponds to each channel in the neural network model one by one;
and assigning a channel weight to each channel in the channel attention layer according to the softmax function.
Preferably, the assigning a channel weight to each channel in the channel attention layer according to the softmax function includes:
acquiring input information, and obtaining hidden layer output vectors at the current moment after the input information is subjected to convolution operation of a convolution layer in the neural network model and channel weight operation of the channel attention layer;
and calculating the similarity between the hidden layer output vector and the input information at the previous moment, inputting the similarity into the softmax function, and carrying out normalization processing to obtain the channel weight of each channel in the channel attention layer.
Preferably, the training the neural network model includes:
training the neural network model according to the formula Y_i = f( Σ_j W_ij · X_j + θ ) to obtain the parameters W_ij and θ of the neural network model, where Y_i represents the output of neuron i, the function f represents the activation function, W_ij represents the weight of the connection from neuron j to neuron i, θ represents the bias, and X_j represents the input of neuron j.
Preferably, the training the neural network model includes:
detecting a convergence state of a cross entropy loss function of the neural network model when the neural network model is trained;
when the convergence state of the cross entropy loss function of the neural network model is detected to be convergence, pruning of channels in the neural network model is started.
Preferably, the deleting the channel in the neural network model according to the channel attention mechanism in the training process of the neural network model includes:
presetting a channel weight threshold of a channel;
comparing the channel weight of each channel with the channel weight threshold value in the training process of the neural network model, and deleting the channels below the channel weight threshold value.
Preferably, after the channel in the neural network model is pruned according to the channel attention mechanism in the training process of the neural network model, the method includes:
presetting the lowest channel number of the neural network model;
and when the current channel number in the neural network model is not greater than the preset minimum channel number, stopping pruning.
The application also discloses a neural network compression device based on the channel attention mechanism, the device comprising:
a model construction module, configured to construct a neural network model and establish a channel attention mechanism in the neural network model, the neural network model comprising a plurality of channels;
a training module, configured to train the neural network model;
a channel pruning module, configured to prune channels in the neural network model according to the channel attention mechanism during training of the neural network model.
The application also discloses a computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the neural network compression method described above.
The application also discloses a storage medium readable and writable by a processor, the storage medium storing computer instructions that when executed by one or more processors cause the one or more processors to perform the steps of the neural network compression method described above.
The beneficial effects of this application are: according to the method, the channel attention layer is added before the full connection layer of the neural network model, the channel weight is calculated, and channels with low weight are deleted, so that the neural network is compressed.
Drawings
Fig. 1 is a schematic flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application;
Fig. 2 is a flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application;
Fig. 3 is a flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application;
Fig. 4 is a flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application;
Fig. 5 is a flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application;
Fig. 6 is a flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a neural network compression device based on a channel attention mechanism according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
A neural network compression method flow based on a channel attention mechanism in the embodiment of the application is shown in fig. 1, and the embodiment includes the following steps:
step s101, constructing a neural network model, and establishing a channel attention mechanism in the neural network model, wherein the neural network model comprises a plurality of channels;
specifically, the neural network model may include an INPUT layer, a convolution layer, an activation function, a pooling layer, and a full connection layer, i.e., INPUT (INPUT layer) -CONV (convolution layer) -RELU (activation function) -POOL (pooling layer) -FC (full connection layer), and the neural network includes a plurality of channels connected to the full connection layer, and the channel output is used as an INPUT of the full connection layer.
Specifically, the channel attention mechanism may be implemented by establishing a channel attention layer between the fully connected layer and the convolution layer of the neural network model. The channel attention layer may include a plurality of channels, each channel of the channel attention layer corresponding one-to-one to a channel of the neural network, that is, to each channel after the convolution operation. A softmax function is set in the channel attention layer; it is connected to every channel of the channel attention layer and is used to assign a weight to each of these channels. The weight values calculated by the softmax function lie between 0 and 1 and the weights of all channels sum to 1, according to the formula:

S_i = exp(V_i) / Σ_k exp(V_k)

where i denotes the i-th channel, the sum over k runs over all j channels, j denotes the total number of channels, S_i is the channel weight of the i-th channel, and V is an array holding one value V_i per channel.
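As an illustration only (not part of the original disclosure), a minimal sketch of this softmax weight calculation, assuming the raw per-channel scores are collected in a NumPy array:

```python
import numpy as np

def channel_softmax(v):
    """Compute channel weights S_i = exp(V_i) / sum_k exp(V_k).
    v: 1-D array with one raw score per channel.
    Returns weights in (0, 1) that sum to 1."""
    e = np.exp(v - v.max())   # subtract the max for numerical stability
    return e / e.sum()

# Example: four channels; the weakest channel receives the lowest weight.
weights = channel_softmax(np.array([2.0, 1.0, 0.1, -1.5]))
print(weights, weights.sum())  # weights sum to 1
```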
Step s102, training the neural network model;
specifically, after the neural network model is built, parameters in the neural network model can be trained, a loss function of the neural network model is observed, a text classification model TextCNN is taken as an example, the loss function of the text classification model is cross entropy, and the neural network model is optimized through a gradient optimization algorithm of ADAM adaptive moment estimation. The training can be performed in multiple rounds, for example, a first round is to train 100 texts, a second round is to train 150 texts, a third round is to train 120 texts, the training amount of each round is as close as possible, the difference between the training amounts is not too large, and the parameter adjustment is convenient; when training 100 texts of the first round, the 100 texts can be divided into 10 parts, 10 documents can be divided into 10 parts, and the 100 texts can be divided into 20 parts, 5 texts can be respectively input into the neural network model for training, when training each text, parameters of the neural network model can be adjusted, when training 100 texts of the first round, after the training of the input neural network model, the training of the first round is finished, and at the moment, the adjusted parameters of the neural network and satisfactory loss function output can be obtained.
Step s103, pruning the channels in the neural network model according to the channel attention mechanism in the training process of the neural network model.
Specifically, after a round of training of the neural network model, the state of the neural network model is detected, for example the state of its loss function; if the convergence state of the cross entropy loss function of the neural network model is convergence, the training result of this round has become stable, and pruning of the channels of the neural network model can be started.
Specifically, the pruning may be performed by examining the channel weight of each channel; since channels with smaller weights contribute relatively little, a number of channels whose channel weights rank at the bottom may be pruned. For example, when the channel weight of a channel in the neural network model is smaller than a preset value, that channel may be pruned.
In this embodiment, the channel attention layer is added before the full connection layer of the neural network model, the channel weight is calculated, and the channel with low weight is pruned, so as to implement compression of the neural network.
Fig. 2 is a schematic flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application. As shown in the figure, the step s101 of constructing a neural network model and establishing a channel attention mechanism in the neural network model, the neural network model comprising a plurality of channels, includes:
step s201, constructing a neural network model, constructing a channel attention layer between a full connection layer and a convolution layer in the neural network model, and setting a softmax function in the channel attention layer, wherein each channel in the channel attention layer corresponds to each channel in the neural network model one by one;
specifically, the channel attention mechanism may be implemented in the neural network model by establishing a channel attention layer, the channel attention layer may be established between a fully connected layer and a convolution layer in the neural network model, the channel attention layer may include a plurality of channels, each channel of the channel attention layer may be in one-to-one correspondence with each channel in the neural network, that is, each channel of the channel attention layer is in one-to-one correspondence with each channel after convolution operation, and a softmax function is set in the channel attention layer, and the softmax function is connected with each channel of the channel attention layer and is used for assigning a weight to each channel of the channel attention layer.
Step s202, assigning a channel weight to each channel in the channel attention layer according to the softmax function.
Specifically, the output of the convolution operation of the convolution layer in the neural network model must have a weight calculated and assigned for each channel by the softmax function; before this output passes through the softmax function, the implicit features, importance and relevance of each channel can be learned. The channel weight values calculated by the softmax function lie between 0 and 1 and the weights of all channels sum to 1, according to the formula:

S_i = exp(V_i) / Σ_k exp(V_k)

where i denotes the i-th channel, the sum over k runs over all j channels, j denotes the total number of channels, S_i is the channel weight of the i-th channel, and V is an array holding one value V_i per channel.
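For illustration, a minimal PyTorch-style sketch of such a channel attention layer placed between the convolution/pooling stages and the fully connected layer; the linear scoring layer and the feature shapes are assumptions made for the sketch, not the patented implementation:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Assigns one softmax weight per channel and rescales the channel outputs
    before they enter the fully connected layer."""
    def __init__(self, num_channels):
        super().__init__()
        # Assumed scoring layer: one raw score per channel, learned from the pooled features.
        self.score = nn.Linear(num_channels, num_channels)

    def forward(self, x):
        # x: (batch, num_channels), one pooled value per channel after convolution/pooling
        weights = torch.softmax(self.score(x), dim=1)   # per-sample weights summing to 1
        return x * weights, weights                     # weighted features and channel weights
```

The weights returned here can be accumulated over a training round and compared against the pruning threshold described later.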
In this embodiment, by establishing an attention mechanism in the neural network model, the importance degree of each channel can be identified, and corresponding deletion can be performed.
Fig. 3 is a flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application, as shown in the drawing, the step s202 of allocating a channel weight to each channel in the channel attention layer according to the softmax function includes:
step s301, obtaining input information, and obtaining hidden layer output vector at current moment after the input information is subjected to convolution operation of a convolution layer in the neural network model and channel weight operation of the channel attention layer;
specifically, after input information is acquired in the neural network model, such as after a section of text or a pair of images, before entering the fully connected layer, processing of other layers is needed, such as convolution operation of a convolution layer, activation by an activation function, pooling operation of a pooling layer and channel weight operation of a channel attention layer, the input information is converted into a hidden layer output vector at the current moment, and the weight operation can be weighted summation.
Step s302, calculating the similarity between the hidden layer output vector and the input information at the previous moment, inputting the similarity into the softmax function, and performing normalization processing to obtain the channel weight of each channel in the channel attention layer.
Specifically, at the current moment the hidden layer output vector of the previous moment is known. After the hidden layer output vector of the previous moment is obtained, the similarity between it and the input information, that is, the similarity between the hidden layer output vector of the previous moment and each channel corresponding to the input information, can be calculated first; the similarity can be computed by cosine similarity or by a dot-product operation. After the similarity between the hidden layer output vector of the previous moment and each channel corresponding to the input information has been calculated, the weight of each channel can be obtained by normalizing the similarity results with the softmax function.
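A small sketch of this weight computation under the dot-product similarity option, assuming `h_prev` is the hidden layer output vector of the previous moment and `channels` holds one feature vector per channel of the current input (both shapes are illustrative assumptions):

```python
import torch

def attention_weights(h_prev, channels):
    """channels: (num_channels, dim) per-channel features of the current input;
    h_prev: (dim,) hidden layer output vector of the previous moment.
    Returns one normalized weight per channel."""
    sims = channels @ h_prev              # dot-product similarity per channel
    return torch.softmax(sims, dim=0)     # normalize so the weights sum to 1
```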
In this embodiment, a channel weight is assigned to each channel of the neural network model through a softmax function, and the importance degree of each channel can be identified through the channel weight, and corresponding deletion is performed.
In one embodiment, the step s102 of training the neural network model includes:
training the neural network model according to the formula Y_i = f( Σ_j W_ij · X_j + θ ) to obtain the parameters W_ij and θ of the neural network model, where Y_i represents the output of neuron i, the function f represents the activation function, W_ij represents the weight of the connection from neuron j to neuron i, θ represents the bias, and X_j represents the input of neuron j.
Specifically, the neural network comprises a large number of units and connections, and the connection formula is:

Y_i = f( Σ_j W_ij · X_j + θ )

where Y_i, f, W_ij, θ and X_j are as defined above. The parameters W_ij and θ are obtained through training: taking the text classification model as an example, its loss function is the cross entropy, and the model can be trained and optimized with the ADAM algorithm, updating the parameters W_ij and θ to improve precision.
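As a worked illustration of the connection formula (the ReLU activation and the numbers are assumptions added here, not taken from the original text):

```python
import numpy as np

def neuron_output(w_i, x, theta, f=lambda z: np.maximum(z, 0.0)):
    """Y_i = f(sum_j W_ij * X_j + theta): weighted sum of the inputs plus the bias,
    passed through the activation function f (ReLU here, as an assumption)."""
    return f(np.dot(w_i, x) + theta)

# Example with three inputs: 0.5*1.0 - 0.2*2.0 + 0.1*3.0 + 0.05 = 0.45
print(neuron_output(np.array([0.5, -0.2, 0.1]), np.array([1.0, 2.0, 3.0]), 0.05))
```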
In this embodiment, parameters suitable for the model are obtained by training the neural network model, so as to prepare for channel deletion, and improve the efficiency of channel deletion.
Fig. 4 is a schematic flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application, as shown in the drawing, the step s102 of training the neural network model includes:
step s401, detecting a convergence state of a cross entropy loss function of the neural network model when training the neural network model;
specifically, after the neural network model is built, training parameters in the neural network model, observing a loss function of the neural network model, taking a text classification model TextCNN as an example, optimizing the neural network model through an adaptive moment-by-ADAM (adaptive matrix matching) gradient optimization algorithm, and observing a convergence state of the cross entropy of the neural network model after each training in the training process of the neural network model, wherein the training is performed for 100 texts, the 100 texts can be divided into 10 texts, and 10 texts can be observed when the training of the 100 texts is completed.
Step s402, when it is detected that the convergence state of the cross entropy loss function of the neural network model is convergence, starting deletion of channels in the neural network model.
Specifically, after a round of training of the neural network model, if the convergence state of the cross entropy loss function of the neural network model is detected to be convergence, pruning of the channels in the neural network model can be started; the parameters of the neural network model are no longer adjusted, and only pruning of channels in the neural network model is performed. If the convergence state of the cross entropy loss function of the neural network model is detected to be non-convergence, training of the neural network model continues.
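One possible way to express this convergence check in code; the tolerance and the use of round-averaged losses are illustrative assumptions, since the original text does not fix a criterion:

```python
def has_converged(round_losses, tol=1e-3):
    """round_losses: average cross-entropy loss of each completed training round.
    Convergence is assumed once the latest round improves on the previous one
    by less than the tolerance."""
    if len(round_losses) < 2:
        return False
    return abs(round_losses[-2] - round_losses[-1]) < tol

# Pruning of channels starts only once has_converged(...) returns True;
# otherwise another training round is run and the parameters keep being adjusted.
```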
In this embodiment, by detecting the cross entropy loss function of the neural network model and starting the channel pruning of the neural network model, accuracy of channel pruning can be improved, and loss of the model can be reduced.
Fig. 5 is a schematic flow chart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application, as shown in the drawing, in step s103, in a training process of the neural network model, channels in the neural network model are pruned according to the channel attention mechanism, including:
step s501, presetting a channel weight threshold of a channel;
specifically, a channel weight threshold value can be set for the channels of the neural network model in advance, and only a few channels with weights arranged at the tail are deleted in the training process, so that the model is not damaged too much, and therefore the weight threshold value can be set smaller.
Step s502, comparing the channel weight of each channel with the channel weight threshold value in the training process of the neural network model, and deleting the channels below the channel weight threshold value.
Specifically, after all channels of the neural network model have been assigned channel weights by the softmax function, the channel weight of each channel in the neural network model may be compared with the preset channel weight threshold, and channels below the channel weight threshold may be pruned.
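A minimal sketch of this comparison step, assuming the channel weights have been collected into a dictionary; how a pruned channel is physically removed from the network is left out as implementation-specific:

```python
def select_channels_to_prune(channel_weights, weight_threshold):
    """Return the indices of channels whose weight falls below the preset threshold."""
    return [idx for idx, w in channel_weights.items() if w < weight_threshold]

# Example with an assumed threshold of 0.01:
weights = {0: 0.40, 1: 0.35, 2: 0.005, 3: 0.245}
print(select_channels_to_prune(weights, weight_threshold=0.01))  # -> [2]
```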
In this embodiment, channels in the neural network model are pruned by presetting a channel weight threshold, so that the neural network can be effectively compressed.
Fig. 6 is a flowchart of a neural network compression method based on a channel attention mechanism according to an embodiment of the present application, as shown in the drawing, in step s103, after deleting a channel in the neural network model according to the channel attention mechanism in the training process of the neural network model, the method includes:
step s601, presetting the lowest channel number of the neural network model;
specifically, the number of channels may be determined when the neural network model is constructed, and after the number of channels of the neural network model is determined, the minimum number of channels may be preset, for example, the total number of channels of the neural network model is 128, and the minimum number of channels may be set to 120.
Step s602, when the channels in the neural network model are pruned, comparing the current channel number in the neural network model with the preset minimum channel number, and when the current channel number in the neural network model is not greater than the preset minimum channel number, stopping pruning.
Specifically, whenever a channel is to be pruned after comparing its channel weight with the channel weight threshold, the current total number of channels in the neural network model is detected, and if the total number of channels in the neural network model is not greater than the preset minimum channel number, no further pruning is performed.
Specifically, pruning may begin with the smallest channel: the channel with the smallest channel weight is first found among the current channels of the neural network model, and if its channel weight is smaller than the preset channel weight threshold, it is then determined whether the current total number of channels has reached the preset minimum channel number. If not, that channel is pruned, the channel with the smallest weight among the remaining channels is found, and the judgment and pruning continue; if so, channel pruning stops and channel weights are no longer compared with the channel weight threshold.
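A sketch of the combined stopping logic described above, under the assumption that channels are examined smallest-weight first and that the remaining channels and their weights are tracked in a dictionary:

```python
def prune_with_floor(channel_weights, weight_threshold, min_channels):
    """Repeatedly remove the channel with the smallest weight while that weight is
    below the threshold and more than min_channels channels remain."""
    remaining = dict(channel_weights)
    pruned = []
    while len(remaining) > min_channels:
        idx = min(remaining, key=remaining.get)     # channel with the smallest weight
        if remaining[idx] >= weight_threshold:
            break                                   # nothing left below the threshold
        pruned.append(idx)
        del remaining[idx]
    return remaining, pruned

# Example scale from the text: 128 channels in total with a floor of 120 channels,
# so at most 8 channels can ever be pruned regardless of the threshold.
```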
In this embodiment, by presetting the lowest channel number, and not deleting when the current channel number reaches the lowest channel number, it is ensured that the loss of the model is not excessive.
The structure of a neural network compression device based on a channel attention mechanism in an embodiment of the present application is shown in fig. 7, and includes:
a model construction module 701, a training module 702 and a channel pruning module 703, wherein the model construction module 701 is connected with the training module 702, and the training module 702 is connected with the channel pruning module 703. The model construction module 701 is configured to construct a neural network model and to establish a channel attention mechanism in the neural network model, the neural network model comprising a plurality of channels; the training module 702 is configured to train the neural network model; and the channel pruning module 703 is configured to prune channels in the neural network model according to the channel attention mechanism during training of the neural network model.
The embodiment of the application also discloses a computer device, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions are executed by one or more processors, so that the one or more processors execute the steps in the neural network compression method in the above embodiments.
The embodiments of the present application also disclose a storage medium readable and writable by a processor, the storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the neural network compression method described in the above embodiments.
Those skilled in the art will appreciate that all or part of the procedures of the methods of the above embodiments may be implemented by a computer program stored in a computer-readable storage medium, and the program, when executed, may include the procedures of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (8)

1. A neural network compression method based on a channel attention mechanism, characterized by comprising the following steps:
constructing a neural network model, and establishing a channel attention mechanism in the neural network model, wherein the neural network model comprises a plurality of channels;
training the neural network model;
pruning channels in the neural network model according to the channel attention mechanism in the training process of the neural network model;
the training of the neural network model includes: when the neural network model is trained, detecting the convergence state of the cross entropy of the neural network model, optimizing the neural network model through an ADAM (adaptive moment estimation) gradient optimization algorithm, and starting pruning of channels in the neural network model when the convergence state of the cross entropy of the neural network model is detected to be convergence;
the training of the neural network model further comprises: training the neural network model according to the formula Y_i = f( Σ_j W_ij · X_j + θ ) to obtain the parameters W_ij and θ of the neural network model, where Y_i represents the output of neuron i, the function f represents the activation function, W_ij represents the weight of the connection from neuron j to neuron i, θ represents the bias, and X_j represents the input of neuron j.
2. The channel attention mechanism based neural network compression method of claim 1, wherein said constructing a neural network model and establishing a channel attention mechanism in said neural network model, said neural network model comprising a plurality of channels, comprises:
constructing a neural network model, constructing a channel attention layer between a full connection layer and a convolution layer in the neural network model, and setting a softmax function in the channel attention layer, wherein each channel in the channel attention layer corresponds to each channel in the neural network model one by one;
and assigning a channel weight to each channel in the channel attention layer according to the softmax function.
3. The channel attention mechanism based neural network compression method of claim 2, wherein said assigning a channel weight to each channel in the channel attention layer according to the softmax function comprises:
acquiring input information, and obtaining hidden layer output vectors at the current moment after the input information is subjected to convolution operation of a convolution layer in the neural network model and channel weight operation of the channel attention layer;
and calculating the similarity between the hidden layer output vector and the input information at the previous moment, inputting the similarity into the softmax function, and carrying out normalization processing to obtain the channel weight of each channel in the channel attention layer.
4. The neural network compression method of claim 1, wherein the pruning of channels in the neural network model according to the channel attention mechanism during training of the neural network model comprises:
presetting a channel weight threshold of a channel;
the channel weight of each channel is compared with the channel weight threshold value in the training process of the neural network model, and channels lower than the channel weight threshold value are pruned.
5. The neural network compression method of claim 1, wherein after the channel in the neural network model is pruned according to the channel attention mechanism in the training process of the neural network model, the method comprises:
presetting the lowest channel number of the neural network model;
and when the current channel number in the neural network model is not greater than the preset minimum channel number, stopping pruning.
6. A neural network compression device based on a channel attention mechanism, the device comprising:
a model construction module, configured to construct a neural network model and establish a channel attention mechanism in the neural network model, the neural network model comprising a plurality of channels;
a training module, configured to train the neural network model;
a channel pruning module, configured to prune channels in the neural network model according to the channel attention mechanism in the training process of the neural network model;
the training module being specifically configured to: when the neural network model is trained, detect the convergence state of the cross entropy of the neural network model, optimize the neural network model through an ADAM (adaptive moment estimation) gradient optimization algorithm, and start pruning of channels in the neural network model when the convergence state of the cross entropy of the neural network model is detected to be convergence;
the training module being further configured to: train the neural network model according to the formula Y_i = f( Σ_j W_ij · X_j + θ ) to obtain the parameters W_ij and θ of the neural network model, where Y_i represents the output of neuron i, the function f represents the activation function, W_ij represents the weight of the connection from neuron j to neuron i, θ represents the bias, and X_j represents the input of neuron j.
7. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the neural network compression method of any of claims 1 to 5.
8. A storage medium readable by a processor, the storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the neural network compression method of any one of claims 1 to 5.
CN201910026547.1A 2019-01-11 2019-01-11 Neural network compression method based on channel attention mechanism and related equipment Active CN109858611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910026547.1A CN109858611B (en) 2019-01-11 2019-01-11 Neural network compression method based on channel attention mechanism and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910026547.1A CN109858611B (en) 2019-01-11 2019-01-11 Neural network compression method based on channel attention mechanism and related equipment

Publications (2)

Publication Number Publication Date
CN109858611A CN109858611A (en) 2019-06-07
CN109858611B true CN109858611B (en) 2024-03-26

Family

ID=66894745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910026547.1A Active CN109858611B (en) 2019-01-11 2019-01-11 Neural network compression method based on channel attention mechanism and related equipment

Country Status (1)

Country Link
CN (1) CN109858611B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200410336A1 (en) 2019-06-26 2020-12-31 International Business Machines Corporation Dataset Dependent Low Rank Decomposition Of Neural Networks
CN110795993A (en) * 2019-09-12 2020-02-14 深圳云天励飞技术有限公司 Method and device for constructing model, terminal equipment and medium
CN112232505A (en) * 2020-09-10 2021-01-15 北京迈格威科技有限公司 Model training method, model processing method, model training device, electronic equipment and storage medium
CN112565378A (en) * 2020-11-30 2021-03-26 中国科学院深圳先进技术研究院 Cloud native resource dynamic prediction method and device, computer equipment and storage medium
CN113298083A (en) * 2021-02-25 2021-08-24 阿里巴巴集团控股有限公司 Data processing method and device
CN114120245B (en) * 2021-12-15 2024-07-23 平安科技(深圳)有限公司 Crowd image analysis method, device and equipment based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11409791B2 (en) * 2016-06-10 2022-08-09 Disney Enterprises, Inc. Joint heterogeneous language-vision embeddings for video tagging and search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Deep Neural Network Compression; Han Yunfei et al.; Application Research of Computers, No. 10; pp. 2895-2896 *

Also Published As

Publication number Publication date
CN109858611A (en) 2019-06-07

Similar Documents

Publication Publication Date Title
CN109858611B (en) Neural network compression method based on channel attention mechanism and related equipment
CN109978142B (en) Neural network model compression method and device
CN103853786B (en) The optimization method and system of database parameter
JP7376731B2 (en) Image recognition model generation method, device, computer equipment and storage medium
KR102336295B1 (en) Convolutional neural network system using adaptive pruning and weight sharing and operation method thererof
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN111079899A (en) Neural network model compression method, system, device and medium
CN113469426A (en) Photovoltaic output power prediction method and system based on improved BP neural network
CN108446712B (en) ODN network intelligent planning method, device and system
CN115755954B (en) Routing inspection path planning method, system, computer equipment and storage medium
CN115952832A (en) Adaptive model quantization method and apparatus, storage medium, and electronic apparatus
CN107169594B (en) Optimization method and device for vehicle path problem
CN113239199B (en) Credit classification method based on multi-party data set
CN117114053B (en) Convolutional neural network model compression method and device based on structure search and knowledge distillation
CN113743591A (en) Method and system for automatically pruning convolutional neural network
CN112200208B (en) Cloud workflow task execution time prediction method based on multi-dimensional feature fusion
CN112329923A (en) Model compression method and device, electronic equipment and readable storage medium
CN116010832A (en) Federal clustering method, federal clustering device, central server, federal clustering system and electronic equipment
CN115660066A (en) Convolutional neural network pruning method based on distribution difference
CN114677535A (en) Training method of domain-adaptive image classification network, image classification method and device
CN114638342A (en) Image anomaly detection method based on depth unsupervised automatic encoder
CN114359649A (en) Image processing method, apparatus, device, storage medium, and program product
Du et al. Evolutionary NAS in light of model stability for accurate continual learning
CN112734013B (en) Image processing method, device, electronic equipment and storage medium
CN116910481B (en) Ship task system loading bullet quantity optimization method based on genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant