CN111428188A - Convolution operation method and device - Google Patents

Convolution operation method and device Download PDF

Info

Publication number
CN111428188A
Authority
CN
China
Prior art keywords
convolution kernel
data
gradient
convolution
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010239668.7A
Other languages
Chinese (zh)
Inventor
岳涛
赵思杰
胡雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202010239668.7A priority Critical patent/CN111428188A/en
Publication of CN111428188A publication Critical patent/CN111428188A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G06F 17/156 Correlation function computation including computation of convolution operations using a domain transform, e.g. Fourier transform, polynomial transform, number theoretic transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F 17/141 Discrete Fourier transforms
    • G06F 17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a convolution operation method and a convolution operation device. The method comprises the following steps: expanding an original convolution kernel by cyclically shifting it element by element or channel by channel, convolving each shifted kernel with the input feature map, accelerating the computation with the fast Fourier transform, and adjusting the convolution kernel parameters by learning. The convolution operation device comprises a cyclic shift unit, a forward inference unit, a data unit, a back propagation unit and an updating unit. The method and the device greatly reduce both the number of parameters and the amount of computation of a convolutional neural network, thereby accelerating training and inference and allowing the network to be conveniently deployed on embedded devices, mobile devices or other terminals.

Description

Convolution operation method and device
Technical Field
The invention relates to the field of convolutional neural networks, and in particular to a convolution operation method and device.
Background
In recent years, Convolutional Neural Networks (CNNs) have developed rapidly and are widely used in fields such as object recognition, image restoration, and semantic segmentation. A convolutional neural network typically consists of a variable number of convolutional layers, pooling layers, normalization layers, activation layers, and other modules. A single conventional standard convolution may contain tens of thousands of parameters, which generate millions of floating-point operations when applied to the input data and therefore require a large amount of storage space and computing resources; convolution operations usually account for more than 90% of the computation of the whole convolutional neural network. Due to limits on computing power, memory, and power consumption, such convolutional neural networks are difficult to apply directly on embedded or mobile devices.
MobileNet proposes the depthwise separable convolution, which decomposes a standard k × k convolution into a k × k depthwise (grouped) convolution followed by a 1 × 1 pointwise convolution. ShuffleNet replaces the standard convolution with grouped convolutions and uses the channel shuffle technique to promote information flow across channels. LegoNet proposes an efficient convolution that assembles a small number of convolution kernels, in a building-block manner, to generate more feature layers. GhostNet applies simple linear mappings to a small number of feature layers to expand the number of feature layers.
Disclosure of Invention
The invention mainly addresses the large parameter count, heavy computation, and high power consumption of existing convolutional neural networks, and provides a convolution operation method and device that effectively reduce both the computation and the parameter count of a neural network.
The method adopts the technical scheme that:
A convolution operation method, comprising the following steps:
step 1, performing convolution calculation on an original convolution kernel and an input feature map to generate first output data;
step 2, performing a cyclic shift operation on the original convolution kernel to obtain an intermediate convolution kernel: unfolding the original convolution kernel from the low dimension to the high dimension into a one-dimensional vector, shifting all data of the vector by t bits, appending the t bits of data shifted out of the vector to the end of the vector to obtain a new one-dimensional vector, and recombining the vector from the low dimension to the high dimension to obtain an intermediate convolution kernel whose dimensions are the same as those of the original convolution kernel (a code sketch follows this list);
step 3, performing convolution calculation on the intermediate convolution kernel and the input feature map again to generate second output data;
step 4, repeating steps 2 to 3 a total of o-1 times with a different shift amount t each time, and concatenating all o pieces of obtained data, including the first output data and the second output data, along the channel dimension to obtain an output feature map;
step 5, calculating the gradient of the input feature map according to the gradient of the output feature map;
step 6, calculating the gradient of the original convolution kernel according to the gradient of the output feature map and the gradient of the input feature map;
step 7, updating the original convolution kernel and the learning rate by using the original convolution kernel gradient calculated in step 6.
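As an illustration of the cyclic shift of step 2, the following minimal NumPy sketch (not part of the patent text; the function name and the row-major, channel-first kernel layout are assumptions) unfolds a kernel, shifts it by t positions, and recombines it:

```python
import numpy as np

def cyclic_shift_kernel(F, t):
    """Step 2 sketch: unfold the kernel into a one-dimensional vector, cyclically
    shift it by t positions (data shifted out of one end re-enters at the other),
    and recombine it into a kernel with the same shape as the original."""
    flat = F.reshape(-1)           # unfold from low dimension to high dimension (row-major assumption)
    shifted = np.roll(flat, -t)    # shift all data of the vector by t bits (left shift)
    return shifted.reshape(F.shape)

# Example: for a (c=4, k=3, k=3) kernel, a shift of t = k*k moves every channel up by one.
F = np.arange(4 * 3 * 3).reshape(4, 3, 3)
assert np.array_equal(cyclic_shift_kernel(F, 9), np.roll(F, -1, axis=0))
```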
Further, in step 1, the convolution calculation specifically comprises: multiplying the original convolution kernel data element by element with the data at a given position of the input feature map and summing the products to obtain output data of one element, computing such single-element output data for all positions of the input feature map, and finally forming the single-channel first output data.
Further, in steps 1 to 4, the fast Fourier transform is used to accelerate the convolution calculation, specifically: unfolding the original convolution kernel data from the low dimension to the high dimension into a one-dimensional vector and mapping it into a one-dimensional vector in the frequency domain using the fast Fourier transform; meanwhile, unfolding the local data at a given position of the input feature map from the low dimension to the high dimension into a one-dimensional vector and mapping it into a one-dimensional vector in the frequency domain using the fast Fourier transform; performing a dot product of the two frequency-domain vectors and mapping the result vector back into a one-dimensional result vector in the time domain using the inverse fast Fourier transform; and selecting o pieces of data from the one-dimensional result vector as the data of the respective channels at the corresponding position of the output feature map, the selection being based on the intervals represented by the shift amounts in step 4.
Further, in step 5, the output feature map gradient is first padded with "0" around its border to obtain an expanded output feature map gradient, and the original convolution kernel of step 1 is then rotated to generate an original rotated convolution kernel, where the rotation operation means that the two-dimensional sub-matrix of the convolution kernel in each channel of the original convolution kernel is successively reflected along its two diagonals. The cyclic shift operation of step 2 is applied o-1 times to the original rotated convolution kernel, each shift corresponding to a shift in step 4, and a channel rearrangement operation is applied to the intermediate convolution kernel group formed by the generated o-1 convolution kernels together with the original rotated convolution kernel, producing a final convolution kernel group of c convolution kernels, where the channel rearrangement operation extracts the i-th channel of each convolution kernel in the intermediate group to form a new convolution kernel. Each convolution kernel in the final group is convolved with the expanded output feature map gradient to obtain c partial input feature map gradients in total, and these partial gradients are concatenated along the channel dimension to finally obtain the input feature map gradient.
Further, in step 6, the o-1 intermediate feature maps obtained by applying the cyclic shift operation o-1 times to the input feature map, together with the input feature map itself, form an intermediate feature map group; the convolution operation of step 1 is applied between each feature map in the group and the output feature map gradient to obtain c partial convolution kernel gradients in total, and these partial gradients are concatenated along the channel dimension to finally obtain the original convolution kernel gradient.
Further, in step 7, the original convolution kernel gradient is multiplied by the negative of the learning rate and accumulated onto the original convolution kernel to obtain the updated original convolution kernel, and a constant is subtracted from the learning rate to obtain the updated learning rate.
The convolution operation device of the invention comprises a cyclic shift unit, a forward inference unit, a data unit, a back propagation unit and an updating unit. The cyclic shift unit shifts the one-dimensional vector data of the convolution kernel and of the input feature map; the forward inference unit performs the convolution calculation between the convolution kernel and the input feature map; the data unit stores the input feature map, the convolution kernel, the output feature map and the gradient data; the back propagation unit solves the input feature map gradient and the original convolution kernel gradient from the output feature map gradient; and the updating unit updates the original convolution kernel and the learning rate.
The invention has the following advantages. Compared with existing convolution methods in convolutional neural networks, a single convolution kernel is expanded by the cyclic shift operation, so that when feature layers with the same number of channels are generated, the required weight parameters amount to only 1/o of those of a standard convolution, where o denotes the number of output channels (for example, with k = 3 and c = o = 64, a standard convolution stores o·c·k² = 36864 weights while the proposed layer stores c·k² = 576). Meanwhile, forward inference is mapped into the frequency domain by means of the fast Fourier transform; under common parameter settings this greatly reduces the amount of computation, thereby accelerating the forward inference of the neural network and allowing it to be conveniently deployed on embedded devices, mobile devices or other terminals.
Drawings
FIG. 1 is a training flowchart in embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of the method of the present invention;
FIG. 3 is a schematic diagram of the framework of the device of the present invention.
Detailed Description
The invention provides a convolution operation method and a convolution operation device, which are applied to a neural network. The present invention will be described in detail with reference to the accompanying drawings.
Example 1
Fig. 1 is a training flowchart of the present embodiment, which includes the following steps:
(1) building a neural network and initializing all weight parameters;
(2) inputting a training sample pair;
(3) forward inference is carried out layer by layer; for the convolutional network part, forward inference is performed with the method provided by the invention, and for other general neural network modules, forward inference is performed in the given manner;
(4) a loss value is obtained by evaluating a loss function on the final output of the neural network and the sample label;
(5) the loss value is back-propagated layer by layer through the loss function and the weight parameter gradient of each network layer is solved; for the convolutional network part, the convolution kernel gradient is solved with the method provided by the invention, and for other general neural network modules, back propagation is performed in the given manner;
(6) all weight parameters of the neural network, including the convolution kernels of the convolutional network part and the weight parameters of the other general neural network modules, are updated according to the weight parameter gradients, and the learning rate is updated;
(7) steps (2) to (6) are repeated until the specified number of iterations is reached.
The operation of the above steps involving the convolutional network portion will be described in detail below with reference to the flow chart of the method of the present invention shown in fig. 2.
In this embodiment, the data related to a single convolutional layer include the input feature map, the original convolution kernel, the output feature map, the input feature map gradient, the original convolution kernel gradient, and the output feature map gradient. In this embodiment the input feature map is organized as $X \in \mathbb{R}^{c \times h_x \times w_x}$, the original convolution kernel as $F \in \mathbb{R}^{c \times k \times k}$, and the output feature map as $Y \in \mathbb{R}^{o \times h_y \times w_y}$, where $\mathbb{R}$ denotes the real number field, c denotes the number of channels of the input data, $h_x$ and $w_x$ denote the height and width of the input data, o denotes the number of output channels, k denotes the size of the convolution kernel, and $h_y$ and $w_y$ denote the height and width of the output data. The gradient of each tensor is organized in the same way as the tensor itself.
The convolutional network part involved in step (3) is described in detail in this embodiment, and as shown in fig. 2, the process includes the following steps:
Step 1, the original convolution kernel F is convolved with the input feature map X to generate the first output data. The convolution calculation refers to: multiplying the first input data element by element with the data at a given position of the second input data and summing the products to obtain output data of one element, computing such single-element output data for all positions of the second input data, and finally forming single-channel output data. Here, the first input data is the original convolution kernel F and the second input data is the input feature map X.
Step 2, a cyclic shift operation is applied to the original convolution kernel to obtain an intermediate convolution kernel. The original convolution kernel is unfolded from the low dimension to the high dimension into a one-dimensional vector, all data of the vector are shifted by t bits, the t bits of data shifted out of the vector are appended at the end of the vector to obtain a new one-dimensional vector, and the vector is recombined from the low dimension to the high dimension into an intermediate convolution kernel whose dimensions are the same as those of the original convolution kernel of step 1. In this embodiment, the intermediate convolution kernel obtained by this cyclic shift operation is denoted $F_{g\leftarrow t_1}$, indicating that the shift direction is to the left and the shift amount is $t_1$.
Step 3, the intermediate convolution kernel is convolved with the input feature map again to generate the second output data.
Step 4, the operations of steps 2 to 3 are repeated o-1 times with a different shift amount t each time, and all o pieces of data, including the first output data and the second output data, are concatenated along the channel dimension to obtain the output feature map. The above steps can be expressed as

$$Y = \mathrm{concat}\big(F * X,\; F_{g\leftarrow t_1} * X,\; \ldots,\; F_{g\leftarrow t_{o-1}} * X\big) \tag{1}$$

where $*$ denotes the convolution operation. In this embodiment the shift amounts $t_i$ are taken from the arithmetic progression of o numbers $(0, k^2, 2k^2, \ldots, (o-1)k^2)$. Because a cyclic left shift by $i\,k^2$ positions keeps the lowest two dimensions of the original convolution kernel F unchanged and acts only along the third dimension, the i-th shift can equally be regarded as the i-th number of the arithmetic progression $(0, 1, 2, \ldots, o-1)$ of o numbers; expressed as a number of channels, the shift amounts are therefore

$$(0,\; 1,\; 2,\; \ldots,\; o-1). \tag{2}$$
In the present embodiment, the convolution stride is set to 1, and equation (1) can be expressed as the following mathematical equation:

$$Y_{d,i,j} = \sum_{b=1}^{c}\sum_{m=1}^{k}\sum_{n=1}^{k} F_{g\leftarrow d}(b, m, n)\, X(b,\; i+m-1,\; j+n-1) \tag{3}$$

where d, i, j are the indices of the output Y from the high dimension to the low dimension, and b, m, n are the indices of the weight parameter $F_{g\leftarrow d}$ (the original convolution kernel cyclically shifted to the left by d channels) from the high dimension to the low dimension. This formula describes the specific calculation flow of steps 1 to 4 above and finally yields the output feature map Y.
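The following NumPy sketch (an illustration, not part of the patent; names, layouts and the stride-1, no-padding setting are assumptions consistent with this embodiment) computes the output feature map directly according to equation (3), generating each output channel from a cyclically shifted copy of the single original kernel:

```python
import numpy as np

def shift_conv_forward(X, F, o):
    """Steps 1-4 sketch: X has shape (c, h_x, w_x), F has shape (c, k, k).
    Output channel d is produced by the kernel cyclically left-shifted by
    t_d = d*k*k positions (equivalently, by d channels), stride 1, no padding."""
    c, k, _ = F.shape
    _, hx, wx = X.shape
    hy, wy = hx - k + 1, wx - k + 1
    Y = np.zeros((o, hy, wy))
    for d in range(o):
        Fd = np.roll(F.reshape(-1), -d * k * k).reshape(F.shape)  # intermediate kernel F_{g<-d}
        for i in range(hy):
            for j in range(wy):
                # equation (3): element-wise multiply-and-sum over the local patch
                Y[d, i, j] = np.sum(Fd * X[:, i:i + k, j:j + k])
    return Y
```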
Optionally, in this embodiment the fast Fourier transform is used to accelerate the computation, as follows. Equation (3) can first be expressed as a dot product between two vectors:

$$Y_{d,i,j} = x_{i,j} \cdot f_{g\leftarrow d k^2} \tag{4}$$

where $x_{i,j}$ denotes the data of the input X involved in computing $Y_{d,i,j}$, unfolded into a one-dimensional vector; $f_{g\leftarrow d k^2}$ denotes the weight parameter F unfolded from the high dimension to the low dimension into a one-dimensional vector and cyclically shifted to the left by $d\,k^2$ bits; and $\cdot$ denotes the dot product of two vectors. The vector formed by the o values $Y_{d,i,j}$ of the different channels is denoted $y_{i,j}$; it is part of the vector t obtained by circularly convolving x and f, and can be expressed as

$$y_{i,j} = \big(t_0,\; t_{k^2},\; t_{2k^2},\; \ldots,\; t_{(o-1)k^2}\big), \qquad t = x_{i,j} \circledast f \tag{5}$$

where $\circledast$ denotes circular convolution and $t_{k^2 c - 1}$ denotes the $(k^2 c - 1)$-th (last) value of the vector t. The Fourier transform of the circular convolution of two vectors equals the dot product of their respective Fourier transforms, so

$$t = \mathcal{F}^{-1}\big(\mathcal{F}(x_{i,j}) \odot \mathcal{F}(f)\big) \tag{6}$$

where $\mathcal{F}^{-1}$ denotes the inverse fast Fourier transform and $\mathcal{F}$ denotes the fast Fourier transform. As shown in formula (5), o points are taken from the one-dimensional vector t as part of the data of the o channels of the output feature map; traversing the input feature map according to the stride and repeating the above operation yields the complete o-channel data, i.e., the output feature map Y. Accelerating steps 1 to 4 with formula (6) reduces the number of floating-point operations required to compute the output data Y from $h_y \times w_y \times k^2 c \times o$ to $h_y \times w_y \times k^2 c \times 3\log_2(k^2 c)$. Under a common parameter setting, for example k = 3 and c = o = 64, the amount of computation is reduced to about 43% of that of steps 1 to 4 without the acceleration method, and the larger the number of output channels o, the more significant the acceleration.
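A frequency-domain sketch in the spirit of equations (4) to (6) follows (again an illustration with assumed names and layouts, not the patent's implementation). Because the o entries of $y_{i,j}$ are dot products of the patch with left-shifted copies of the kernel vector, the sketch uses the conjugated spectrum of the patch, i.e. circular correlation, so that entry $d\,k^2$ of t equals $Y_{d,i,j}$; it reproduces the result of the direct loop above:

```python
import numpy as np

def shift_conv_forward_fft(X, F, o):
    """FFT-accelerated sketch of steps 1-4. The kernel spectrum is computed once;
    at every output position the flattened patch is transformed, combined with it
    in the frequency domain, transformed back, and sampled at intervals of k*k."""
    c, k, _ = F.shape
    _, hx, wx = X.shape
    hy, wy = hx - k + 1, wx - k + 1
    N = c * k * k
    Ff = np.fft.fft(F.reshape(-1))              # spectrum of the flattened kernel f
    picks = (np.arange(o) * k * k) % N          # positions 0, k^2, 2k^2, ... inside t
    Y = np.zeros((o, hy, wy))
    for i in range(hy):
        for j in range(wy):
            x = X[:, i:i + k, j:j + k].reshape(-1)   # flattened local patch x_{i,j}
            # conjugated spectrum -> circular correlation: t[s] = sum_m x[m] * f[(m + s) mod N],
            # so t[d*k*k] is the dot product with the kernel vector left-shifted by d*k*k bits
            t = np.fft.ifft(np.conj(np.fft.fft(x)) * Ff).real
            Y[:, i, j] = t[picks]
    return Y

# Sanity check against the direct sketch above:
# np.allclose(shift_conv_forward_fft(X, F, o), shift_conv_forward(X, F, o))
```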
The convolutional network part involved in step (5) is described in detail in this embodiment, and as shown in fig. 1, the process includes the following steps:
and 5, calculating the gradient of the input feature map according to the gradient of the output feature map. The specific operation flow is as follows: firstly, expanding the periphery of the output characteristic graph gradient by using 0 to obtain an expanded output characteristic graph gradient, and then carrying out rotation operation on the original convolution kernel in the step 1 to generate an original rotation convolution kernel, wherein the rotation operation refers to that data of a convolution kernel two-dimensional sub-matrix in each channel of the original convolution kernel are sequentially and symmetrically transformed along two matrix diagonals. And (3) performing cyclic displacement operation o-1 times on the original convolution kernel in the step (2), wherein each displacement corresponds to the step (4), performing channel arrangement operation on an intermediate convolution kernel group formed by the generated o-1 convolution kernels and the original convolution kernel to generate a final convolution kernel group comprising c convolution kernels, wherein the channel arrangement operation is to extract ith channel data of each convolution kernel in the intermediate convolution kernel group to form a new convolution kernel, performing convolution operation on each convolution kernel in the final convolution kernel group and the expansion output characteristic map gradient to obtain c partial input characteristic map gradients in total, performing channel dimension splicing on the partial input characteristic map gradients, and finally obtaining the input characteristic map gradient. With reference to the present embodiment, the equivalent calculation formula of the above operation flow can be expressed as:
Figure BDA0002432129540000061
in the formula, d ', i ', j ' represents the index of the input feature diagram X from the high dimension to the low dimension, and L represents the final loss value of the neural network.
Step 6, the original convolution kernel gradient is calculated from the output feature map gradient. The specific procedure is as follows: the cyclic shift operation is applied o-1 times to the input feature map to obtain o-1 intermediate feature maps, which together with the input feature map form an intermediate feature map group; the convolution operation of step 1 is applied between each feature map in the group and the output feature map gradient to obtain c partial convolution kernel gradients in total, and these are concatenated along the channel dimension to finally obtain the original convolution kernel gradient. In this embodiment, the above procedure is equivalent to

$$\frac{\partial L}{\partial F_{b,m,n}} = \sum_{d=1}^{o}\sum_{i=1}^{h_y}\sum_{j=1}^{w_y} \frac{\partial L}{\partial Y_{d,i,j}}\; X_{g\to d}(b,\; i+m-1,\; j+n-1) \tag{8}$$

where $X_{g\to d}$ denotes the input data X with its lowest two dimensions kept unchanged and cyclically shifted to the right by d positions along the third dimension.
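For completeness, a naive backward-pass sketch matching equations (7) and (8) is given below (an illustration under the same assumed layout as the forward sketches, not the patent's implementation; dY stands for the output feature map gradient ∂L/∂Y):

```python
import numpy as np

def shift_conv_backward(X, F, dY):
    """Backward sketch of equations (7) and (8): X is (c, h_x, w_x), F is (c, k, k),
    dY is the output feature map gradient of shape (o, h_y, w_y).
    Returns the input feature map gradient dX and the original kernel gradient dF."""
    c, k, _ = F.shape
    o, hy, wy = dY.shape
    dX = np.zeros(X.shape)
    dF = np.zeros(F.shape)
    for d in range(o):
        Fd = np.roll(F, -d, axis=0)   # F_{g<-d}: channels cyclically shifted left by d
        Xd = np.roll(X, d, axis=0)    # X_{g->d}: channels cyclically shifted right by d
        for i in range(hy):
            for j in range(wy):
                g = dY[d, i, j]
                dX[:, i:i + k, j:j + k] += g * Fd          # equation (7)
                dF += g * Xd[:, i:i + k, j:j + k]          # equation (8)
    return dX, dF
```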
The convolutional network part involved in step (6) is described in detail in this embodiment, and as shown in fig. 1, the process includes the following steps:
and 7, updating the original convolution kernel and the learning rate by using the original convolution kernel gradient calculated in the step 6. The method comprises the following specific steps: and multiplying the original convolution kernel gradient and the negative number of the learning rate and accumulating the result to the original convolution kernel to obtain an updated original convolution kernel, and simultaneously subtracting a constant from the learning rate to obtain an updated learning rate. In this embodiment, the equivalent calculation formula of the above operation flow can be expressed as:
Figure BDA0002432129540000063
in the formula, F ', η' represents the updated weight parameter and learning rate, and α is a constant.
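A one-line realization of the update rule of equation (9) might look as follows (illustrative; the function name and the value of alpha are assumptions):

```python
def update_parameters(F, dF, lr, alpha=1e-4):
    """Equation (9) sketch: accumulate -lr * dF onto the original kernel and
    decrease the learning rate by a constant (the value of alpha is illustrative)."""
    return F - lr * dF, lr - alpha
```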
Example 2
This embodiment provides a convolution operation device, named convolution operation device 100, as shown in fig. 3, which includes a cyclic shift unit, a forward inference unit, a data unit, a back propagation unit, an updating unit, and an acceleration unit.
The cyclic shift unit shifts the one-dimensional vector data of the original convolution kernel, the original rotated convolution kernel, and the input feature map; the forward inference unit performs the convolution calculation between the original convolution kernel or the intermediate convolution kernel and the input feature map;
the data unit stores the input feature map, the original convolution kernel, the output feature map, the input feature map gradient, the original convolution kernel gradient, and the output feature map gradient;
the back propagation unit solves the input feature map gradient and the original convolution kernel gradient from the output feature map gradient;
and the updating unit updates the original convolution kernel and the learning rate.
The acceleration unit accelerates the computation in the cyclic shift unit and the forward inference unit using the fast Fourier transform.
For the data unit, in this embodiment the input feature map is organized as $X \in \mathbb{R}^{c \times h_x \times w_x}$, the original convolution kernel as $F \in \mathbb{R}^{c \times k \times k}$, and the output feature map as $Y \in \mathbb{R}^{o \times h_y \times w_y}$, where $\mathbb{R}$ denotes the real number field, c denotes the number of channels of the input data, $h_x$ and $w_x$ denote the height and width of the input data, o denotes the number of output channels, k denotes the size of the convolution kernel, and $h_y$ and $w_y$ denote the height and width of the output data; the gradient of each tensor is organized in the same way as the tensor itself. Compared with a conventional standard convolutional layer in deep learning, for the same amount of input and output feature map data, the convolution kernel in the data unit of the device requires only 1/o of the weight parameters of the conventional standard convolutional layer, where o denotes the number of channels of the output data.
The forward inference unit is characterized in that the original convolution kernel F is expanded by an element-wise or channel-wise cyclic shift operation, and the expanded kernels are convolved with the input feature map X.
The acceleration unit is characterized in that the forward inference is mapped into the frequency domain for computation using the fast Fourier transform, and the result is then mapped back into the time domain using the inverse fast Fourier transform.
The back propagation unit comprises two parts: a first sub-processing unit and a second sub-processing unit. The first sub-processing unit solves the input feature map gradient $\partial L / \partial X$ from the output feature map gradient $\partial L / \partial Y$, providing the basis for the gradient solution of the previous network layer according to the chain rule; the second sub-processing unit solves the original convolution kernel gradient $\partial L / \partial F$ from the output feature map gradient $\partial L / \partial Y$, providing the basis for updating the original convolution kernel F.
The updating unit multiplies the original convolution kernel gradient $\partial L / \partial F$ by the negative of the set learning rate η, accumulates the result onto the original convolution kernel F to update it, and updates the learning rate.
The convolution operation device 100 of this embodiment is built into a single-layer neural network: the convolution operation device 100 replaces a conventional standard convolutional layer in a convolutional neural network and is combined with other general neural network structures, mainly comprising activation layers, pooling layers, normalization layers, and a loss function, to build the neural network. The built neural network is trained with training data until the specified number of iterations is reached, and the trained neural network model is then deployed to an embedded device, a mobile device, or another terminal.
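Purely as an illustration of how the units of convolution operation device 100 cooperate in a single-layer network, the following sketch strings together the forward, back propagation, and updating sketches defined earlier in this description (shift_conv_forward_fft, shift_conv_backward, update_parameters) with a simple squared-error loss; all names, shapes, and the loss function are assumptions, not part of the patent:

```python
import numpy as np

# One training iteration of a single layer built from the sketches above.
c, k, o, h, w = 8, 3, 8, 10, 10
rng = np.random.default_rng(0)
X = rng.standard_normal((c, h, w))            # input feature map (data unit)
F = 0.1 * rng.standard_normal((c, k, k))      # original convolution kernel (data unit)
target = rng.standard_normal((o, h - k + 1, w - k + 1))
lr = 0.01

Y = shift_conv_forward_fft(X, F, o)           # forward inference unit + acceleration unit
loss = 0.5 * np.sum((Y - target) ** 2)        # stand-in for the network's loss function
dY = Y - target                               # output feature map gradient
dX, dF = shift_conv_backward(X, F, dY)        # back propagation unit
F, lr = update_parameters(F, dF, lr)          # updating unit
```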

Claims (8)

1. A method of convolution operation, comprising the steps of:
step 1, performing convolution calculation on an original convolution kernel and an input feature map to generate first output data;
step 2, performing a cyclic shift operation on the original convolution kernel to obtain an intermediate convolution kernel: unfolding the original convolution kernel from the low dimension to the high dimension into a one-dimensional vector, shifting all data of the vector by t bits, appending the t bits of data shifted out of the vector to the end of the vector to obtain a new one-dimensional vector, and recombining the vector from the low dimension to the high dimension to obtain an intermediate convolution kernel whose dimensions are the same as those of the original convolution kernel;
step 3, performing convolution calculation on the intermediate convolution kernel and the input feature map again to generate second output data;
step 4, repeating steps 2 to 3 a total of o-1 times with a different shift amount t each time, and concatenating all o pieces of obtained data, including the first output data and the second output data, along the channel dimension to obtain an output feature map;
step 5, calculating the gradient of the input feature map according to the gradient of the output feature map;
step 6, calculating the gradient of the original convolution kernel according to the gradient of the output feature map and the gradient of the input feature map;
step 7, updating the original convolution kernel and the learning rate by using the original convolution kernel gradient calculated in step 6.
2. The convolution operation method according to claim 1, wherein in step 1 the convolution calculation specifically comprises: multiplying the original convolution kernel data element by element with the data at a given position of the input feature map and summing the products to obtain output data of one element, computing such single-element output data for all positions of the input feature map, and finally forming the single-channel first output data.
3. The convolution operation method according to claim 1, wherein in steps 1 to 4 the fast Fourier transform is used to accelerate the convolution calculation, specifically: unfolding the original convolution kernel data from the low dimension to the high dimension into a one-dimensional vector and mapping it into a one-dimensional vector in the frequency domain using the fast Fourier transform; meanwhile, unfolding the local data at a given position of the input feature map from the low dimension to the high dimension into a one-dimensional vector and mapping it into a one-dimensional vector in the frequency domain using the fast Fourier transform; performing a dot product of the two frequency-domain one-dimensional vectors and mapping the result vector back into a one-dimensional result vector in the time domain using the inverse fast Fourier transform; and selecting o pieces of data from the one-dimensional result vector as the data of the respective channels at the corresponding position of the output feature map, the selection being based on the intervals represented by the shift amounts in step 4.
4. The convolution operation method according to claim 1, wherein in step 5 the output feature map gradient is first padded with "0" around its border to obtain an expanded output feature map gradient, and the original convolution kernel of step 1 is rotated to generate an original rotated convolution kernel; the cyclic shift operation of step 2 is applied o-1 times to the original rotated convolution kernel, each shift corresponding to a shift in step 4, and a channel rearrangement operation is applied to the intermediate convolution kernel group formed by the generated o-1 convolution kernels together with the original rotated convolution kernel to generate a final convolution kernel group comprising c convolution kernels; and each convolution kernel in the final convolution kernel group is convolved with the expanded output feature map gradient to obtain c partial input feature map gradients in total, which are concatenated along the channel dimension to finally obtain the input feature map gradient.
5. The convolution operation method according to claim 1, wherein in step 6 an intermediate feature map group is formed by the o-1 intermediate feature maps obtained by applying the cyclic shift operation o-1 times to the input feature map together with the input feature map itself, the convolution operation of step 1 is applied between each feature map in the intermediate feature map group and the output feature map gradient to obtain c partial convolution kernel gradients in total, and the c partial convolution kernel gradients are concatenated along the channel dimension to obtain the original convolution kernel gradient.
6. The convolution operation method according to claim 1, wherein in step 7 the original convolution kernel gradient is multiplied by the negative of the learning rate and accumulated onto the original convolution kernel to obtain the updated original convolution kernel, and a constant is subtracted from the learning rate to obtain the updated learning rate.
7. A convolution operation device, comprising a cyclic shift unit, a forward inference unit, a data unit, a back propagation unit, and an updating unit;
the cyclic shift unit is used for shifting the one-dimensional vector data of the convolution kernel and of the input feature map;
the forward inference unit is used for performing convolution calculation on the convolution kernel and the input feature map;
the data unit is used for storing the input feature map, the convolution kernel, the output feature map, and the gradient data;
the back propagation unit is used for solving the input feature map gradient and the original convolution kernel gradient from the output feature map gradient;
and the updating unit is used for updating the original convolution kernel and the learning rate.
8. The convolution operation device according to claim 7, further comprising an acceleration unit for accelerating the computation in the cyclic shift unit and the forward inference unit by means of the fast Fourier transform.
CN202010239668.7A 2020-03-30 2020-03-30 Convolution operation method and device Pending CN111428188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010239668.7A CN111428188A (en) 2020-03-30 2020-03-30 Convolution operation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010239668.7A CN111428188A (en) 2020-03-30 2020-03-30 Convolution operation method and device

Publications (1)

Publication Number Publication Date
CN111428188A true CN111428188A (en) 2020-07-17

Family

ID=71550304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010239668.7A Pending CN111428188A (en) 2020-03-30 2020-03-30 Convolution operation method and device

Country Status (1)

Country Link
CN (1) CN111428188A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181358A (en) * 2020-10-26 2021-01-05 南京大学 Reconfigurable neural network training calculation method and device
CN112830359A (en) * 2021-01-08 2021-05-25 燕山大学 System for detecting abnormal behavior of passengers in elevator car based on deep learning
CN112830359B (en) * 2021-01-08 2022-04-15 燕山大学 System for detecting abnormal behavior of passengers in elevator car based on deep learning
CN112784969A (en) * 2021-02-01 2021-05-11 东北大学 Convolutional neural network accelerated learning method based on sampling
CN112836823A (en) * 2021-03-02 2021-05-25 东南大学 Convolutional neural network back propagation mapping method based on cyclic recombination and blocking
CN112836823B (en) * 2021-03-02 2024-03-05 东南大学 Convolutional neural network back propagation mapping method based on cyclic recombination and blocking
CN113254996A (en) * 2021-05-31 2021-08-13 平安科技(深圳)有限公司 Graph neural network training method and device, computing equipment and storage medium
CN113254996B (en) * 2021-05-31 2022-12-27 平安科技(深圳)有限公司 Graph neural network training method and device, computing equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111428188A (en) Convolution operation method and device
Oliva et al. Transformation autoregressive networks
CN107977704B (en) Weight data storage method and neural network processor based on same
CN108345939B (en) Neural network based on fixed-point operation
US11645529B2 (en) Sparsifying neural network models
EP3373210A1 (en) Transposing neural network matrices in hardware
US20170193361A1 (en) Neural network training performance optimization framework
JP7325158B2 (en) Data Representation for Dynamic Accuracy in Neural Network Cores
TWI670651B (en) Apparatus, method and system for increasing a speed at which a rocessing unit performs machinelearning computations
CN107610146A (en) Image scene segmentation method, apparatus, computing device and computer-readable storage medium
CN109410114B (en) Compressed Sensing Image Reconstruction Algorithm Based on Deep Learning
CN107204008B (en) Quantum image matching method
CN110008952B (en) Target identification method and device
Kirby Fast simplicial finite element algorithms using Bernstein polynomials
CN107730514A (en) Scene cut network training method, device, computing device and storage medium
CN109447897B (en) Real scene image synthesis method and system
CN111598227B (en) Data processing method, device, electronic equipment and computer readable storage medium
CN111882028B (en) Convolution operation device for convolution neural network
CN114418105B (en) Method and device for processing quantum application problem based on quantum circuit
EP4323924A2 (en) Classical and quantum algorithms for orthogonal neural networks
CN114418104B (en) Quantum application problem processing method and device
Turay et al. SSP Framework: A New Approach to Designing Lightweight Convolutional Neural Networks
CN114595641A (en) Method and system for solving combined optimization problem
Woolfe Matrix product operator simulations of quantum algorithms
CN112184592A (en) Image restoration method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20200717)