CN114429553A - Image recognition convolutional layer structure based on random calculation and sparse calculation - Google Patents
Image recognition convolutional layer structure based on random calculation and sparse calculation
- Publication number
- CN114429553A (application number CN202210093892.9A)
- Authority
- CN
- China
- Prior art keywords
- calculation
- sparse
- random
- computation
- input image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses an image recognition convolutional layer structure based on random calculation and sparse calculation, relating to the technical field of deep learning. Sparse calculation is used to perform sparse processing on the input image feature map and the corresponding weights, and the nonzero values in the sparse-processing results are transmitted to a random calculation module to complete the multiply-accumulate operation; finally, an efficient parallel convolutional layer structure is built using sparse calculation and random calculation, which performs the convolution and outputs the convolution result. The proposed structure is a low-complexity convolutional layer built on random calculation and sparse calculation; it guarantees the precision of the convolution operation while reducing the complexity of the hardware implementation, and makes full use of the acceleration advantages brought by sparsity in the input image feature map and the weights.
Description
Technical Field
The invention relates to the technical field of deep learning, in particular to an image recognition convolutional layer structure based on random calculation and sparse calculation.
Background
Convolutional Neural Networks (CNNs) are among the most important models in deep learning and are widely used in image recognition, speech recognition, computer vision, and related fields. Generally, the more layers a neural network has, the more parameters it needs, and the higher the accuracy obtained during forward inference. However, increasing the number of layers and parameters also means consuming more computing and memory resources. Examination of the many parameters of a convolutional neural network shows that some of them, such as zero values, have little influence on the final output result; these parameters can be pruned at run time to reduce the required storage space.
Because a convolutional neural network needs a large amount of storage space for its data, and especially because of the enormous amount of computation it performs, its development pace is hindered; how to reduce the computational load of convolutional neural networks is therefore a popular research direction in deep learning. The computation in a convolutional neural network is concentrated mainly in the convolutional layers: the convolution operations account for roughly nine tenths of the total computation and are directly proportional to the parameter capacity, so the redundant parameters can be removed by pruning. However, direct pruning makes the operations in the network irregular, which increases the complexity of the neural network's hardware implementation, so the acceleration advantages brought by sparse input image feature maps and weights cannot be fully exploited. On the other hand, the convolution process involves a large amount of computing-resource consumption and data storage, so the hardware cost of implementing a convolutional neural network is enormous. How to design efficient hardware for the operations in a convolutional neural network and apply it to embedded devices with limited computing resources and memory bandwidth is a problem worth investigating.
Based on the above problems, the present application provides an image recognition convolutional layer structure based on random calculation and sparse calculation: a low-complexity convolutional layer structure built on random calculation and sparse calculation that guarantees convolution precision while reducing the complexity of hardware implementation and fully utilizes the acceleration advantages brought by sparsity in the input image feature maps and weights.
Disclosure of Invention
The invention aims to provide an image recognition convolutional layer structure based on random calculation and sparse calculation: a low-complexity convolutional layer structure built on random calculation and sparse calculation that guarantees convolution precision while reducing the complexity of hardware implementation and fully utilizes the acceleration advantages brought by sparsity in the input image feature maps and weights.
The invention provides an image recognition convolutional layer structure based on random calculation and sparse calculation, which comprises:
a sparse processing module: performing AND-logic processing on the input image feature map and the corresponding weights; when the AND-logic result is 1, the input image feature map value and the corresponding weight at that position are output, and when the AND-logic result is 0, the result 0 is output;
a random calculation module: receiving the input image feature map values and corresponding weights output by the sparse processing module, converting them from the binary domain to the probability domain, performing the multiplication of the input image feature map values and the corresponding weights in the probability domain, and adding the multiple multiplication results;
a convolution module: comprising a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random calculation module and output the convolution result.
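As an illustrative sketch of the sparse processing module's gating behavior (this code is our own illustration, not part of the patent disclosure; all names are hypothetical), only operand pairs in which both values are nonzero are forwarded to the random calculation module:

```python
def sparse_gate(features, weights):
    """Forward only the (value, weight) pairs whose AND-logic result is 1,
    i.e. pairs where both operands are nonzero; all other positions output 0
    and are skipped entirely by the downstream random calculation module."""
    passed = []
    for x, w in zip(features, weights):
        if x != 0 and w != 0:  # "AND logic result is 1"
            passed.append((x, w))
    return passed

# With the values from the patent's worked example, only two pairs survive.
pairs = sparse_gate([3, 0, 5, 2], [0.6, -0.7, 0.0, -0.8])
```

Gating before the multiply-accumulate stage is what removes the storage and computation cost of the zero operands.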
Further, the random computation module includes:
a forward conversion module: receiving the input image feature map values and corresponding weights output by the sparse processing module and converting them from the binary domain to the probability domain, wherein the conversion formula of the input image feature map is

P(x_i) = x_i / (2^m − 1)

where P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i;

the transfer function of the corresponding weight is f(w_i) = |w_i|, the corresponding weight sequence containing ⌊f(w_i)·(2^m − 1)⌉ '1's, where w_i represents the weight value and ⌊·⌉ represents the rounding operation;
a calculation module: in the probability domain, multiplication calculation of the input image feature map and the corresponding weight is performed through AND gate logic operation, and a plurality of multiplication calculation results are added by adopting an approximate parallel counter.
Further, the forward conversion module further includes:
the value x_i of the input image feature map is expanded into a sequence S_i containing only '0' and '1': the j-th bit b_ji of x_i is copied 2^(j−1) times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^(j−1) times, j ∈ [1, m]; the probability of '1' in the sequence S_i composed of '0' and '1' is thereby obtained.
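The bit expansion above can be sketched as follows (an illustrative model of ours, not code from the patent): replicating bit j of an m-bit value 2^(j−1) times yields a sequence of length 2^m − 1 in which the number of '1's equals the value itself, so the probability of '1' is x_i / (2^m − 1).

```python
def expand_to_sequence(x, m):
    """Expand an m-bit value x into the sequence S_i of length 2**m - 1:
    bit j of x (j = m down to 1, most significant first) is replicated
    2**(j-1) times, so the total count of '1's equals x itself."""
    seq = []
    for j in range(m, 0, -1):
        bit = (x >> (j - 1)) & 1
        seq.extend([bit] * (2 ** (j - 1)))
    return seq

s = expand_to_sequence(3, 3)   # 3 = 011b -> [0, 0, 0, 0, 1, 1, 1]
p = sum(s) / len(s)            # probability of '1' = 3 / 7
```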
Further, the convolution module stores the input feature map data and the weights into a buffer and performs the convolution calculation.
Further, the convolution module adopts a ping-pong technique, reading the next group of weights while performing the convolution calculation of the previous group.
Compared with the prior art, the invention has the following remarkable advantages:
the invention provides an image identification convolutional layer structure based on random calculation and sparse calculation, wherein sparse calculation is used for carrying out sparse processing on an input image characteristic diagram and a corresponding weight, nonzero values in sparse processing results are transmitted to a random calculation module to complete multiplication and accumulation operation, and finally, an efficient parallel convolutional layer structure is built by utilizing sparse calculation and random calculation, convolution is carried out, and a convolution result is output; the image recognition convolutional layer structure based on random calculation and sparse calculation provided by the invention constructs a low-complexity convolutional layer structure based on random calculation and sparse calculation, guarantees the precision of convolutional operation, reduces the complexity of hardware realization, and fully utilizes the acceleration advantages brought by the input image characteristic diagram and weight sparseness.
Drawings
Fig. 1 is a block diagram of a data operation structure of an image recognition convolutional layer structure based on random computation and sparse computation according to an embodiment of the present invention;
fig. 2 is a structural diagram of a sparse processing module according to an embodiment of the present invention;
FIG. 3 is a block diagram of a random number computation module according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a convolution mode of the convolution module according to the embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are clearly and completely described below with reference to the drawings in the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Example 1
Since random calculation is a method of data processing that replaces a binary value with the probability of a '1' occurring in a random bit stream composed of '0's and '1's, relatively complex arithmetic functions can be realized with simple logic circuits. To apply random calculation to convolution, three aspects must be considered: the conversion from binary values to random sequences, the random multiplication process, and the accumulation after multiplication. The forward conversion module in the random calculation module corresponds to the first process, and the multiplication and addition calculation modules correspond to the latter two. In convolution, the input values and weights include positive numbers, negative numbers and 0; since 0 is redundant data with little influence on the overall convolution result, this redundant data can be filtered out with simple processing before multiplication and addition, which accelerates the calculation to a certain extent.
Referring to fig. 1-4, the present invention provides an image recognition convolutional layer structure based on random computation and sparse computation, comprising:
as shown in fig. 2, the sparseness processing module: and performing AND logic processing on the input image feature map and the corresponding weight, outputting the input image feature map and the corresponding weight at the position when the AND logic result is 1, outputting the result 0 when the AND logic result is 0, and after sparse processing, preventing all zero values from being transmitted to the random calculation module for calculation, thereby achieving two advantages, namely reducing the calculated amount and the corresponding data storage space required to be calculated and accelerating the calculation speed.
A random calculation module: comprises a forward conversion module and a calculation module. Note that the zero values removed by the sparse processing module do not participate in the operations of the random calculation module, which operates only on nonzero values.
A forward conversion module: receives the input image feature map values and corresponding weights output by the sparse processing module and converts them from the binary domain to the probability domain. The value x_i of the input image feature map is expanded into a sequence S_i containing only '0' and '1': the j-th bit b_ji of x_i is copied 2^(j−1) times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^(j−1) times, j ∈ [1, m], so that the probability of '1' in S_i is obtained.

The conversion formula of the input image feature map is

P(x_i) = (Σ_{j=1}^{m} b_ji · 2^(j−1)) / (2^m − 1) = x_i / (2^m − 1)

where P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i.

The corresponding weight transfer function is f(w_i) = |w_i|, and the corresponding weight sequence contains ⌊f(w_i)·(2^m − 1)⌉ '1's, where w_i represents the weight value, ⌊·⌉ represents the rounding operation, and m is determined by the bit width of x_i. At this point, the conversion of the input image feature map x_i and the weight w_i from the binary domain to the probability domain is completed.
A calculation module: in the probability domain, the multiplication of the input image feature map value and the corresponding weight is performed by an AND-gate logic operation, and the multiple multiplication results are added using an Approximate Parallel Counter (APC), whose result is expressed as a binary number and is essentially a count of the number of '1's in its inputs.
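A minimal software model of the calculation module (our own sketch, not part of the patent; a hardware APC would sum bits with compressor trees, modeled here as a plain population count):

```python
def and_multiply(seq_a, seq_b):
    """Stochastic multiplication: bitwise AND of two probability-domain sequences."""
    return [a & b for a, b in zip(seq_a, seq_b)]

def approximate_parallel_counter(sequences):
    """Model of the APC: output, as a binary number, the total count of '1's
    across all input product sequences."""
    return sum(sum(seq) for seq in sequences)

# Sequences from the patent's worked example (m = 3, sequence length 7).
y1 = and_multiply([0, 0, 0, 0, 1, 1, 1], [1, 1, 1, 1, 0, 0, 0])
y4 = and_multiply([0, 0, 0, 0, 1, 1, 0], [1, 1, 1, 1, 1, 1, 0])
total = approximate_parallel_counter([y1, y4])  # 0 + 2 = 2
```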
A convolution module: comprises a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random calculation module and output the convolution result. The convolution module stores the input feature map data and the weights into a buffer, and adopts a ping-pong technique, reading the next group of weights while performing the convolution calculation of the previous group.
As shown in Fig. 3, the random calculation module is used to perform the multiply-accumulate operations in the convolutional layer. In each convolution, zero values are pruned so that only nonzero values are calculated, which reduces the storage space for intermediate data and the number of data accesses during calculation to a certain extent, but increases the calculation latency. To further improve efficiency, a novel parallel convolution architecture is provided: spatially, multiple convolution kernels are multiplexed for convolution, with the number of channels as the reference. Input image feature maps in the same channel can be convolved by the convolution kernels at the same positions of different channels; performing convolution in this way reduces the number of accesses to the input image feature map data to a certain extent, as shown in Fig. 4.
For a two-dimensional convolution, several one-dimensional convolution kernels are used for the calculation. Common kernel sizes are 3×3, 5×5 and 7×7, and using smaller kernels significantly reduces computational complexity, so the parallelism is set to 3 for one two-dimensional convolution. For example, when calculating a 3×3 two-dimensional convolution, the nine input values and the corresponding weights are stored in three groups in a buffer, and the ping-pong technique is used, i.e., the next group of weights is read while the previous group's convolution is calculated. During the convolution calculation, each group of data undergoes the sparse processing operation in one clock cycle; after the zero values are pruned, the remaining nonzero operands complete their multiplications in parallel through the random calculation module in one clock cycle, yielding several parallel multiplication results still expressed in the probability domain. An approximate parallel counter then performs the addition of the multiplication results to obtain three partial sums, and to obtain the final result the approximate parallel counter is applied once more. This completes the implementation of a 3×3 two-dimensional convolution.
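The ping-pong scheduling described above can be modeled roughly as follows (a simplified software analogue of ours; in real hardware the weight load and the convolution proceed simultaneously in two alternating buffers, whereas this sequential model only illustrates the buffer swapping):

```python
def ping_pong_convolve(input_groups, weight_groups, mac):
    """Alternate two weight buffers: while one group of weights is being used
    for the multiply-accumulate, the next group is loaded into the idle buffer."""
    buffers = [None, None]
    results = []
    active = 0
    buffers[active] = weight_groups[0]  # preload the first weight group
    for i, inputs in enumerate(input_groups):
        # Load the NEXT weight group into the idle buffer (overlapped in hardware).
        if i + 1 < len(weight_groups):
            buffers[1 - active] = weight_groups[i + 1]
        # Compute with the currently active buffer, then swap.
        results.append(mac(inputs, buffers[active]))
        active = 1 - active
    return results

dot = lambda xs, ws: sum(x * w for x, w in zip(xs, ws))
out = ping_pong_convolve([[1, 2], [3, 4]], [[1, 0], [0, 1]], dot)  # -> [1, 4]
```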
Assume four fixed-point input data (3 bits wide) with values x_1 = +3, x_2 = 0, x_3 = +5, x_4 = +2, and corresponding weights w_1 = +0.6, w_2 = −0.7, w_3 = 0, w_4 = −0.8; the final calculation is x_1·w_1 + x_2·w_2 + x_3·w_3 + x_4·w_4, whose exact magnitude is 3.4. Following the method of Fig. 1, the input data and the corresponding weight data are input serially into the sparse processing module. The AND of x_1 with w_1 and of x_4 with w_4 is nonzero, so x_1, w_1, x_4 and w_4 are transmitted to the random calculation module for calculation; the AND of x_2 with w_2 and of x_3 with w_3 is 0, so 0 is output and these values are not transmitted to the random calculation module. In the random calculation module, the forward conversion module yields the random sequences S_1 = {0000111} and S_4 = {0000110}, with scaling factor 2^m − 1 = 7; the transfer functions of the weights are f(w_1) = 0.6 and f(w_4) = 0.8, from which the random sequences corresponding to the weights are obtained as W_1 = {1111000} and W_4 = {1111110}, completing the forward conversion. In the multiplication calculation module, the random sequence S_1 is ANDed bitwise with W_1 to obtain the random sequence Y_1 = {0000000}; similarly, the random sequence Y_4 = {0000110} is obtained. In the addition calculation module, the numbers of '1's in the two sequences are added to obtain the result 2, which has an error of 1.4 compared with the exact value of 3.4. However, the number of multiplications and additions that must be calculated is greatly reduced, the data storage space is saved, and the calculation speed is improved.
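The worked example can be checked end to end with a short script (our own reconstruction; the rule that the weight sequence holds round(|w|·(2^m − 1)) leading '1's is inferred from the example's numbers, not stated explicitly in the patent, and the script tracks magnitudes only):

```python
def to_sequence(x, m):
    """Expand bit j of x (replicated 2**(j-1) times, MSB first) into a
    sequence of length 2**m - 1 whose count of '1's equals x."""
    seq = []
    for j in range(m, 0, -1):
        seq.extend([(x >> (j - 1)) & 1] * (2 ** (j - 1)))
    return seq

def weight_sequence(w_mag, m):
    """Weight sequence with round(|w| * (2**m - 1)) leading '1's
    (reconstructed rule, consistent with the example's W_1 and W_4)."""
    n = 2 ** m - 1
    ones = round(w_mag * n)
    return [1] * ones + [0] * (n - ones)

m = 3
s1, s4 = to_sequence(3, m), to_sequence(2, m)              # {0000111}, {0000110}
w1, w4 = weight_sequence(0.6, m), weight_sequence(0.8, m)  # {1111000}, {1111110}
y1 = [a & b for a, b in zip(s1, w1)]
y4 = [a & b for a, b in zip(s4, w4)]
result = sum(y1) + sum(y4)  # APC count: 0 + 2 = 2, vs. exact magnitude 3.4
```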
The above disclosure describes only a few specific embodiments of the present invention; however, the present invention is not limited to these embodiments, and any variation conceivable to those skilled in the art shall fall within the protection scope of the present invention.
Claims (5)
1. An image recognition convolutional layer structure based on random computation and sparse computation, comprising:
a sparse processing module: performing AND-logic processing on the input image feature map and the corresponding weights; when the AND-logic result is 1, the input image feature map value and the corresponding weight at that position are output, and when the AND-logic result is 0, the result 0 is output;
a random calculation module: receiving the input image feature map values and corresponding weights output by the sparse processing module, converting them from the binary domain to the probability domain, performing the multiplication of the input image feature map values and the corresponding weights in the probability domain, and adding the multiple multiplication results;
a convolution module: comprising a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random calculation module and output the convolution result of the image recognition.
2. The stochastic computation and sparse computation based image recognition convolutional layer structure of claim 1, wherein the stochastic computation module comprises:
a forward conversion module: receiving the input image feature map values and corresponding weights output by the sparse processing module and converting them from the binary domain to the probability domain, wherein the conversion formula of the input image feature map is

P(x_i) = x_i / (2^m − 1)

where P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i;

the transfer function of the corresponding weight is f(w_i) = |w_i|, the corresponding weight sequence containing ⌊f(w_i)·(2^m − 1)⌉ '1's, where w_i represents the weight value and ⌊·⌉ represents the rounding operation;

a calculation module: in the probability domain, performing the multiplication of the input image feature map value and the corresponding weight through an AND-gate logic operation, and adding the multiple multiplication results using an approximate parallel counter.
3. The random-computation-and-sparse-computation-based image recognition convolutional layer structure of claim 2, wherein said forward conversion module further comprises:
the value x_i of the input image feature map is expanded into a sequence S_i containing only '0' and '1': the j-th bit b_ji of x_i is copied 2^(j−1) times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^(j−1) times, j ∈ [1, m]; the probability of '1' in the sequence S_i composed of '0' and '1' is thereby obtained.
4. The image recognition convolutional layer structure based on random calculation and sparse calculation of claim 1, wherein the convolution module stores the input image feature map data and the weights into a buffer and performs the convolution calculation.
5. The image recognition convolutional layer structure based on random calculation and sparse calculation of claim 4, wherein the convolution module adopts a ping-pong technique, reading the next group of weights while performing the convolution calculation of the previous group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210093892.9A CN114429553A (en) | 2022-01-26 | 2022-01-26 | Image recognition convolutional layer structure based on random calculation and sparse calculation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210093892.9A CN114429553A (en) | 2022-01-26 | 2022-01-26 | Image recognition convolutional layer structure based on random calculation and sparse calculation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114429553A true CN114429553A (en) | 2022-05-03 |
Family
ID=81312994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210093892.9A Pending CN114429553A (en) | 2022-01-26 | 2022-01-26 | Image recognition convolutional layer structure based on random calculation and sparse calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114429553A (en) |
-
2022
- 2022-01-26 CN CN202210093892.9A patent/CN114429553A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ardakani et al. | An architecture to accelerate convolution in deep neural networks | |
US10810484B2 (en) | Hardware accelerator for compressed GRU on FPGA | |
US10698657B2 (en) | Hardware accelerator for compressed RNN on FPGA | |
Yepez et al. | Stride 2 1-D, 2-D, and 3-D Winograd for convolutional neural networks | |
US20180260710A1 (en) | Calculating device and method for a sparsely connected artificial neural network | |
CN111414994B (en) | FPGA-based Yolov3 network computing acceleration system and acceleration method thereof | |
CN110543939B (en) | Hardware acceleration realization device for convolutional neural network backward training based on FPGA | |
CN110991631A (en) | Neural network acceleration system based on FPGA | |
Alawad et al. | Stochastic-based deep convolutional networks with reconfigurable logic fabric | |
CN112257844B (en) | Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof | |
WO2022134465A1 (en) | Sparse data processing method for accelerating operation of re-configurable processor, and device | |
CN110069444A (en) | A kind of computing unit, array, module, hardware system and implementation method | |
CN115186802A (en) | Block sparse method and device based on convolutional neural network and processing unit | |
CN209708122U (en) | A kind of computing unit, array, module, hardware system | |
CN113313244B (en) | Near-storage neural network accelerator for addition network and acceleration method thereof | |
Shu et al. | High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination | |
CN112862091B (en) | Resource multiplexing type neural network hardware accelerating circuit based on quick convolution | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
US20230068941A1 (en) | Quantized neural network training and inference | |
CN112836793B (en) | Floating point separable convolution calculation accelerating device, system and image processing method | |
CN114429553A (en) | Image recognition convolutional layer structure based on random calculation and sparse calculation | |
CN113392963B (en) | FPGA-based CNN hardware acceleration system design method | |
CN115688892A (en) | FPGA implementation method of sparse weight Fused-Layer convolution accelerator structure | |
CN113988279A (en) | Output current reading method and system of storage array supporting negative value excitation | |
Özkilbaç et al. | Real-Time Fixed-Point Hardware Accelerator of Convolutional Neural Network on FPGA Based |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||