CN114429553A - Image recognition convolutional layer structure based on random calculation and sparse calculation


Info

Publication number: CN114429553A
Application number: CN202210093892.9A
Authority: CN (China)
Prior art keywords: calculation, sparse, random, computation, input image
Priority/filing date: 2022-01-26
Publication date: 2022-05-03
Legal status: Pending
Original language: Chinese (zh)
Inventors: 熊兴中, 董亚, 骆忠强
Current and original assignee: Sichuan University of Science and Engineering


Classifications

    • G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
    • G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/08 — Neural networks; learning methods


Abstract

The invention discloses an image recognition convolutional layer structure based on random computation and sparse computation, relating to the technical field of deep learning. Sparse computation performs sparse processing on the input image feature map and the corresponding weights; the non-zero values in the sparse processing results are passed to a random computation module, which completes the multiply-accumulate operations; finally, an efficient parallel convolutional layer structure is built from the sparse and random computation units, the convolution is performed, and the convolution results are output. The proposed structure is a low-complexity convolutional layer based on random computation and sparse computation: it guarantees the precision of the convolution operation while reducing the complexity of the hardware implementation, and it fully exploits the acceleration offered by the sparsity of the input image feature maps and weights.

Description

Image recognition convolutional layer structure based on random calculation and sparse calculation
Technical Field
The invention relates to the technical field of deep learning, in particular to an image recognition convolutional layer structure based on random calculation and sparse calculation.
Background
Convolutional Neural Networks (CNNs) are among the most important models in deep learning and are widely used in image recognition, speech recognition, computer vision, and other fields. Generally, the more layers a neural network has, the more parameters it needs and the higher the accuracy it achieves in forward inference. However, increases in the number of layers and parameters also mean that more computing and memory resources are consumed. Inspection of the many parameters of a convolutional neural network shows that some of them, such as zero values, have little influence on the final output; these parameters can be pruned at run time to reduce the storage space.
Because a convolutional neural network needs a large amount of storage space for its data, and above all because its huge computational load hinders its development, how to reduce the computation of a convolutional neural network is a popular research topic in deep learning. The computation in a convolutional neural network is concentrated in the convolutional layers: the convolution operations account for roughly nine tenths of the total computation, and this cost is directly proportional to the number of parameters, so the redundant parameters can be removed by pruning. However, direct pruning makes the operations in the network irregular, which increases the complexity of the neural network's hardware implementation, so the acceleration offered by sparse input image feature maps and weights cannot be fully exploited. On the other hand, the convolution itself involves a great deal of computing-resource consumption and data storage, so the hardware cost of implementing a convolutional neural network is enormous. How to design efficient hardware for the operations in a convolutional neural network and apply it to embedded devices with limited computing resources and memory bandwidth is therefore a problem worth discussing.
In view of these problems, the present application provides an image recognition convolutional layer structure based on random computation and sparse computation: a low-complexity convolutional layer structure built on random computation and sparse computation that guarantees the precision of the convolution operation while reducing the complexity of the hardware implementation and fully exploiting the acceleration offered by sparse input image feature maps and weights.
Disclosure of Invention
The aim of the invention is to provide an image recognition convolutional layer structure based on random computation and sparse computation: a low-complexity convolutional layer structure built on random computation and sparse computation that guarantees the precision of the convolution operation while reducing the complexity of the hardware implementation and fully exploiting the acceleration offered by sparse input image feature maps and weights.
The invention provides an image recognition convolutional layer structure based on random computation and sparse computation, which comprises:
a sparse processing module: performing AND logic processing on the input image feature map and the corresponding weights; when the AND result is 1, the feature-map value and the corresponding weight at that position are output; when the AND result is 0, 0 is output;
a random computation module: receiving the input image feature-map values and the corresponding weights output by the sparse processing module, converting them from the binary domain to the probability domain, performing the multiplications of the feature-map values by the corresponding weights in the probability domain, and adding the multiple multiplication results;
a convolution module: comprising a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random computation module and output the convolution result.
Further, the random computation module includes:
a forward conversion module: receiving the input image feature map and the corresponding weights output by the sparse processing module, and converting them from the binary domain to the probability domain, wherein the conversion formula of the input image feature map is:

P(x_i) = x_i / (2^m − 1) = (Σ_{j=1}^{m} 2^{j−1} · b_{ji}) / (2^m − 1)

wherein P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i;

the transfer function of the corresponding weight is:

f(w_i) = ⌊|w_i| · (2^m − 1)⌉ / (2^m − 1)

wherein w_i represents the weight value and ⌊·⌉ represents performing a rounding operation;
a calculation module: in the probability domain, the multiplications of the input image feature map by the corresponding weights are performed through AND-gate logic operations, and the multiple multiplication results are added using an approximate parallel counter.
Further, in the forward conversion module:

the value x_i of the input image feature map is expanded into a sequence S_i containing only "0" and "1": the j-th bit b_ji of x_i is copied 2^{j−1} times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^{j−1} times, j ∈ [1, m], so as to obtain the probability of "1" in the sequence S_i composed of "0"s and "1"s.
Further, the convolution module stores the input feature-map data and the weights in a buffer and performs the convolution calculation.
Further, the convolution module adopts a ping-pong technique, reading the next group of weights while the previous group's convolution is being calculated.
Compared with the prior art, the invention has the following remarkable advantages:
the invention provides an image identification convolutional layer structure based on random calculation and sparse calculation, wherein sparse calculation is used for carrying out sparse processing on an input image characteristic diagram and a corresponding weight, nonzero values in sparse processing results are transmitted to a random calculation module to complete multiplication and accumulation operation, and finally, an efficient parallel convolutional layer structure is built by utilizing sparse calculation and random calculation, convolution is carried out, and a convolution result is output; the image recognition convolutional layer structure based on random calculation and sparse calculation provided by the invention constructs a low-complexity convolutional layer structure based on random calculation and sparse calculation, guarantees the precision of convolutional operation, reduces the complexity of hardware realization, and fully utilizes the acceleration advantages brought by the input image characteristic diagram and weight sparseness.
Drawings
Fig. 1 is a block diagram of a data operation structure of an image recognition convolutional layer structure based on random computation and sparse computation according to an embodiment of the present invention;
fig. 2 is a structural diagram of a sparse processing module according to an embodiment of the present invention;
FIG. 3 is a block diagram of the random computation module according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a convolution mode of the convolution module according to the embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art without inventive effort on the basis of these embodiments shall fall within the protection scope of the present invention.
Example 1
Random computation is a calculation method that replaces a binary value with the probability of a 1 appearing in a random bit stream composed of 0s and 1s, so relatively complex arithmetic functions can be realized with simple logic circuits. To apply random computation to convolution, three aspects must be considered: the conversion from binary values to random sequences, the random multiplication, and the accumulation after the multiplication is complete. The forward conversion module in the random computation module corresponds to the first process, and the multiplication and addition modules correspond to the latter two. In convolution, the input values and weights include positive numbers, negative numbers, and 0; since 0 is redundant data with little influence on the overall convolution result, this redundant data can be handled simply before the multiplications and additions, which speeds up the computation to a certain extent.
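As a concrete illustration of this principle (not part of the patent; the function names are illustrative), the following minimal Python sketch shows that once two values are encoded as the probability of a 1 in a random bitstream, a single AND gate multiplies them:

```python
# Minimal sketch of the stochastic-computing idea: encode values as
# bitstream probabilities, then multiply with a single AND gate.
import random

def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a random 0/1 bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_bitstream(bits):
    """Decode a bitstream back to a probability (fraction of 1s)."""
    return sum(bits) / len(bits)

rng = random.Random(0)
a, b = 0.6, 0.5
stream_a = to_bitstream(a, 1024, rng)
stream_b = to_bitstream(b, 1024, rng)
# A single AND gate per bit pair realizes the multiplication.
product = from_bitstream([x & y for x, y in zip(stream_a, stream_b)])
print(product)  # close to a * b = 0.3
```

Longer bitstreams reduce the random error of the estimate, which is the usual accuracy/latency trade-off of stochastic computing.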
Referring to fig. 1-4, the present invention provides an image recognition convolutional layer structure based on random computation and sparse computation, comprising:
as shown in fig. 2, the sparseness processing module: and performing AND logic processing on the input image feature map and the corresponding weight, outputting the input image feature map and the corresponding weight at the position when the AND logic result is 1, outputting the result 0 when the AND logic result is 0, and after sparse processing, preventing all zero values from being transmitted to the random calculation module for calculation, thereby achieving two advantages, namely reducing the calculated amount and the corresponding data storage space required to be calculated and accelerating the calculation speed.
A random computation module: comprising a forward conversion module and a calculation module. Note that zero values handled by the sparse processing module do not participate in the operations of the random computation module, which computes only the non-zero values.
A forward conversion module: receiving the input image feature-map values and the corresponding weights output by the sparse processing module and converting them from the binary domain to the probability domain. The value x_i of the input image feature map is expanded into a sequence S_i containing only "0" and "1": the j-th bit b_ji of x_i is copied 2^{j−1} times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^{j−1} times, j ∈ [1, m], so as to obtain the probability of "1" in the sequence S_i composed of "0"s and "1"s.
The conversion formula of the input image feature map is:

P(x_i) = x_i / (2^m − 1) = (Σ_{j=1}^{m} 2^{j−1} · b_{ji}) / (2^m − 1)

wherein P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i.

The corresponding weight transfer function is:

f(w_i) = ⌊|w_i| · (2^m − 1)⌉ / (2^m − 1)

wherein w_i represents the weight value and ⌊·⌉ indicates that a rounding operation is performed; the sequence S_i contains Σ_{j=1}^{m} 2^{j−1} = 2^m − 1 bits in total, and m is determined by the bit width of w_i. At this point, the conversion of the input image feature map x_i and the weight w_i from the binary domain to the probability domain is complete.
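Under the formulas above, the forward conversion can be modeled with the following Python sketch (an illustrative model, not the patented circuit; the helper names are assumptions):

```python
def input_to_sequence(x, m):
    """Expand an m-bit magnitude into a unary 0/1 sequence of length
    2^m - 1, copying bit b_ji of x exactly 2^(j-1) times, so that
    P('1') = x / (2^m - 1)."""
    seq = []
    for j in range(m, 0, -1):                       # b_m down to b_1
        seq += [(abs(x) >> (j - 1)) & 1] * (2 ** (j - 1))
    return seq

def weight_to_sequence(w, m):
    """Quantize a weight magnitude |w| <= 1 into round(|w| * (2^m - 1))
    leading ones, matching f(w) = round(|w| * (2^m - 1)) / (2^m - 1)."""
    n = 2 ** m - 1
    ones = round(abs(w) * n)
    return [1] * ones + [0] * (n - ones)

print(input_to_sequence(3, 3))     # [0, 0, 0, 0, 1, 1, 1], P('1') = 3/7
print(weight_to_sequence(0.6, 3))  # [1, 1, 1, 1, 0, 0, 0], 4/7 ≈ 0.6
```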
A calculation module: in the probability domain, the multiplications of the feature-map values by the corresponding weights are performed through AND-gate logic operations, and the multiple multiplication results are added with an Approximate Parallel Counter (APC); the result is represented as a binary number and is essentially a count of the number of "1"s in the inputs.
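The probability-domain multiply and the APC-based addition can likewise be sketched as follows (again an illustrative model; the sequences used here are the ones that appear in the worked example later in this section):

```python
def stochastic_multiply(seq_x, seq_w):
    """Probability-domain multiplication: a bitwise AND of two sequences."""
    return [a & b for a, b in zip(seq_x, seq_w)]

def approximate_parallel_counter(product_seqs):
    """Model of the APC: add the products by counting the '1's across
    all product sequences; the count is the binary-domain result."""
    return sum(sum(seq) for seq in product_seqs)

y1 = stochastic_multiply([0, 0, 0, 0, 1, 1, 1], [1, 1, 1, 1, 0, 0, 0])
y4 = stochastic_multiply([0, 0, 0, 0, 1, 1, 0], [1, 1, 1, 1, 1, 1, 0])
print(approximate_parallel_counter([y1, y4]))  # 2
```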
A convolution module: comprising a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random computation module and output the convolution result. The convolution module stores the input feature-map data and the weights in a buffer, and it adopts a ping-pong technique, reading the next group of weights while the previous group's convolution is being calculated.
As shown in FIG. 3, the random computation module performs the multiply-accumulate operations in the convolutional layer. In each convolution, zero values are pruned so that only non-zero values are calculated, which reduces the storage of intermediate data and the number of data accesses during calculation to a certain extent, but increases the calculation delay. To improve efficiency further, a novel parallel convolution architecture is provided: spatially, several convolution kernels are multiplexed for the convolution, taking the number of channels as the reference. The input image feature maps in the same channel can be convolved by the kernels at the same positions of different channels; convolving in this way reduces the number of accesses to the input feature-map data to a certain extent, as shown in fig. 4.
For a two-dimensional convolution, several one-dimensional convolution kernels are used for the calculation. Common kernel sizes are 3 × 3, 5 × 5, and 7 × 7, and smaller kernels significantly reduce the computational complexity, so the parallelism is set to 3 for one two-dimensional convolution. For example, when calculating a two-dimensional convolution of size 3 × 3, the 9 input values and the corresponding weights are stored in 3 groups in a buffer, and the ping-pong technique is used, i.e., the next group of weights is read while the previous group's convolution is calculated. During the convolution, each group of data undergoes the sparse processing operation in one clock cycle; after the zero values are pruned, the remaining non-zero operands complete their multiplications in parallel through the random module in one clock cycle, giving several parallel multiplication results still represented in the probability domain; the approximate parallel counter then adds the multiplication results, giving three partial sums; finally, to obtain the final result, the approximate parallel counter is applied once more. This completes the implementation of the 3 × 3 two-dimensional convolution.
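The ping-pong weight buffering mentioned above can be sketched as follows (a software model of a hardware scheduling technique; the names and the compute callable are assumptions, and in hardware the load and compute steps of one iteration occur in the same clock period rather than sequentially):

```python
def pingpong_schedule(weight_groups, compute):
    """Alternate two buffers so the next weight group is loaded while
    the previously loaded group is being consumed by the convolution."""
    buffers = [None, None]
    results = []
    for i, group in enumerate(weight_groups):
        buffers[i % 2] = group                             # load idle buffer
        if i > 0:
            results.append(compute(buffers[(i - 1) % 2]))  # consume previous
    results.append(compute(buffers[(len(weight_groups) - 1) % 2]))
    return results

# Toy usage: "compute" just sums a weight group.
print(pingpong_schedule([[1, 2, 3], [4, 5, 6], [7, 8, 9]], sum))
# [6, 15, 24] -- each group is consumed one step after it is loaded
```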
Assume four fixed-point input data (3 bits wide) with values x_1 = +3, x_2 = 0, x_3 = +5, x_4 = +2, and corresponding weights w_1 = +0.6, w_2 = −0.7, w_3 = 0, w_4 = −0.8; the final calculation is x_1·w_1 + x_2·w_2 + x_3·w_3 + x_4·w_4, whose exact value is 3.4 in magnitude (the example tracks magnitudes; signs are handled separately). Following the method of fig. 1, the four input data and the corresponding weight data are input serially into the sparse processing module. ANDing x_1 with w_1 and x_4 with w_4 gives non-zero results, so these two pairs are passed to the random computation module for calculation; ANDing x_2 with w_2 and x_3 with w_3 gives 0, so 0 is output and these pairs are not passed into the random computation module. In the random computation module, the forward conversion module yields the random sequences S_1 = {0000111} and S_4 = {0000110}, with scaling factor 2^m − 1 = 7. The weight transfer functions give f(w_1) = 0.6 and f(w_4) = 0.8, so the numbers of 1s are ⌊0.6 × 7⌉ = 4 and ⌊0.8 × 7⌉ = 6, and the random sequences corresponding to the weights are W_1 = {1111000} and W_4 = {1111110}; this completes the forward conversion. In the multiplication module, the random sequences S_1 and W_1 are ANDed bitwise to obtain the random sequence Y_1 = {0000000}; similarly, Y_4 = {0000110} is obtained. In the addition module, the two sequences are added to obtain the result 2. Compared with the exact value 3.4, this value has an error of 1.4; however, the number of multiplications and additions that must be computed is greatly reduced, the data storage space is saved, and the calculation speed is improved.
The above disclosure covers only a few specific embodiments of the present invention; however, the present invention is not limited to these embodiments, and any variation conceivable to a person skilled in the art shall fall within the protection scope of the present invention.

Claims (5)

1. An image recognition convolutional layer structure based on random computation and sparse computation, comprising:
a sparse processing module: performing AND logic processing on the input image feature map and the corresponding weights, outputting the feature-map value and the corresponding weight at the position when the AND logic result is 1, and outputting 0 when the AND logic result is 0;
a random computation module: receiving the input image feature-map values and the corresponding weights output by the sparse processing module, converting them from the binary domain to the probability domain, performing the multiplications of the feature-map values by the corresponding weights in the probability domain, and adding the multiple multiplication results;
a convolution module: the image recognition method comprises a plurality of parallel convolution kernels, wherein the parallel convolution kernels finish convolution calculation through a sparse processing module and a random calculation module, and a convolution result of image recognition is output.
2. The image recognition convolutional layer structure based on random computation and sparse computation of claim 1, wherein the random computation module comprises:
a forward conversion module: receiving the input image feature map and the corresponding weights output by the sparse processing module, and converting them from the binary domain to the probability domain, wherein the conversion formula of the input image feature map is:

P(x_i) = x_i / (2^m − 1) = (Σ_{j=1}^{m} 2^{j−1} · b_{ji}) / (2^m − 1)

wherein P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i;

the transfer function of the corresponding weight is:

f(w_i) = ⌊|w_i| · (2^m − 1)⌉ / (2^m − 1)

wherein w_i represents the weight value and ⌊·⌉ represents performing a rounding operation;
a calculation module: in the probability domain, the multiplications of the input image feature map by the corresponding weights are performed through AND-gate logic operations, and the multiple multiplication results are added using an approximate parallel counter.
3. The image recognition convolutional layer structure based on random computation and sparse computation of claim 2, wherein the forward conversion module is further configured to perform the following:
expanding the value x_i of the input image feature map into a sequence S_i containing only "0" and "1": the j-th bit b_ji of x_i is copied 2^{j−1} times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^{j−1} times, j ∈ [1, m], so as to obtain the probability of "1" in the sequence S_i composed of "0"s and "1"s.
4. The image recognition convolutional layer structure based on random computation and sparse computation of claim 1, wherein the convolution module stores the input image feature-map data and the weights in a buffer and performs the convolution calculation.
5. The image recognition convolutional layer structure based on random computation and sparse computation of claim 4, wherein the convolution module adopts a ping-pong technique, reading the next group of weights while the previous group's convolution is calculated.
CN202210093892.9A 2022-01-26 2022-01-26 Image recognition convolutional layer structure based on random calculation and sparse calculation Pending CN114429553A (en)

Priority Applications (1)

CN202210093892.9A — priority and filing date 2022-01-26 — Image recognition convolutional layer structure based on random calculation and sparse calculation

Publications (1)

CN114429553A — publication date 2022-05-03

Family ID: 81312994

Country Status (1)

CN: CN114429553A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination