CN114429553A - Image recognition convolutional layer structure based on random calculation and sparse calculation - Google Patents
Image recognition convolutional layer structure based on random calculation and sparse calculation
- Publication number
- CN114429553A (application number CN202210093892.9A)
- Authority
- CN
- China
- Prior art keywords
- calculation
- sparse
- random
- computation
- input image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses an image recognition convolutional layer structure based on random calculation and sparse calculation, relating to the technical field of deep learning. Sparse calculation is used to perform sparse processing on the input image feature map and the corresponding weights, and the nonzero values in the sparse-processing results are transmitted to a random calculation module to complete the multiply-accumulate operation; finally, an efficient parallel convolutional layer structure is built using sparse calculation and random calculation, which performs the convolution and outputs the convolution result. The proposed structure is a low-complexity convolutional layer built on random calculation and sparse calculation; it guarantees the precision of the convolution operation while reducing the complexity of the hardware implementation, and makes full use of the acceleration advantages brought by sparsity in the input image feature map and the weights.
Description
Technical Field
The invention relates to the technical field of deep learning, in particular to an image recognition convolutional layer structure based on random calculation and sparse calculation.
Background
Convolutional Neural Networks (CNNs) are among the most important models in deep learning and are widely used in image recognition, speech recognition, computer vision, and related fields. Generally, the more layers a neural network has, the more parameters it needs, and the higher the accuracy obtained during forward inference. However, increasing the number of layers and parameters also means consuming more computing and memory resources. Examination of the many parameters of a convolutional neural network shows that some of them, such as zero values, have little influence on the final output result; these parameters can be pruned at run time to reduce the required storage space.
Because a convolutional neural network needs a large amount of storage space for its data, and especially because of the enormous amount of computation it performs, its development pace is hindered; how to reduce the computational load of convolutional neural networks is therefore a popular research direction in deep learning. The computation in a convolutional neural network is concentrated mainly in the convolutional layers: the convolution operations account for roughly nine tenths of the total computation and are directly proportional to the parameter capacity, so the redundant parameters can be removed by pruning. However, direct pruning makes the operations in the network irregular, which increases the complexity of the neural network's hardware implementation, so the acceleration advantages brought by sparse input image feature maps and weights cannot be fully exploited. On the other hand, the convolution process involves a large amount of computing-resource consumption and data storage, so the hardware cost of implementing a convolutional neural network is enormous. How to design efficient hardware for the operations in a convolutional neural network and apply it to embedded devices with limited computing resources and memory bandwidth is a problem worth investigating.
Based on the above problems, the present application provides an image recognition convolutional layer structure based on random calculation and sparse calculation: a low-complexity convolutional layer structure built on random calculation and sparse calculation that guarantees convolution precision while reducing the complexity of hardware implementation and fully utilizes the acceleration advantages brought by sparsity in the input image feature maps and weights.
Disclosure of Invention
The invention aims to provide an image recognition convolutional layer structure based on random calculation and sparse calculation: a low-complexity convolutional layer structure built on random calculation and sparse calculation that guarantees convolution precision while reducing the complexity of hardware implementation and fully utilizes the acceleration advantages brought by sparsity in the input image feature maps and weights.
The invention provides an image recognition convolutional layer structure based on random calculation and sparse calculation, which comprises:
a sparse processing module: performing AND-logic processing on the input image feature map and the corresponding weights; when the AND-logic result is 1, the input image feature map value and the corresponding weight at that position are output, and when the AND-logic result is 0, the result 0 is output;
a random calculation module: receiving the input image feature map values and corresponding weights output by the sparse processing module, converting them from the binary domain to the probability domain, performing the multiplication of the input image feature map values and the corresponding weights in the probability domain, and adding the multiple multiplication results;
a convolution module: comprising a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random calculation module and output the convolution result.
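As an illustrative sketch of the sparse processing module's gating behavior (this code is our own illustration, not part of the patent disclosure; all names are hypothetical), only operand pairs in which both values are nonzero are forwarded to the random calculation module:

```python
def sparse_gate(features, weights):
    """Forward only the (value, weight) pairs whose AND-logic result is 1,
    i.e. pairs where both operands are nonzero; all other positions output 0
    and are skipped entirely by the downstream random calculation module."""
    passed = []
    for x, w in zip(features, weights):
        if x != 0 and w != 0:  # "AND logic result is 1"
            passed.append((x, w))
    return passed

# With the values from the patent's worked example, only two pairs survive.
pairs = sparse_gate([3, 0, 5, 2], [0.6, -0.7, 0.0, -0.8])
```

Gating before the multiply-accumulate stage is what removes the storage and computation cost of the zero operands.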
Further, the random computation module includes:
a forward conversion module: receiving the input image feature map values and corresponding weights output by the sparse processing module and converting them from the binary domain to the probability domain, wherein the conversion formula of the input image feature map is

P(x_i) = x_i / (2^m − 1)

where P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i;

the transfer function of the corresponding weight is f(w_i) = |w_i|, the corresponding weight sequence containing ⌊f(w_i)·(2^m − 1)⌉ '1's, where w_i represents the weight value and ⌊·⌉ represents the rounding operation;
a calculation module: in the probability domain, multiplication calculation of the input image feature map and the corresponding weight is performed through AND gate logic operation, and a plurality of multiplication calculation results are added by adopting an approximate parallel counter.
Further, the forward conversion module further includes:
the value x_i of the input image feature map is expanded into a sequence S_i containing only '0' and '1': the j-th bit b_ji of x_i is copied 2^(j−1) times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^(j−1) times, j ∈ [1, m]; the probability of '1' in the sequence S_i composed of '0' and '1' is thereby obtained.
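The bit expansion above can be sketched as follows (an illustrative model of ours, not code from the patent): replicating bit j of an m-bit value 2^(j−1) times yields a sequence of length 2^m − 1 in which the number of '1's equals the value itself, so the probability of '1' is x_i / (2^m − 1).

```python
def expand_to_sequence(x, m):
    """Expand an m-bit value x into the sequence S_i of length 2**m - 1:
    bit j of x (j = m down to 1, most significant first) is replicated
    2**(j-1) times, so the total count of '1's equals x itself."""
    seq = []
    for j in range(m, 0, -1):
        bit = (x >> (j - 1)) & 1
        seq.extend([bit] * (2 ** (j - 1)))
    return seq

s = expand_to_sequence(3, 3)   # 3 = 011b -> [0, 0, 0, 0, 1, 1, 1]
p = sum(s) / len(s)            # probability of '1' = 3 / 7
```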
Further, the convolution module stores the input feature map data and the weights into a buffer and performs the convolution calculation.
Further, the convolution module adopts a ping-pong technique, reading the next group of weights while performing the convolution calculation of the previous group.
Compared with the prior art, the invention has the following remarkable advantages:
the invention provides an image identification convolutional layer structure based on random calculation and sparse calculation, wherein sparse calculation is used for carrying out sparse processing on an input image characteristic diagram and a corresponding weight, nonzero values in sparse processing results are transmitted to a random calculation module to complete multiplication and accumulation operation, and finally, an efficient parallel convolutional layer structure is built by utilizing sparse calculation and random calculation, convolution is carried out, and a convolution result is output; the image recognition convolutional layer structure based on random calculation and sparse calculation provided by the invention constructs a low-complexity convolutional layer structure based on random calculation and sparse calculation, guarantees the precision of convolutional operation, reduces the complexity of hardware realization, and fully utilizes the acceleration advantages brought by the input image characteristic diagram and weight sparseness.
Drawings
Fig. 1 is a block diagram of a data operation structure of an image recognition convolutional layer structure based on random computation and sparse computation according to an embodiment of the present invention;
fig. 2 is a structural diagram of a sparse processing module according to an embodiment of the present invention;
FIG. 3 is a block diagram of a random number computation module according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a convolution mode of the convolution module according to the embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are clearly and completely described below with reference to the drawings in the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Example 1
Since random calculation is a method of data processing that replaces a binary value with the probability of a '1' occurring in a random bit stream composed of '0's and '1's, relatively complex arithmetic functions can be realized with simple logic circuits. To apply random calculation to convolution, three aspects must be considered: the conversion from binary values to random sequences, the random multiplication process, and the accumulation after multiplication. The forward conversion module in the random calculation module corresponds to the first process, and the multiplication and addition calculation modules correspond to the latter two. In convolution, the input values and weights include positive numbers, negative numbers and 0; since 0 is redundant data with little influence on the overall convolution result, this redundant data can be filtered out with simple processing before multiplication and addition, which accelerates the calculation to a certain extent.
Referring to fig. 1-4, the present invention provides an image recognition convolutional layer structure based on random computation and sparse computation, comprising:
as shown in fig. 2, the sparseness processing module: and performing AND logic processing on the input image feature map and the corresponding weight, outputting the input image feature map and the corresponding weight at the position when the AND logic result is 1, outputting the result 0 when the AND logic result is 0, and after sparse processing, preventing all zero values from being transmitted to the random calculation module for calculation, thereby achieving two advantages, namely reducing the calculated amount and the corresponding data storage space required to be calculated and accelerating the calculation speed.
A random calculation module: comprises a forward conversion module and a calculation module. Note that the zero values removed by the sparse processing module do not participate in the operations of the random calculation module, which operates only on nonzero values.
A forward conversion module: receives the input image feature map values and corresponding weights output by the sparse processing module and converts them from the binary domain to the probability domain. The value x_i of the input image feature map is expanded into a sequence S_i containing only '0' and '1': the j-th bit b_ji of x_i is copied 2^(j−1) times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^(j−1) times, j ∈ [1, m], so that the probability of '1' in S_i is obtained.

The conversion formula of the input image feature map is

P(x_i) = (Σ_{j=1}^{m} b_ji · 2^(j−1)) / (2^m − 1) = x_i / (2^m − 1)

where P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i.

The corresponding weight transfer function is f(w_i) = |w_i|, and the corresponding weight sequence contains ⌊f(w_i)·(2^m − 1)⌉ '1's, where w_i represents the weight value, ⌊·⌉ represents the rounding operation, and m is determined by the bit width of x_i. At this point, the conversion of the input image feature map x_i and the weight w_i from the binary domain to the probability domain is completed.
A calculation module: in the probability domain, the multiplication of the input image feature map value and the corresponding weight is performed by an AND-gate logic operation, and the multiple multiplication results are added using an Approximate Parallel Counter (APC), whose result is expressed as a binary number and is essentially a count of the number of '1's in its inputs.
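A minimal software model of the calculation module (our own sketch, not part of the patent; a hardware APC would sum bits with compressor trees, modeled here as a plain population count):

```python
def and_multiply(seq_a, seq_b):
    """Stochastic multiplication: bitwise AND of two probability-domain sequences."""
    return [a & b for a, b in zip(seq_a, seq_b)]

def approximate_parallel_counter(sequences):
    """Model of the APC: output, as a binary number, the total count of '1's
    across all input product sequences."""
    return sum(sum(seq) for seq in sequences)

# Sequences from the patent's worked example (m = 3, sequence length 7).
y1 = and_multiply([0, 0, 0, 0, 1, 1, 1], [1, 1, 1, 1, 0, 0, 0])
y4 = and_multiply([0, 0, 0, 0, 1, 1, 0], [1, 1, 1, 1, 1, 1, 0])
total = approximate_parallel_counter([y1, y4])  # 0 + 2 = 2
```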
A convolution module: comprises a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random calculation module and output the convolution result. The convolution module stores the input feature map data and the weights into a buffer, and adopts a ping-pong technique, reading the next group of weights while performing the convolution calculation of the previous group.
As shown in Fig. 3, the random calculation module is used to perform the multiply-accumulate operations in the convolutional layer. In each convolution, zero values are pruned so that only nonzero values are calculated, which reduces the storage space for intermediate data and the number of data accesses during calculation to a certain extent, but increases the calculation latency. To further improve efficiency, a novel parallel convolution architecture is provided: spatially, multiple convolution kernels are multiplexed for convolution, with the number of channels as the reference. Input image feature maps in the same channel can be convolved by the convolution kernels at the same positions of different channels; performing convolution in this way reduces the number of accesses to the input image feature map data to a certain extent, as shown in Fig. 4.
For a two-dimensional convolution, several one-dimensional convolution kernels are used for the calculation. Common kernel sizes are 3×3, 5×5 and 7×7, and using smaller kernels significantly reduces computational complexity, so the parallelism is set to 3 for one two-dimensional convolution. For example, when calculating a 3×3 two-dimensional convolution, the nine input values and the corresponding weights are stored in three groups in a buffer, and the ping-pong technique is used, i.e., the next group of weights is read while the previous group's convolution is calculated. During the convolution calculation, each group of data undergoes the sparse processing operation in one clock cycle; after the zero values are pruned, the remaining nonzero operands complete their multiplications in parallel through the random calculation module in one clock cycle, yielding several parallel multiplication results still expressed in the probability domain. An approximate parallel counter then performs the addition of the multiplication results to obtain three partial sums, and to obtain the final result the approximate parallel counter is applied once more. This completes the implementation of a 3×3 two-dimensional convolution.
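The ping-pong scheduling described above can be modeled roughly as follows (a simplified software analogue of ours; in real hardware the weight load and the convolution proceed simultaneously in two alternating buffers, whereas this sequential model only illustrates the buffer swapping):

```python
def ping_pong_convolve(input_groups, weight_groups, mac):
    """Alternate two weight buffers: while one group of weights is being used
    for the multiply-accumulate, the next group is loaded into the idle buffer."""
    buffers = [None, None]
    results = []
    active = 0
    buffers[active] = weight_groups[0]  # preload the first weight group
    for i, inputs in enumerate(input_groups):
        # Load the NEXT weight group into the idle buffer (overlapped in hardware).
        if i + 1 < len(weight_groups):
            buffers[1 - active] = weight_groups[i + 1]
        # Compute with the currently active buffer, then swap.
        results.append(mac(inputs, buffers[active]))
        active = 1 - active
    return results

dot = lambda xs, ws: sum(x * w for x, w in zip(xs, ws))
out = ping_pong_convolve([[1, 2], [3, 4]], [[1, 0], [0, 1]], dot)  # -> [1, 4]
```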
Assume four fixed-point input data (3 bits wide) with values x_1 = +3, x_2 = 0, x_3 = +5, x_4 = +2, and corresponding weights w_1 = +0.6, w_2 = −0.7, w_3 = 0, w_4 = −0.8; the final calculation is x_1·w_1 + x_2·w_2 + x_3·w_3 + x_4·w_4, whose exact magnitude is 3.4. Following the method of Fig. 1, the input data and the corresponding weight data are input serially into the sparse processing module. The AND of x_1 with w_1 and of x_4 with w_4 is nonzero, so x_1, w_1, x_4 and w_4 are transmitted to the random calculation module for calculation; the AND of x_2 with w_2 and of x_3 with w_3 is 0, so 0 is output and these values are not transmitted to the random calculation module. In the random calculation module, the forward conversion module yields the random sequences S_1 = {0000111} and S_4 = {0000110}, with scaling factor 2^m − 1 = 7; the transfer functions of the weights are f(w_1) = 0.6 and f(w_4) = 0.8, from which the random sequences corresponding to the weights are obtained as W_1 = {1111000} and W_4 = {1111110}, completing the forward conversion. In the multiplication calculation module, the random sequence S_1 is ANDed bitwise with W_1 to obtain the random sequence Y_1 = {0000000}; similarly, the random sequence Y_4 = {0000110} is obtained. In the addition calculation module, the numbers of '1's in the two sequences are added to obtain the result 2, which has an error of 1.4 compared with the exact value of 3.4. However, the number of multiplications and additions that must be calculated is greatly reduced, the data storage space is saved, and the calculation speed is improved.
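The worked example can be checked end to end with a short script (our own reconstruction; the rule that the weight sequence holds round(|w|·(2^m − 1)) leading '1's is inferred from the example's numbers, not stated explicitly in the patent, and the script tracks magnitudes only):

```python
def to_sequence(x, m):
    """Expand bit j of x (replicated 2**(j-1) times, MSB first) into a
    sequence of length 2**m - 1 whose count of '1's equals x."""
    seq = []
    for j in range(m, 0, -1):
        seq.extend([(x >> (j - 1)) & 1] * (2 ** (j - 1)))
    return seq

def weight_sequence(w_mag, m):
    """Weight sequence with round(|w| * (2**m - 1)) leading '1's
    (reconstructed rule, consistent with the example's W_1 and W_4)."""
    n = 2 ** m - 1
    ones = round(w_mag * n)
    return [1] * ones + [0] * (n - ones)

m = 3
s1, s4 = to_sequence(3, m), to_sequence(2, m)              # {0000111}, {0000110}
w1, w4 = weight_sequence(0.6, m), weight_sequence(0.8, m)  # {1111000}, {1111110}
y1 = [a & b for a, b in zip(s1, w1)]
y4 = [a & b for a, b in zip(s4, w4)]
result = sum(y1) + sum(y4)  # APC count: 0 + 2 = 2, vs. exact magnitude 3.4
```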
The above disclosure describes only a few specific embodiments of the present invention; however, the present invention is not limited to these embodiments, and any variation conceivable to those skilled in the art shall fall within the protection scope of the present invention.
Claims (5)
1. An image recognition convolutional layer structure based on random computation and sparse computation, comprising:
a sparse processing module: performing AND-logic processing on the input image feature map and the corresponding weights; when the AND-logic result is 1, the input image feature map value and the corresponding weight at that position are output, and when the AND-logic result is 0, the result 0 is output;
a random calculation module: receiving the input image feature map values and corresponding weights output by the sparse processing module, converting them from the binary domain to the probability domain, performing the multiplication of the input image feature map values and the corresponding weights in the probability domain, and adding the multiple multiplication results;
a convolution module: comprising a plurality of parallel convolution kernels, which complete the convolution calculation through the sparse processing module and the random calculation module and output the convolution result of the image recognition.
2. The stochastic computation and sparse computation based image recognition convolutional layer structure of claim 1, wherein the stochastic computation module comprises:
a forward conversion module: receiving the input image feature map values and corresponding weights output by the sparse processing module and converting them from the binary domain to the probability domain, wherein the conversion formula of the input image feature map is

P(x_i) = x_i / (2^m − 1)

where P(x_i) is the probability expression of x_i, x_i represents a value of the input image feature map, m is the bit width of x_i, and b_mi is the most significant bit of x_i;

the transfer function of the corresponding weight is f(w_i) = |w_i|, the corresponding weight sequence containing ⌊f(w_i)·(2^m − 1)⌉ '1's, where w_i represents the weight value and ⌊·⌉ represents the rounding operation;

a calculation module: in the probability domain, performing the multiplication of the input image feature map value and the corresponding weight through an AND-gate logic operation, and adding the multiple multiplication results using an approximate parallel counter.
3. The random-computation-and-sparse-computation-based image recognition convolutional layer structure of claim 2, wherein said forward conversion module further comprises:
the value x_i of the input image feature map is expanded into a sequence S_i containing only '0' and '1': the j-th bit b_ji of x_i is copied 2^(j−1) times, generating the sequence S_i = {b_mi, …, b_mi, …, b_ji, …, b_ji, …, b_1i}, in which b_ji appears 2^(j−1) times, j ∈ [1, m]; the probability of '1' in the sequence S_i composed of '0' and '1' is thereby obtained.
4. The image recognition convolutional layer structure based on random calculation and sparse calculation of claim 1, wherein the convolution module stores the input image feature map data and the weights into a buffer and performs the convolution calculation.
5. The image recognition convolutional layer structure based on random calculation and sparse calculation of claim 4, wherein the convolution module adopts a ping-pong technique, reading the next group of weights while performing the convolution calculation of the previous group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210093892.9A CN114429553A (en) | 2022-01-26 | 2022-01-26 | Image recognition convolutional layer structure based on random calculation and sparse calculation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210093892.9A CN114429553A (en) | 2022-01-26 | 2022-01-26 | Image recognition convolutional layer structure based on random calculation and sparse calculation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114429553A true CN114429553A (en) | 2022-05-03 |
Family
ID=81312994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210093892.9A Pending CN114429553A (en) | 2022-01-26 | 2022-01-26 | Image recognition convolutional layer structure based on random calculation and sparse calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114429553A (en) |
-
2022
- 2022-01-26 CN CN202210093892.9A patent/CN114429553A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ardakani et al. | An architecture to accelerate convolution in deep neural networks | |
US10810484B2 (en) | Hardware accelerator for compressed GRU on FPGA | |
US10698657B2 (en) | Hardware accelerator for compressed RNN on FPGA | |
Yepez et al. | Stride 2 1-D, 2-D, and 3-D Winograd for convolutional neural networks | |
US20180260710A1 (en) | Calculating device and method for a sparsely connected artificial neural network | |
CN111414994B (en) | FPGA-based Yolov3 network computing acceleration system and acceleration method thereof | |
CN110543939B (en) | Hardware acceleration realization device for convolutional neural network backward training based on FPGA | |
CN110991631A (en) | Neural network acceleration system based on FPGA | |
Alawad et al. | Stochastic-based deep convolutional networks with reconfigurable logic fabric | |
CN112257844B (en) | Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof | |
WO2022134465A1 (en) | Sparse data processing method for accelerating operation of re-configurable processor, and device | |
CN110069444A (en) | A kind of computing unit, array, module, hardware system and implementation method | |
CN115186802A (en) | Block sparse method and device based on convolutional neural network and processing unit | |
CN209708122U (en) | A kind of computing unit, array, module, hardware system | |
CN113313244B (en) | Near-storage neural network accelerator for addition network and acceleration method thereof | |
Shu et al. | High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination | |
CN112862091B (en) | Resource multiplexing type neural network hardware accelerating circuit based on quick convolution | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
US20230068941A1 (en) | Quantized neural network training and inference | |
CN112836793B (en) | Floating point separable convolution calculation accelerating device, system and image processing method | |
CN114429553A (en) | Image recognition convolutional layer structure based on random calculation and sparse calculation | |
CN113392963B (en) | FPGA-based CNN hardware acceleration system design method | |
CN115688892A (en) | FPGA implementation method of sparse weight Fused-Layer convolution accelerator structure | |
CN113988279A (en) | Output current reading method and system of storage array supporting negative value excitation | |
Özkilbaç et al. | Real-Time Fixed-Point Hardware Accelerator of Convolutional Neural Network on FPGA Based |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||