CN106228240B - Deep convolution neural network implementation method based on FPGA - Google Patents
- Publication number: CN106228240B (application CN201610615714.2A / CN201610615714A)
- Authority: CN (China)
- Prior art keywords: convolution, calculation, matrix, fpga, floating point
- Legal status: Active (an assumption, not a legal conclusion)
Classifications
- G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means (G: Physics; G06: Computing, calculating or counting; G06N: Computing arrangements based on specific computational models; G06N3/00: biological models; G06N3/02: neural networks; G06N3/06: physical realisation)
- G06F18/24: Classification techniques (G06F: Electric digital data processing; G06F18/00: Pattern recognition; G06F18/20: Analysing)
- G10L15/16: Speech classification or search using artificial neural networks (G10L: Speech analysis, synthesis, recognition and processing; G10L15/00: Speech recognition; G10L15/08: Speech classification or search)
Abstract
The invention belongs to the technical field of digital image processing and pattern recognition, and in particular concerns an FPGA-based implementation method for deep convolutional neural networks. The hardware platform is a Xilinx ZYNQ-7030 programmable SoC, which integrates an FPGA fabric and an ARM Cortex-A9 processor. First, trained network model parameters are loaded onto the FPGA side; the input data is then preprocessed on the ARM side and the result transmitted to the FPGA side, where the convolution calculation and down-sampling of the deep convolutional neural network are performed; the resulting data feature vectors are transmitted back to the ARM side, which completes the feature classification calculation. By exploiting the FPGA's fast parallel processing and highly efficient, very low-power computation, the invention implements the convolution calculation, the part of the deep convolutional neural network model with the highest complexity, greatly improving algorithm efficiency and reducing power consumption while preserving algorithm accuracy.
Description
Technical Field
The invention belongs to the technical field of digital image processing and pattern recognition, and particularly relates to a method for realizing a deep convolutional neural network model on an FPGA hardware platform.
Background
With the rapid development of computer technology and the Internet, data volumes are growing explosively, and intelligent analysis and processing of massive data has become the key to exploiting its value. Artificial intelligence is an effective means of extracting valuable information from massive data, and in recent years has made breakthrough progress in application fields such as computer vision, speech recognition, and natural language processing. A representative example is the deep learning algorithm model based on deep convolutional neural networks.
Convolutional Neural Networks (CNNs) were inspired by neuroscience research. Over more than twenty years of evolution, they have achieved remarkable theoretical and practical results in fields such as pattern recognition and man-machine contests; in a famous man-machine Go match, AlphaGo, an artificial intelligence system based on a CNN combined with Monte Carlo tree search, defeated world Go champion Lee Sedol by a score of 4:1. A typical CNN algorithm model consists of two parts: a feature extractor and a classifier. The feature extractor generates a low-dimensional feature vector from the input data and is robust to variations in the data. This vector is then fed to the classifier (usually based on a traditional artificial neural network), which produces the classification result for the input data.
In implementations of convolutional neural network algorithm models, the convolution calculation accounts for about 90% of the total computation [1]. Efficient calculation of the convolutional layers is therefore the key to greatly improving the computational efficiency of a CNN algorithm model, and hardware acceleration of the convolution calculation is an effective way to achieve this.
At present, industry generally uses GPU clusters to implement deep learning algorithm models. Large-scale parallel computing on GPUs yields remarkable efficiency and performance, but the high power consumption of GPUs constrains their large-scale deployment and has become a bottleneck for the practical application of deep convolutional neural network algorithm models. FPGAs combine high-performance parallel computation with ultra-low power consumption, so implementing deep learning algorithm models on FPGAs is a necessary direction of development in this field.
At present, there are three main schemes for implementing CNN by using FPGA:
(1) a soft-core CPU implements the control part and cooperates with the FPGA fabric to accelerate the algorithm;
(2) a hard-core ARM Cortex-A9 CPU embedded in an SoC implements the control part and cooperates with the FPGA (field-programmable gate array) fabric to accelerate the algorithm;
(3) a cloud server cooperates with the FPGA to accelerate the algorithm.
The three schemes have advantages and disadvantages, and different acceleration schemes can be selected according to different application occasions.
In a deep convolutional neural network, the convolutional layers account for more than 90% of the computation and are the key link in the whole network model; their computational efficiency directly determines the performance of the model implementation. However, implementing the convolution calculation on an FPGA is difficult, mainly for the following reasons:
(1) deep learning algorithm models are still largely at the academic research stage, and large-scale industrial application requires considerable further algorithm and model optimization; the algorithm model must be continually optimized for different application scenarios, which demands a deep understanding of deep learning theory and algorithms;
(2) FPGA development is based on low-level hardware description languages and suits relatively stable algorithm models; the continual evolution of deep learning algorithm models makes their implementation on FPGAs considerably harder;
(3) implementing deep convolutional neural networks on FPGAs requires substantial FPGA engineering experience. The FPGA's operating clock frequency and the output delay (latency) of modules such as multipliers pull against each other: the higher the clock frequency, the longer the module's output latency, and the lower the clock frequency, the shorter the latency. Relatively balanced parameters must be found through manual experimentation guided by engineering experience.
Disclosure of Invention
The invention aims to provide a method for realizing a deep convolutional neural network model with high efficiency and low power consumption, so as to solve the problems of high power consumption and low efficiency of the current deep learning model based on a GPU or a CPU.
The invention optimizes the FPGA hardware design, effectively reduces the resource consumption and can realize the deep convolution neural network model on a low-end FPGA hardware platform.
The implementation method for the deep convolutional neural network model provided by the invention runs on a Xilinx ZYNQ-7030 programmable SoC hardware platform, which integrates an FPGA fabric and an ARM Cortex-A9 processor. First, trained network model parameters are loaded onto the FPGA side; the input data is then preprocessed on the ARM side and the result transmitted to the FPGA side, where the convolution calculation and down-sampling of the deep convolutional neural network are performed; the resulting data feature vectors are transmitted back to the ARM side, which completes the feature classification calculation. The method comprises the following four processes: model parameter loading, input data preprocessing, convolution and down-sampling calculation, and classification calculation:
1. the model parameter loading process comprises the following steps:
(1) training a deep convolutional neural network model offline;
(2) loading training model parameters at the ARM end;
(3) transmitting the model parameters to the FPGA;
2. the input data preprocessing operation process comprises the following steps:
(1) normalization processing;
(2) transmitting the processing result to the FPGA;
(3) storing the data to a Block RAM at an FPGA end;
3. the convolution and downsampling calculation process is as follows:
(1) initializing a convolution assembly line;
(2) performing convolution calculation;
(3) performing pooling downsampling calculation;
(4) reinitializing the convolution assembly line, and performing multilayer convolution downsampling calculation;
4. the classification calculation process comprises the following steps:
(1) transmitting the feature vector back to the ARM end;
(2) calculating through a classification model;
(3) and outputting a classification result.
These processes are described in detail below.

Model parameter loading:
(1) load the parameters of the offline-trained deep convolutional neural network model on the ARM side;
(2) transmit the training model parameters to the FPGA side;
(3) on the FPGA side, buffer the parameters through a FIFO and store them in Block RAM (block random-access memory).

Input data preprocessing:
(1) normalize the input data to meet the requirements of the model's convolution operation;
(2) transmit the normalized data from the ARM side to the FPGA side over the APB bus;
(3) on the FPGA side, buffer the normalized data through a FIFO and store it in Block RAM.
For the convolutional layers, which dominate the computation in the deep convolutional neural network model, a deep pipeline implementation is designed. Let the network model have H convolutional layers and H pooling layers. The h-th (h = 1, 2, …, H) convolutional layer takes as input T m × m 32-bit floating-point matrices and outputs S (m-n+1) × (m-n+1) 32-bit floating-point matrices; the convolution kernels are K n × n 32-bit floating-point matrices (n ≤ m); the input-data sliding window is n × n, with horizontal sliding step 1 and vertical sliding step 1.
(1) Initializing a convolution operation pipeline
Define n+1 data cache registers P[0], P[1], …, P[n-1], P[n], each holding m data values. Of these, n registers P[(i-1)%(n+1)+0], P[(i-1)%(n+1)+1], …, P[(i-1)%(n+1)+n-1] store the data of the i-th (i = 1, 2, …, m-n+1) n × m sub-matrix of the t-th (t = 1, 2, …, T) input data matrix, where % denotes the remainder operation; if (i-1)%(n+1)+x > n the index wraps around and continues from 0 (x = 0, 1, …, n-1). If n < m, register P[(i-1)%(n+1)+n] stores row i+n of the input data matrix, so that initialization proceeds in parallel with the convolution calculation, reducing FPGA idle cycles and improving calculation efficiency.
Define 1 convolution-kernel matrix cache register W, storing the weight data of the k-th (k = 1, 2, …, K) n × n convolution kernel matrix.
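The (n+1)-register rotation above can be sketched in software as follows; the function names and 0-based indexing are illustrative, not part of the patent:

```python
# Software sketch of the patent's (n+1)-register line cache: n registers hold
# the current n x m sub-matrix, while the spare (n+1)-th register is pre-loaded
# with the next input row during the convolution, then rotated in.

def line_cache_rows(i, n):
    """Indices of the n cache registers holding sub-matrix i (1-based i),
    following the (i-1) % (n+1) rotation described in the text."""
    return [((i - 1) + x) % (n + 1) for x in range(n)]

def prefetch_slot(i, n):
    """Index of the spare register pre-loaded with row i+n while
    sub-matrix i is being convolved."""
    return ((i - 1) + n) % (n + 1)
```

For n = 5 (the 5 × 5 kernels used later in the description), sub-matrix 1 occupies registers 0..4 while register 5 is pre-loaded; sub-matrix 2 occupies 1..5 while register 0 is reused, and so on around the ring.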
(2) h-th convolutional layer calculation

Complete the convolution calculation of the t-th input data matrix with the k-th convolution kernel of the network's h-th convolutional layer, and activate the result with the Sigmoid function.

Specifically, while each convolution calculation is being performed, the spare data cache register P[(i-1)%(n+1)+n] is initialized with the next input row and serves as the cached input data for the (i+1)-th sub-matrix convolution, realizing the circular convolution.
The Sigmoid function is constructed on the FPGA side from Floating-point IP cores to activate the convolution calculation results; the Sigmoid function is f(x) = 1/(1 + e^(-x)). The specific steps are as follows:
As described above, the input data is an m × m floating-point matrix, the convolution kernel an n × n floating-point matrix, and the sliding window n × n with horizontal and vertical stride 1; the convolution result is therefore an (m-n+1) × (m-n+1) floating-point matrix. The offset b11 (an offline-training model parameter) is added to each element of the matrix, and after Sigmoid activation the result, an (m-n+1) × (m-n+1) floating-point matrix, is stored in Block RAM.

After each convolution calculation, the convolution-kernel cache register W is re-initialized and the next convolution calculation is performed; repeating this cycle yields S (m-n+1) × (m-n+1) floating-point matrices, which are stored in Block RAM.
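A minimal software model of one convolution-plus-activation pass as described above, assuming the standard Sigmoid and a single kernel; the function names are illustrative:

```python
import math

def sigmoid(x):
    # On the FPGA this is built from Floating-point IP cores;
    # here, the standard 1 / (1 + e^-x) in software.
    return 1.0 / (1.0 + math.exp(-x))

def conv2d_sigmoid(image, kernel, bias):
    """Valid convolution (stride 1) of an m x m image with an n x n kernel,
    bias added to each element, Sigmoid applied: a software model of one
    convolutional-layer pass described in the text."""
    m, n = len(image), len(kernel)
    out_dim = m - n + 1
    out = [[0.0] * out_dim for _ in range(out_dim)]
    for r in range(out_dim):
        for c in range(out_dim):
            acc = bias
            for i in range(n):          # slide the n x n window
                for j in range(n):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = sigmoid(acc)    # activate the accumulated sum
    return out
```

With m = 28 and n = 5 this produces the 24 × 24 output matrix used in the detailed embodiment.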
(3) h-th pooling layer calculation

Pooling of the h-th convolutional layer's results yields S [(m-n+1)/2] × [(m-n+1)/2] floating-point matrices, which are stored in Block RAM. The specific steps are as follows: the sliding window over the convolution result is 2 × 2 with stride 2, and pooling uses average down-sampling, i.e. the elements of each 2 × 2 block are summed and the result averaged, giving S [(m-n+1)/2] × [(m-n+1)/2] floating-point matrices that serve as the input for the (h+1)-th convolutional layer calculation.
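The 2 × 2 average down-sampling step can be sketched as follows (an illustrative helper on plain Python lists):

```python
def avg_pool_2x2(mat):
    """2x2 average down-sampling with stride 2, as in the pooling step:
    each 2x2 block is summed element by element and divided by 4."""
    d = len(mat) // 2
    return [[(mat[2 * r][2 * c] + mat[2 * r][2 * c + 1] +
              mat[2 * r + 1][2 * c] + mat[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(d)] for r in range(d)]
```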
The convolution and pooling calculation results are transmitted back to the ARM side for classification. The specific steps are as follows: the FPGA side sends the convolution-pooling result matrices in Block RAM through the FIFO buffer and over the APB bus to the ARM side, which completes the data classification calculation using the Softmax operation and outputs the classification result of the input data.
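The ARM-side Softmax classification can be modeled as follows; the max-subtraction is a common numerical-stability choice not specified in the text, and the function names are illustrative:

```python
import math

def softmax(features):
    """Softmax over the flattened feature vector returned from the FPGA.
    Subtracting the maximum before exponentiation avoids overflow
    (an implementation choice, not taken from the patent)."""
    mx = max(features)
    exps = [math.exp(f - mx) for f in features]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features):
    """Return the index of the most probable class."""
    probs = softmax(features)
    return probs.index(max(probs))
```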
The method of the invention has the following main characteristics:
(1) a deep convolutional neural network model is realized on a low-end FPGA;
(2) the convolution calculation in the deep convolution neural network model is accelerated by utilizing a pipeline calculation mode;
(3) the control is implemented by an ARM processor embedded in the SoC, which is small, low-power, and efficient, and can be widely applied in the field of embedded systems.
The invention uses the FPGA's fast parallel processing and highly efficient, very low-power computation to implement the convolution calculation, the part of the deep convolutional neural network model with the highest complexity, greatly improving algorithm efficiency while preserving algorithm accuracy. Compared with traditional CPU- or GPU-based implementations of deep convolutional neural networks, the method effectively increases computation speed while greatly reducing power consumption, solving the problems of long run time or high power consumption that arise when a CPU or GPU is used.
Drawings
FIG. 1 is a flow diagram of an FPGA-based deep convolutional neural network implementation.
Fig. 2 MNIST database (excerpt).
Fig. 3 is a schematic diagram of matrix transposition.
FIG. 4 is a schematic diagram of a pipeline computation.
FIG. 5 is a schematic diagram of convolution calculations.
FIG. 6 is a diagram of a deep convolutional neural network architecture.
Fig. 7 is a schematic view of the downsampling calculation.
FIG. 8 shows simulation results of a deep convolutional neural network model based on FPGA.
Fig. 9 Measured classification result for the digit "7" (MNIST database).
Detailed Description
The following describes, with reference to the drawings, a concrete implementation of a handwritten-character recognition algorithm using a deep convolutional neural network model on an FPGA hardware platform according to the method of the invention. The model consists of an input layer I, a first convolutional layer C1, a first down-sampling layer S1, a second convolutional layer C2, a second down-sampling layer S2, and a fully connected Softmax layer. The input picture size is 28 × 28; the first convolutional layer contains 1 convolution kernel of size 5 × 5, and the second convolutional layer contains 3 convolution kernels of size 5 × 5.
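Under the layer sizes just listed, the feature-map shapes can be checked with a short sketch (valid convolution with stride-1 windows, 2 × 2 stride-2 pooling):

```python
def conv_out(size, kernel):
    # Valid convolution, stride 1: output side = input side - kernel side + 1
    return size - kernel + 1

def pool_out(size):
    # 2x2 average pooling with stride 2 halves each side
    return size // 2

# Shape propagation through the described network:
s = 28               # input image I
s = conv_out(s, 5)   # C1 with a 5x5 kernel -> 24
s = pool_out(s)      # S1 -> 12
s = conv_out(s, 5)   # C2 with 5x5 kernels -> 8
s = pool_out(s)      # S2 -> 4; with 3 kernels, 3 maps of 4x4 feed Softmax
```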
The specific operation steps implemented on the FPGA by using the handwritten character recognition algorithm of the deep convolutional neural network model are shown in fig. 1.
1. Loading trained model parameters
First, the CNN functions in DeepLearnToolbox-master are taken as reference and modified as follows: the convolution function is rewritten; the neural network is set to 5 layers (one input layer, two convolutional layers, and two down-sampling layers); the first convolutional layer has 1 convolution kernel of size 5 × 5 and the second has 3 convolution kernels of size 5 × 5; both down-sampling layers use a 2 × 2 sliding window with stride 2; and the number of training epochs is set to 10. The deep convolutional neural network is then trained with Matlab, the trained weight and offset parameters are loaded on the ARM side, and finally the trained model parameters are transmitted to the FPGA side, buffered through a FIFO, and stored in Block RAM.
2. Preprocessing
The MNIST handwriting image shown in FIG. 2 is read into memory, normalized by dividing each pixel by 255, and transposed as shown in FIG. 3.
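This preprocessing step (divide each pixel by 255, then transpose) can be sketched as follows; the helper name is illustrative:

```python
def preprocess(image_u8):
    """Normalize an 8-bit MNIST image to [0, 1] by dividing each pixel
    by 255, then transpose, mirroring the ARM-side preprocessing step."""
    norm = [[px / 255.0 for px in row] for row in image_u8]
    return [list(col) for col in zip(*norm)]  # transpose rows <-> columns
```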
3. Transmitting the pre-processing result to the FPGA
The preprocessing result is transmitted to the FPGA side through the APB bus on the ZYNQ-7030 SoC and, after FIFO buffering, stored in Block RAM.
4. Initializing a convolution operation pipeline
As shown in Fig. 4, 6 data cache registers P[0], P[1], P[2], P[3], P[4], P[5] are defined, each holding 28 floating-point values. Of these, 5 registers P[(i-1)%6+0], P[(i-1)%6+1], …, P[(i-1)%6+4] store the data of the i-th (i = 1, 2, …, 24) 5 × 28 sub-matrix of the input image matrix, where % denotes the remainder operation; if (i-1)%6+x > 5 the index wraps around to 0 (x = 0, 1, …, 4). Register P[(i-1)%6+5] stores row i+5 of the input image matrix.
Define 1 convolution-kernel matrix cache register W, storing the weight data of the 1st convolutional layer's 1st 5 × 5 convolution kernel matrix.
5. Performing the 1 st convolution layer calculation
Complete the convolution calculation of the network's 1st convolutional layer input image matrix with the 1st convolution kernel of the 1st convolutional layer, and activate the result with the Sigmoid function.
While each convolution calculation is performed, the spare data cache register P[(i-1)%6+5] is initialized with the next input row and serves as the cached input data for the (i+1)-th sub-matrix convolution, realizing the circular convolution, as shown in Fig. 5.
The Sigmoid function is constructed on the FPGA side from Floating-point IP cores to activate the convolution calculation result; the Sigmoid function is f(x) = 1/(1 + e^(-x)).
the method comprises the following specific steps:
as described above, the input image is a 28 × 28 floating point matrix, the convolution kernel is a 5 × 5 floating point matrix, the sliding window scale is 5 × 5, the horizontal sliding step is 1, and the vertical sliding step is 1, so that the convolution result is a 24 × 24 floating point matrix, each element of the matrix is added with an offset b11 (offline training model parameter), and after activation by using a Sigmoid function, the result is a 24 × 24 floating point matrix, and the floating point matrix is stored in the Block RAM.
After 1 convolution calculation, the calculation result is 1 matrix of 24 × 24 floating point numbers, and the matrix is stored in the Block RAM.
6. Perform the 1 st pooling level calculation
The pooling calculation of the 1 st convolution layer calculation result is realized, as shown in fig. 6, the result is 1 12 × 12 floating-point number matrix, and is stored in the Block RAM. The method comprises the following specific steps: the scale of the convolution calculation result data sliding window is 2 × 2, the step size is 2, and the pooling is realized by adopting an average down-sampling method, that is, 2 × 2 floating-point number matrixes are added one by one, the calculation result is averaged to obtain 1 12 × 12 floating-point number matrix which is used as the input matrix of the 2 nd convolution layer calculation, as shown in fig. 7.
7. Reinitializing a convolution pipeline
As shown in Fig. 4, the 6 data cache registers P[0], P[1], …, P[5] are re-initialized, each now holding 12 floating-point values. Of these, 5 registers P[(i-1)%6+0], P[(i-1)%6+1], …, P[(i-1)%6+4] store the data of the i-th (i = 1, 2, …, 8) 5 × 12 sub-matrix of the input matrix, where % denotes the remainder operation; if (i-1)%6+x > 5 the index wraps around to 0 (x = 0, 1, …, 4). Register P[(i-1)%6+5] stores row i+5 of the input matrix.
And reinitializing the convolution kernel matrix cache register W to store the 1 st 5 x 5 convolution kernel matrix weight data of the 2 nd convolution layer.
8. Performing 2 nd convolution layer calculation
And completing convolution calculation of the 2 nd convolution layer input data matrix of the network and the 1 st convolution kernel of the 2 nd convolution layer, and activating a calculation result through a Sigmoid function.
And reinitializing the convolution kernel matrix cache register W, storing the 2 nd 5 multiplied by 5 convolution kernel matrix weight data of the 2 nd convolution layer, completing the convolution calculation of the 2 nd convolution layer input data matrix and the 2 nd convolution kernel of the 2 nd convolution layer of the network, and activating the calculation result through a Sigmoid function.
And reinitializing a convolution kernel matrix cache register W, storing the 3 rd 5 multiplied by 5 convolution kernel matrix weight data of the 2 nd convolution layer, completing the convolution calculation of the 2 nd convolution layer input data matrix and the 2 nd convolution layer 3 rd convolution kernel of the network, and activating the calculation result through a Sigmoid function.
While each convolution calculation is performed, the spare data cache register P[(i-1)%6+5] is initialized with the next input row and serves as the cached input data for the (i+1)-th sub-matrix convolution, realizing the circular convolution, as shown in Fig. 5.
The specific steps are as follows: the input (the output of the 1st pooling layer) is a 12 × 12 floating-point matrix, the convolution kernels are 3 5 × 5 floating-point matrices, and the sliding window is 5 × 5 with horizontal and vertical stride 1, so the convolution result is 3 8 × 8 floating-point matrices; the offsets b21, b22, and b23 (offline-training model parameters) are added element-wise to the 3 matrices respectively, and after Sigmoid activation the results, 3 8 × 8 floating-point matrices, are stored in Block RAM.
After 2 times of convolution calculation, the calculation result is 3 matrixes of 8 multiplied by 8 floating point numbers, and the matrixes are stored in a Block RAM.
9. Perform the 2 nd pooling level calculation
The pooling of the 2 nd convolutional layer calculation result is achieved, as shown in fig. 6, the result is 3 matrices of 4 × 4 floating-point numbers, and is stored in the Block RAM. The method comprises the following specific steps: the scale of the convolution calculation result data sliding window is 2 × 2, the step size is 2, the pooling is realized by adopting an average down-sampling method, that is, 2 × 2 floating point number matrixes are added one by one, the calculation result is averaged, and 3 4 × 4 floating point number matrixes are obtained and used as the input matrix of the Softmax layer, as shown in fig. 7.
10. Classification calculation
And transmitting the convolution calculation and pooling calculation results back to the ARM end for classification operation. The method comprises the following specific steps: the FPGA end transmits a convolution pooling calculation result matrix in the Block RAM to the ARM end through FIFO cache and APB bus, and the ARM end completes data classification calculation by utilizing Softmax operation to obtain and output a classification result of an input picture.
The simulation result of the method for processing the digital picture '7' in the MNIST database is shown in FIG. 8.
The measured classification results of the digital picture "7" in the MNIST database processed by the above method are shown in FIG. 9.
References
[1] Cong J, Xiao B. Minimizing Computation in Convolutional Neural Networks. In: Artificial Neural Networks and Machine Learning - ICANN 2014. Springer International Publishing, 2014: 33-7.
[2] Farabet C, Poulet C, Han J Y, et al. CNP: An FPGA-based processor for Convolutional Networks. International Conference on Field Programmable Logic & Applications, 2009: 32-37.
[3] Gokhale V, Jin J, Dundar A, et al. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks. IEEE Embedded Vision Workshop, 2014: 696-701.
[4] Zhang C, Li P, Sun G, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. ACM/SIGDA International Symposium, 2015: 161-170.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012, 25(2): 2012.
[6] Farabet C, Martini B, Corda B, et al. NeuFlow: A runtime reconfigurable dataflow processor for vision. 2011, 9(6): 109-116.
[7] Matai J, Irturk A, Kastner R. Design and Implementation of an FPGA-Based Real-Time Face Recognition System. IEEE International Symposium on Field-Programmable Custom Computing Machines, 2011: 97-100.
[8] Sankaradas M, Jakkula V, Cadambi S, et al. A Massively Parallel Coprocessor for Convolutional Neural Networks. IEEE International Conference on Application-Specific Systems, Architectures and Processors, IEEE Computer Society, 2009: 53-60.
Claims (1)
1. An FPGA-based deep convolutional neural network implementation method, characterized by comprising the following specific steps:
step 1, loading training model parameters
(1) loading, at the ARM end, the parameters of a deep convolutional neural network model trained offline;
(2) transmitting the training model parameters to the FPGA end;
(3) at the FPGA end, buffering the parameters in a FIFO cache and then storing them in the block random access memory (Block RAM);
step 2, preprocessing a deep convolution neural network model
(1) normalizing the input data to meet the requirements of the model's convolution operations;
(2) transmitting the normalized data from the ARM end to the FPGA end over the APB bus;
(3) at the FPGA end, buffering the normalized data in a FIFO cache and then storing it in the Block RAM;
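As an illustration of the preprocessing in step 2, the ARM-side normalization might be sketched as below. The claim does not specify the normalization formula, so the min-max scaling to [0, 1] and the function name `normalize_input` are assumptions:

```python
import numpy as np

def normalize_input(img):
    """Min-max normalize an input image to [0, 1] as 32-bit floats.

    The claim only says the input data is 'normalized' before transfer
    to the FPGA; scaling raw pixel values into [0, 1] is one common
    choice, assumed here for illustration.
    """
    img = np.asarray(img, dtype=np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # constant image: avoid divide-by-zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)
```

For example, a 28×28 MNIST digit with 8-bit pixel values maps to a 28×28 float32 matrix in [0, 1], matching the 32-bit floating-point format the convolution step expects.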
step 3, convolution and down-sampling calculation
Let the network model have H convolutional layers and H pooling layers. The input of the h-th convolutional layer is T m×m 32-bit floating-point matrices, h = 1, 2, …, H; the output is S (m-n+1)×(m-n+1) 32-bit floating-point matrices; the convolution kernels are K n×n 32-bit floating-point matrices, n ≤ m; the input-data sliding window is n×n, with a horizontal sliding stride of 1 and a vertical sliding stride of 1;
(1) initializing a convolution operation pipeline
Define n+1 data cache registers P0, P1, …, Pn-1, Pn, each storing m data values. The n registers P(i-1)%(n+1)+0, P(i-1)%(n+1)+1, …, P(i-1)%(n+1)+n-1 store the n×m data of the i-th sub-matrix of the t-th input data matrix, where t = 1, 2, …, T and i = 1, 2, …, m-n+1, and % denotes the remainder operation. If (i-1)%(n+1)+x > n, the register indices wrap around, i.e. (i-1)%(n+1)+x becomes 0, (i-1)%(n+1)+x+1 becomes 1, and so on, where x = 0, 1, …, n-1. If n < m, register P(i-1)%(n+1)+n stores the (i+n)-th row of the input data matrix, so that initialization proceeds in parallel with the convolution calculation, reducing FPGA idle cycles and improving calculation efficiency;
define 1 convolution kernel matrix cache register W, storing the weight data of the k-th n×n convolution kernel matrix, k = 1, 2, …, K;
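The circular indexing of the n+1 row registers above can be sketched in software. This is a minimal model of the schedule only (which buffers hold the active rows and which one prefetches row i+n), not of the hardware itself; the function name is assumed:

```python
def row_buffer_schedule(m, n):
    """Model the (i-1) % (n+1) circular row-buffer indexing of the claim.

    For each sliding-window position i (1-based, i = 1 .. m-n+1) return
    a pair (active, prefetch): the n buffer indices holding the current
    n x m sub-matrix, and the one free buffer that prefetches row i+n
    in parallel with the convolution.
    """
    schedule = []
    for i in range(1, m - n + 2):
        active = [(i - 1 + x) % (n + 1) for x in range(n)]   # rows i .. i+n-1
        prefetch = (i - 1 + n) % (n + 1)                     # row i+n, loaded ahead
        schedule.append((active, prefetch))
    return schedule
```

For m = 5, n = 3 the schedule cycles through the 4 buffers: position 1 uses buffers [0, 1, 2] while buffer 3 prefetches, position 2 uses [1, 2, 3] while buffer 0 refills, and so on, which is exactly the wrap-around behaviour the claim describes.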
(2) h-th convolutional layer calculation
Complete the convolution of the t-th input data matrix with the k-th convolution kernel of the h-th convolutional layer of the network, and activate the calculation result through the Sigmoid function;
while each convolution calculation is performed, the data cache register P(i-1)%(n+1)+n is initialized in parallel; its data serves as the cached input for the convolution of the (i+1)-th sub-matrix, realizing the circular convolution;
a Sigmoid function is constructed at the FPGA end from floating-point IP cores, realizing the activation of the convolution result. The Sigmoid function is f(x) = 1/(1 + e^(-x)). The specific steps are as follows:
as described above, the input data is an m×m floating-point matrix, the convolution kernel is an n×n floating-point matrix, the sliding window is n×n with horizontal and vertical strides of 1, so the convolution result is an (m-n+1)×(m-n+1) floating-point matrix; a bias b11 (an offline training model parameter) is added to each element of the matrix, and after activation by the Sigmoid function the result is an (m-n+1)×(m-n+1) floating-point matrix, which is stored in the Block RAM;
after each convolution calculation, the convolution kernel matrix cache register W is re-initialized and the next convolution calculation is performed; this cycle repeats until the result, S (m-n+1)×(m-n+1) floating-point matrices, is obtained and stored in the Block RAM;
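A software sketch of one convolutional-layer computation (one input matrix, one kernel) under the shapes given above: a stride-1 valid sliding window, bias add, and Sigmoid activation. The claim does not say whether the hardware flips the kernel, so plain correlation is assumed here, as are the function names:

```python
import numpy as np

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)), the activation named in the claim."""
    return 1.0 / (1.0 + np.exp(-x))

def conv_layer(inp, kernel, bias):
    """Stride-1 'valid' sliding-window convolution of an m x m input
    with an n x n kernel, plus a scalar bias and Sigmoid activation.
    Output shape is (m-n+1) x (m-n+1), matching the claim."""
    m, n = inp.shape[0], kernel.shape[0]
    out = np.empty((m - n + 1, m - n + 1), dtype=np.float32)
    for r in range(m - n + 1):
        for c in range(m - n + 1):
            out[r, c] = np.sum(inp[r:r + n, c:c + n] * kernel) + bias
    return sigmoid(out)
```

With m = 4 and n = 3 this yields a 2×2 activated result, the software analogue of one pass of the pipelined hardware loop; the FPGA instead evaluates the window sums with parallel multiply-accumulate units and a floating-point IP-core Sigmoid.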
(3) h-th pooling layer calculation
Pooling of the h-th convolutional layer's results yields S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices, which are stored in the Block RAM. The specific steps are as follows: the sliding window over the convolution result data is 2×2 with a stride of 2, and pooling is realized by average down-sampling, i.e. the elements of each 2×2 block are summed and averaged, giving S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices that serve as the input matrices of the (h+1)-th convolutional layer;
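The 2×2 stride-2 average down-sampling above can be sketched compactly; the reshape trick and function name are illustrative choices, and an even side length (i.e. m-n+1 divisible by 2) is assumed, as the claim's [(m-n+1)/2] output size implies:

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2, as in the claim's
    down-sampling step: group the matrix into 2x2 blocks and take
    the mean of each block. Assumes even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

Applied to a 24×24 convolution result this produces the 12×12 input matrix for the next convolutional layer.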
step 4, classification calculation
The convolution and pooling results are transmitted back to the ARM end for classification. The specific steps are as follows: the FPGA end transfers the convolution-pooling result matrix in the Block RAM to the ARM end through the FIFO cache and the APB bus, and the ARM end performs the Softmax operation to complete the classification calculation, obtaining and outputting the classification result of the input data.
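The ARM-side Softmax classification in step 4 might look as follows. The claim does not detail how the pooled features become class scores (e.g. any fully connected layer is omitted here), so this sketch assumes a vector of logits is already available; the function name and the numerical-stability shift are assumptions:

```python
import numpy as np

def softmax_classify(logits):
    """Numerically stable Softmax over a score vector, followed by
    argmax, mirroring the ARM-side classification step of the claim.
    Returns (probabilities, predicted class index)."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()                    # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p, int(np.argmax(p))
```

The probabilities sum to 1 and the index of the largest one is reported as the classification result, e.g. the digit label for an MNIST picture.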
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610615714.2A CN106228240B (en) | 2016-07-30 | 2016-07-30 | Deep convolution neural network implementation method based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106228240A CN106228240A (en) | 2016-12-14 |
CN106228240B true CN106228240B (en) | 2020-09-01 |
Family
ID=57536621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610615714.2A Active CN106228240B (en) | 2016-07-30 | 2016-07-30 | Deep convolution neural network implementation method based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106228240B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11836608B2 (en) | 2020-06-23 | 2023-12-05 | Stmicroelectronics S.R.L. | Convolution acceleration with embedded vector decompression |
US11880759B2 (en) | 2020-02-18 | 2024-01-23 | Stmicroelectronics S.R.L. | Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks |
Families Citing this family (100)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018060268A (en) * | 2016-10-03 | 2018-04-12 | 株式会社日立製作所 | Recognition device and learning system |
KR20180073314A (en) * | 2016-12-22 | 2018-07-02 | 삼성전자주식회사 | Convolutional neural network system and operation method thererof |
CN106650691A (en) * | 2016-12-30 | 2017-05-10 | 北京旷视科技有限公司 | Image processing method and image processing device |
CN106529517B (en) * | 2016-12-30 | 2019-11-01 | 北京旷视科技有限公司 | Image processing method and image processing equipment |
US20180189229A1 (en) | 2017-01-04 | 2018-07-05 | Stmicroelectronics S.R.L. | Deep convolutional network heterogeneous architecture |
CN108269224B (en) | 2017-01-04 | 2022-04-01 | 意法半导体股份有限公司 | Reconfigurable interconnect |
CN106909970B (en) * | 2017-01-12 | 2020-04-21 | 南京风兴科技有限公司 | Approximate calculation-based binary weight convolution neural network hardware accelerator calculation device |
CN106875011B (en) * | 2017-01-12 | 2020-04-17 | 南京风兴科技有限公司 | Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof |
CN106682702A (en) * | 2017-01-12 | 2017-05-17 | 张亮 | Deep learning method and system |
CN108304922B (en) * | 2017-01-13 | 2020-12-15 | 华为技术有限公司 | Computing device and computing method for neural network computing |
WO2018137177A1 (en) * | 2017-01-25 | 2018-08-02 | 北京大学 | Method for convolution operation based on nor flash array |
CN106779060B (en) * | 2017-02-09 | 2019-03-08 | 武汉魅瞳科技有限公司 | A kind of calculation method for the depth convolutional neural networks realized suitable for hardware design |
CN106875012B (en) * | 2017-02-09 | 2019-09-20 | 武汉魅瞳科技有限公司 | A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA |
TWI607389B (en) * | 2017-02-10 | 2017-12-01 | 耐能股份有限公司 | Pooling operation device and method for convolutional neural network |
CN106991474B (en) * | 2017-03-28 | 2019-09-24 | 华中科技大学 | The parallel full articulamentum method for interchanging data of deep neural network model and system |
CN106991999B (en) * | 2017-03-29 | 2020-06-02 | 北京小米移动软件有限公司 | Voice recognition method and device |
CN108804974B (en) * | 2017-04-27 | 2021-07-02 | 深圳鲲云信息科技有限公司 | Method and system for estimating and configuring resources of hardware architecture of target detection algorithm |
CN108229645B (en) * | 2017-04-28 | 2021-08-06 | 北京市商汤科技开发有限公司 | Convolution acceleration and calculation processing method and device, electronic equipment and storage medium |
CN107229969A (en) * | 2017-06-21 | 2017-10-03 | 郑州云海信息技术有限公司 | A kind of convolutional neural networks implementation method and device based on FPGA |
CN107451654B (en) * | 2017-07-05 | 2021-05-18 | 深圳市自行科技有限公司 | Acceleration operation method of convolutional neural network, server and storage medium |
CN107451653A (en) * | 2017-07-05 | 2017-12-08 | 深圳市自行科技有限公司 | Computational methods, device and the readable storage medium storing program for executing of deep neural network |
CN107451659B (en) * | 2017-07-27 | 2020-04-10 | 清华大学 | Neural network accelerator for bit width partition and implementation method thereof |
CN107622305A (en) * | 2017-08-24 | 2018-01-23 | 中国科学院计算技术研究所 | Processor and processing method for neutral net |
CN107689223A (en) * | 2017-08-30 | 2018-02-13 | 北京嘉楠捷思信息技术有限公司 | Audio identification method and device |
CN110245751B (en) * | 2017-08-31 | 2020-10-09 | 中科寒武纪科技股份有限公司 | GEMM operation method and device |
US10839286B2 (en) * | 2017-09-14 | 2020-11-17 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
CN107564522A (en) * | 2017-09-18 | 2018-01-09 | 郑州云海信息技术有限公司 | A kind of intelligent control method and device |
CN107656899A (en) * | 2017-09-27 | 2018-02-02 | 深圳大学 | A kind of mask convolution method and system based on FPGA |
CN107749044A (en) * | 2017-10-19 | 2018-03-02 | 珠海格力电器股份有限公司 | The pond method and device of image information |
CN109754062B (en) * | 2017-11-07 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN107844833A (en) * | 2017-11-28 | 2018-03-27 | 郑州云海信息技术有限公司 | A kind of data processing method of convolutional neural networks, device and medium |
CN108009631A (en) * | 2017-11-30 | 2018-05-08 | 睿视智觉(深圳)算法技术有限公司 | A kind of VGG-16 general purpose processing blocks and its control method based on FPGA |
CN110574371B (en) * | 2017-12-08 | 2021-12-21 | 百度时代网络技术(北京)有限公司 | Stereo camera depth determination using hardware accelerators |
CN109961133B (en) * | 2017-12-14 | 2020-04-24 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN108416422B (en) * | 2017-12-29 | 2024-03-01 | 国民技术股份有限公司 | FPGA-based convolutional neural network implementation method and device |
CN108388943B (en) * | 2018-01-08 | 2020-12-29 | 中国科学院计算技术研究所 | Pooling device and method suitable for neural network |
CN108154229B (en) * | 2018-01-10 | 2022-04-08 | 西安电子科技大学 | Image processing method based on FPGA (field programmable Gate array) accelerated convolutional neural network framework |
CN108362628A (en) * | 2018-01-11 | 2018-08-03 | 天津大学 | The n cell flow-sorting methods of flow cytometer are imaged based on polarizing diffraction |
CN109643336A (en) * | 2018-01-15 | 2019-04-16 | 深圳鲲云信息科技有限公司 | Artificial intelligence process device designs a model method for building up, system, storage medium, terminal |
CN109313723B (en) * | 2018-01-15 | 2022-03-15 | 深圳鲲云信息科技有限公司 | Artificial intelligence convolution processing method and device, readable storage medium and terminal |
WO2019136764A1 (en) * | 2018-01-15 | 2019-07-18 | 深圳鲲云信息科技有限公司 | Convolutor and artificial intelligent processing device applied thereto |
CN110178146B (en) * | 2018-01-15 | 2023-05-12 | 深圳鲲云信息科技有限公司 | Deconvolutor and artificial intelligence processing device applied by deconvolutor |
US11874898B2 (en) | 2018-01-15 | 2024-01-16 | Shenzhen Corerain Technologies Co., Ltd. | Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal |
US11568232B2 (en) * | 2018-02-08 | 2023-01-31 | Quanta Computer Inc. | Deep learning FPGA converter |
CN108108809B (en) * | 2018-03-05 | 2021-03-02 | 山东领能电子科技有限公司 | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof |
CN108537330B (en) * | 2018-03-09 | 2020-09-01 | 中国科学院自动化研究所 | Convolution computing device and method applied to neural network |
CN108256636A (en) * | 2018-03-16 | 2018-07-06 | 成都理工大学 | A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing |
CN108710892B (en) * | 2018-04-04 | 2020-09-01 | 浙江工业大学 | Cooperative immune defense method for multiple anti-picture attacks |
CN108615076B (en) * | 2018-04-08 | 2020-09-11 | 瑞芯微电子股份有限公司 | Deep learning chip-based data storage optimization method and device |
CN108470211B (en) * | 2018-04-09 | 2022-07-12 | 郑州云海信息技术有限公司 | Method and device for realizing convolution calculation and computer storage medium |
CN108520300A (en) * | 2018-04-09 | 2018-09-11 | 郑州云海信息技术有限公司 | A kind of implementation method and device of deep learning network |
CN110399976B (en) * | 2018-04-25 | 2022-04-05 | 华为技术有限公司 | Computing device and computing method |
CN108549935B (en) * | 2018-05-03 | 2021-09-10 | 山东浪潮科学研究院有限公司 | Device and method for realizing neural network model |
CN108595379A (en) * | 2018-05-08 | 2018-09-28 | 济南浪潮高新科技投资发展有限公司 | A kind of parallelization convolution algorithm method and system based on multi-level buffer |
CN108805270B (en) * | 2018-05-08 | 2021-02-12 | 华中科技大学 | Convolutional neural network system based on memory |
CN108805267B (en) * | 2018-05-28 | 2021-09-10 | 重庆大学 | Data processing method for hardware acceleration of convolutional neural network |
CN108764182B (en) * | 2018-06-01 | 2020-12-08 | 阿依瓦(北京)技术有限公司 | Optimized acceleration method and device for artificial intelligence |
CN108711429B (en) * | 2018-06-08 | 2021-04-02 | Oppo广东移动通信有限公司 | Electronic device and device control method |
CN109086879B (en) * | 2018-07-05 | 2020-06-16 | 东南大学 | Method for realizing dense connection neural network based on FPGA |
CN109032781A (en) * | 2018-07-13 | 2018-12-18 | 重庆邮电大学 | A kind of FPGA parallel system of convolutional neural networks algorithm |
CN109117949A (en) * | 2018-08-01 | 2019-01-01 | 南京天数智芯科技有限公司 | Flexible data stream handle and processing method for artificial intelligence equipment |
CN109036459B (en) * | 2018-08-22 | 2019-12-27 | 百度在线网络技术(北京)有限公司 | Voice endpoint detection method and device, computer equipment and computer storage medium |
CN109102070B (en) * | 2018-08-22 | 2020-11-24 | 地平线(上海)人工智能技术有限公司 | Preprocessing method and device for convolutional neural network data |
CN109214506B (en) * | 2018-09-13 | 2022-04-15 | 深思考人工智能机器人科技(北京)有限公司 | Convolutional neural network establishing device and method based on pixels |
US20200090046A1 (en) * | 2018-09-14 | 2020-03-19 | Huawei Technologies Co., Ltd. | System and method for cascaded dynamic max pooling in neural networks |
US20200090023A1 (en) * | 2018-09-14 | 2020-03-19 | Huawei Technologies Co., Ltd. | System and method for cascaded max pooling in neural networks |
CN109359732B (en) * | 2018-09-30 | 2020-06-09 | 阿里巴巴集团控股有限公司 | Chip and data processing method based on chip |
CN109376843B (en) * | 2018-10-12 | 2021-01-08 | 山东师范大学 | FPGA-based electroencephalogram signal rapid classification method, implementation method and device |
CN109146067B (en) * | 2018-11-19 | 2021-11-05 | 东北大学 | Policy convolution neural network accelerator based on FPGA |
CN109670578A (en) * | 2018-12-14 | 2019-04-23 | 北京中科寒武纪科技有限公司 | Neural network first floor convolution layer data processing method, device and computer equipment |
CN109711539B (en) * | 2018-12-17 | 2020-05-29 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN109800867B (en) * | 2018-12-17 | 2020-09-29 | 北京理工大学 | Data calling method based on FPGA off-chip memory |
CN109740748B (en) * | 2019-01-08 | 2021-01-08 | 西安邮电大学 | Convolutional neural network accelerator based on FPGA |
CN109784483B (en) * | 2019-01-24 | 2022-09-09 | 电子科技大学 | FD-SOI (field-programmable gate array-silicon on insulator) process-based binary convolution neural network in-memory computing accelerator |
CN109871939B (en) * | 2019-01-29 | 2021-06-15 | 深兰人工智能芯片研究院(江苏)有限公司 | Image processing method and image processing device |
CN109615067B (en) * | 2019-03-05 | 2019-05-21 | 深兰人工智能芯片研究院(江苏)有限公司 | A kind of data dispatching method and device of convolutional neural networks |
TWI696129B (en) * | 2019-03-15 | 2020-06-11 | 華邦電子股份有限公司 | Memory chip capable of performing artificial intelligence operation and operation method thereof |
CN110032374B (en) * | 2019-03-21 | 2023-04-07 | 深兰科技(上海)有限公司 | Parameter extraction method, device, equipment and medium |
CN110084363B (en) * | 2019-05-15 | 2023-04-25 | 电科瑞达(成都)科技有限公司 | Deep learning model acceleration method based on FPGA platform |
CN110223687B (en) * | 2019-06-03 | 2021-09-28 | Oppo广东移动通信有限公司 | Instruction execution method and device, storage medium and electronic equipment |
CN110209627A (en) * | 2019-06-03 | 2019-09-06 | 山东浪潮人工智能研究院有限公司 | A kind of hardware-accelerated method of SSD towards intelligent terminal |
CN110727634B (en) * | 2019-07-05 | 2021-10-29 | 中国科学院计算技术研究所 | Embedded intelligent computer system for object-side data processing |
CN110458279B (en) * | 2019-07-15 | 2022-05-20 | 武汉魅瞳科技有限公司 | FPGA-based binary neural network acceleration method and system |
CN110472442A (en) * | 2019-08-20 | 2019-11-19 | 厦门理工学院 | A kind of automatic detection hardware Trojan horse IP kernel |
TWI724515B (en) * | 2019-08-27 | 2021-04-11 | 聯智科創有限公司 | Machine learning service delivery method |
CN110619387B (en) * | 2019-09-12 | 2023-06-20 | 复旦大学 | Channel expansion method based on convolutional neural network |
CN110689088A (en) * | 2019-10-09 | 2020-01-14 | 山东大学 | CNN-based LIBS ore spectral data classification method and device |
CN110910434B (en) * | 2019-11-05 | 2023-05-12 | 东南大学 | Method for realizing deep learning parallax estimation algorithm based on FPGA (field programmable Gate array) high energy efficiency |
CN110991632B (en) * | 2019-11-29 | 2023-05-23 | 电子科技大学 | Heterogeneous neural network calculation accelerator design method based on FPGA |
CN110880038B (en) * | 2019-11-29 | 2022-07-01 | 中国科学院自动化研究所 | System for accelerating convolution calculation based on FPGA and convolution neural network |
CN111008629A (en) * | 2019-12-07 | 2020-04-14 | 怀化学院 | Cortex-M3-based method for identifying number of tip |
CN111310921B (en) * | 2020-03-27 | 2022-04-19 | 西安电子科技大学 | FPGA implementation method of lightweight deep convolutional neural network |
CN111667053B (en) * | 2020-06-01 | 2023-05-09 | 重庆邮电大学 | Forward propagation calculation acceleration method of convolutional neural network accelerator |
CN111832718B (en) * | 2020-06-24 | 2021-08-03 | 上海西井信息科技有限公司 | Chip architecture |
CN111860773B (en) * | 2020-06-30 | 2023-07-28 | 北京百度网讯科技有限公司 | Processing apparatus and method for information processing |
CN112508184B (en) * | 2020-12-16 | 2022-04-29 | 重庆邮电大学 | Design method of fast image recognition accelerator based on convolutional neural network |
CN113012689B (en) * | 2021-04-15 | 2023-04-07 | 成都爱旗科技有限公司 | Electronic equipment and deep learning hardware acceleration method |
CN113762491B (en) * | 2021-08-10 | 2023-06-30 | 南京工业大学 | Convolutional neural network accelerator based on FPGA |
CN114546484A (en) * | 2022-02-21 | 2022-05-27 | 山东浪潮科学研究院有限公司 | Deep convolution optimization method, system and device based on micro-architecture processor |
CN116718894B (en) * | 2023-06-19 | 2024-03-29 | 上饶市广强电子科技有限公司 | Circuit stability test method and system for corn lamp |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7882164B1 (en) * | 2004-09-24 | 2011-02-01 | University Of Southern California | Image convolution engine optimized for use in programmable gate arrays |
CN104035750A (en) * | 2014-06-11 | 2014-09-10 | 西安电子科技大学 | Field programmable gate array (FPGA)-based real-time template convolution implementing method |
CN105046681A (en) * | 2015-05-14 | 2015-11-11 | 江南大学 | Image salient region detecting method based on SoC |
CN105469039A (en) * | 2015-11-19 | 2016-04-06 | 天津大学 | Target identification system based on AER image sensor |
CN105491269A (en) * | 2015-11-24 | 2016-04-13 | 长春乙天科技有限公司 | High-fidelity video amplification method based on deconvolution image restoration |
CN105678379A (en) * | 2016-01-12 | 2016-06-15 | 腾讯科技(深圳)有限公司 | CNN processing method and device |
CN105678378A (en) * | 2014-12-04 | 2016-06-15 | 辉达公司 | Indirectly accessing sample data to perform multi-convolution operations in parallel processing system |
CN105740773A (en) * | 2016-01-25 | 2016-07-06 | 重庆理工大学 | Deep learning and multi-scale information based behavior identification method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9622041B2 (en) * | 2013-03-15 | 2017-04-11 | DGS Global Systems, Inc. | Systems, methods, and devices for electronic spectrum management |
2016-07-30: CN application CN201610615714.2A, granted as CN106228240B (status: Active)
Non-Patent Citations (6)
Title |
---|
A Multistage Dataflow Implementation of a Deep Convolutional Neural Network Based on FPGA for High-Speed Object Recognition; Li, Ning et al.; 2016 IEEE Southwest Symposium on Image Analysis and Interpretation; 20160308; pp. 165-168 * |
Design space exploration of FPGA-based Deep Convolutional Neural Networks; Mohammad Motamedi et al.; 2016 21st Asia and South Pacific Design Automation Conference; 20160310; full text * |
Fast Pipeline 128×128 pixel spiking convolution core for event-driven vision processing in FPGAs; Yousefzadeh, A. et al.; 2015 First International Conference on Event-Based Control, Communication and Signal Processing; 20151231; full text * |
FPGA Implementation of a Novel 2-D Convolver; Sang, Hongshi et al.; Microelectronics & Computer; 20110930; full text * |
Design and Implementation of an FPGA-based Image Convolution IP Core; Zhu, Xueliang et al.; Microelectronics & Computer; 20160630; pp. 188-192 * |
A New FPGA Implementation Method for Spatial Template Convolution Filtering Algorithms; Li, Ming et al.; Computer Applications and Software; 20100831; pp. 17-18 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106228240B (en) | Deep convolution neural network implementation method based on FPGA | |
US11574195B2 (en) | Operation method | |
Chaurasia et al. | Linknet: Exploiting encoder representations for efficient semantic segmentation | |
EP3407266B1 (en) | Artificial neural network calculating device and method for sparse connection | |
CN109934331B (en) | Apparatus and method for performing artificial neural network forward operations | |
CN111459877B (en) | Winograd YOLOv2 target detection model method based on FPGA acceleration | |
CN106991477B (en) | Artificial neural network compression coding device and method | |
CN107340993B (en) | Arithmetic device and method | |
CN108108809B (en) | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof | |
US20180018555A1 (en) | System and method for building artificial neural network architectures | |
CN108763159A (en) | To arithmetic accelerator before a kind of LSTM based on FPGA | |
CN110991631A (en) | Neural network acceleration system based on FPGA | |
US11775832B2 (en) | Device and method for artificial neural network operation | |
Gupta et al. | FPGA implementation of simplified spiking neural network | |
Liau et al. | Fire SSD: Wide fire modules based single shot detector on edge device | |
CN111583094A (en) | Image pulse coding method and system based on FPGA | |
CN113762493A (en) | Neural network model compression method and device, acceleration unit and computing system | |
CN109145107A (en) | Subject distillation method, apparatus, medium and equipment based on convolutional neural networks | |
CN109685208B (en) | Method and device for thinning and combing acceleration of data of neural network processor | |
Sommer et al. | Efficient hardware acceleration of sparsely active convolutional spiking neural networks | |
WO2022028232A1 (en) | Device and method for executing lstm neural network operation | |
Al Maashri et al. | A hardware architecture for accelerating neuromorphic vision algorithms | |
Nathan et al. | Skeletonnetv2: A dense channel attention blocks for skeleton extraction | |
CN112988229B (en) | Convolutional neural network resource optimization configuration method based on heterogeneous computation | |
JP2022541712A (en) | Neural network training method, video recognition method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||