CN106228240B - Deep convolution neural network implementation method based on FPGA - Google Patents

Deep convolution neural network implementation method based on FPGA

Info

Publication number
CN106228240B
CN106228240B (application CN201610615714.2A)
Authority
CN
China
Prior art keywords
convolution
calculation
matrix
fpga
floating point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610615714.2A
Other languages
Chinese (zh)
Other versions
CN106228240A (en)
Inventor
王展雄
周光朕
冯瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201610615714.2A priority Critical patent/CN106228240B/en
Publication of CN106228240A publication Critical patent/CN106228240A/en
Application granted granted Critical
Publication of CN106228240B publication Critical patent/CN106228240B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Neurology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of digital image processing and pattern recognition, and in particular relates to an FPGA-based implementation method for deep convolutional neural networks. The hardware platform of the invention is a Xilinx ZYNQ-7030 programmable system-on-chip (SoC), which integrates an FPGA and an ARM Cortex-A9 processor. First, the trained network model parameters are loaded to the FPGA end; the input data is then preprocessed at the ARM end and the result transmitted to the FPGA end, where the convolution and down-sampling computations of the deep convolutional neural network are performed; the resulting data feature vectors are transmitted back to the ARM end, which completes the feature classification computation. By exploiting the FPGA's fast parallel processing and highly efficient, extremely low-power computation, the invention implements the convolution computation, the most complex part of the deep convolutional neural network model, greatly improving algorithm efficiency and reducing power consumption while preserving algorithm accuracy.

Description

Deep convolution neural network implementation method based on FPGA
Technical Field
The invention belongs to the technical field of digital image processing and pattern recognition, and particularly relates to a method for realizing a deep convolutional neural network model on an FPGA hardware platform.
Background
With the rapid development of computer technology and the Internet, data volumes have grown explosively, and the intelligent analysis and processing of massive data has become the key to exploiting the value of that data. Artificial intelligence is an effective means of extracting valuable information from massive data, and in recent years it has made breakthrough progress in application fields such as computer vision, speech recognition and natural language processing. A representative example is the deep learning algorithm model based on deep convolutional neural networks.
Convolutional Neural Networks (CNNs) were inspired by neuroscience research. Over more than 20 years of evolution, they have achieved remarkable theoretical and practical results in fields such as pattern recognition and human-machine game playing; in a famous human-machine Go match, the artificial intelligence system AlphaGo, based on a CNN combined with Monte Carlo tree search, defeated the world Go champion Lee Sedol by a score of 4:1. A typical CNN algorithm model consists of two parts: a feature extractor and a classifier. The feature extractor generates low-dimensional feature vectors from the input data and is robust to variations in the data. These vectors are then fed as input to the classifier (usually based on a traditional artificial neural network), which produces the classification result for the input data.
In implementations of the convolutional neural network algorithm model, convolution accounts for about 90% of the computation of the whole model [1]. Efficient computation of the convolutional layers is therefore the key to greatly improving the computational efficiency of a CNN algorithm model, and hardware acceleration of the convolution computation is an effective way to achieve this.
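As a rough, back-of-envelope illustration of why convolution dominates, the sketch below counts the multiply-accumulate (MAC) operations of a single valid-convolution layer; the layer sizes in the example are hypothetical and are not taken from this patent.

```python
def conv_macs(T, S, m, n):
    # T input maps of size m x m, S output maps, n x n kernels,
    # stride 1, valid convolution (output side m-n+1)
    out = m - n + 1
    return T * S * out * out * n * n

# Hypothetical layer: 3 input maps, 16 output maps, 32x32 inputs, 5x5 kernels
print(conv_macs(3, 16, 32, 5))  # 940800 MACs for this one layer
```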
At present, industry generally uses GPU clusters to implement deep learning algorithm models: deep neural network models are realized through large-scale parallel computing, with remarkably high efficiency and performance. However, the high power consumption of GPUs constrains their large-scale application and has become a bottleneck for the practical deployment of deep convolutional neural network algorithm models. FPGAs combine high-performance parallel computation with ultra-low power consumption, so implementing deep learning algorithm models on FPGAs is an inevitable direction of development in this field.
At present, there are three main schemes for implementing CNN by using FPGA:
(1) a soft-core CPU implements the control part and works with the FPGA to accelerate the algorithm;
(2) an ARM Cortex-A9 hard-core CPU embedded in an SoC implements the control part and works with the FPGA (field-programmable gate array) to accelerate the algorithm;
(3) a cloud server works with the FPGA to accelerate the algorithm.
Each of the three schemes has its advantages and disadvantages, and the acceleration scheme can be selected according to the application.
In a deep convolutional neural network, the convolutional layers account for more than 90% of the computation and are the key link in the whole network model; their computational efficiency directly determines the performance of the model implementation. However, implementing the convolution computation on an FPGA is difficult, mainly for the following reasons:
(1) deep learning algorithm models are still largely at the academic research stage, and large-scale industrial application still requires considerable algorithm and model optimization work; the algorithm model must therefore be continually optimized to suit different application scenarios, which requires a deep understanding of deep learning theory and algorithms;
(2) FPGA development is based on low-level hardware description languages and suits situations where the algorithm model is relatively stable; the constantly changing deep learning algorithm models make their implementation on FPGAs very difficult;
(3) implementing deep convolutional neural networks on FPGAs requires extensive FPGA engineering experience. The FPGA's operating clock frequency and the output delay (latency) of modules such as multipliers are in tension: the higher the clock frequency, the longer the module output latency, and the lower the clock frequency, the shorter the latency. Reasonably balanced parameters must be found through manual experimentation guided by engineering experience.
Disclosure of Invention
The invention aims to provide a method for realizing a deep convolutional neural network model with high efficiency and low power consumption, so as to solve the problems of high power consumption and low efficiency of the current deep learning model based on a GPU or a CPU.
The invention optimizes the FPGA hardware design, effectively reduces the resource consumption and can realize the deep convolution neural network model on a low-end FPGA hardware platform.
The method for implementing a deep convolutional neural network model provided by the invention runs on a Xilinx ZYNQ-7030 programmable SoC hardware platform, which integrates an FPGA and an ARM Cortex-A9 processor. First, the trained network model parameters are loaded to the FPGA end; the input data is then preprocessed at the ARM end and the result transmitted to the FPGA end, where the convolution and down-sampling computations of the deep convolutional neural network are performed; the resulting data feature vectors are transmitted back to the ARM end to complete the feature classification computation. The method comprises the following four processes: model parameter loading, input data preprocessing, convolution and down-sampling computation, and classification computation:
1. the model parameter loading process comprises the following steps:
(1) training a deep convolutional neural network model offline;
(2) loading training model parameters at the ARM end;
(3) transmitting the model parameters to the FPGA;
2. the input data preprocessing operation process comprises the following steps:
(1) normalization processing;
(2) transmitting the processing result to the FPGA;
(3) storing the data to a Block RAM at an FPGA end;
3. the convolution and downsampling calculation process is as follows:
(1) initializing the convolution pipeline;
(2) performing convolution calculation;
(3) performing pooling downsampling calculation;
(4) reinitializing the convolution pipeline and performing the multi-layer convolution and down-sampling computations;
4. the classification calculation process comprises the following steps:
(1) transmitting the feature vector back to the ARM end;
(2) calculating through a classification model;
(3) and outputting a classification result.
These processes are described in detail as follows:
step 1, loading training model parameters
(1) Loading parameters of a deep convolutional neural network model trained offline at an ARM end;
(2) transmitting the parameters of the training model to an FPGA end;
(3) the FPGA end buffers the parameters through a FIFO and then stores them in Block RAM (block random access memory);
step 2, preprocessing a deep convolution neural network model
(1) Normalizing the input data to meet the requirement of model convolution operation;
(2) transmitting the ARM end normalized data to the FPGA end by using an APB bus;
(3) the FPGA end stores the normalized data into Block RAM after FIFO buffering;
step 3, convolution and down-sampling calculation
A deep pipeline implementation is designed for the convolutional layer computation, which carries the largest computational load in the deep convolutional neural network model. Suppose the network model has H convolutional layers and H pooling layers. The input of the h-th (h = 1,2,…,H) convolutional layer is T m×m 32-bit floating-point matrices, its output is S (m-n+1)×(m-n+1) 32-bit floating-point matrices, and its convolution kernels are K n×n 32-bit floating-point matrices (n ≤ m); the input-data sliding window is n×n, with horizontal sliding stride 1 and vertical sliding stride 1.
(1) Initializing a convolution operation pipeline
Define n+1 data cache registers P_0, P_1, …, P_(n-1), P_n, each holding m data values. Of these, n registers (P_((i-1)%(n+1)+0), P_((i-1)%(n+1)+1), …, P_((i-1)%(n+1)+n-1)) store the data of the i-th (i = 1,2,…,m-n+1) n×m sub-matrix of the t-th (t = 1,2,…,T) input data matrix, where % denotes the remainder operation; if (i-1)%(n+1)+x > n, the indices wrap around, i.e. (i-1)%(n+1)+x = 0, (i-1)%(n+1)+x+1 = 1, …, where x = 0,1,…,n-1. If n < m, register P_((i-1)%(n+1)+n) stores the (i+n)-th row of the input data matrix; it is initialized in parallel with the convolution computation, which reduces FPGA idle cycles and improves computational efficiency.
Define 1 convolution kernel matrix cache register W, which stores the weight data of the k-th (k = 1,2,…,K) n×n convolution kernel matrix.
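To make the index arithmetic above concrete, here is a minimal software sketch of the (n+1)-register ring buffer, under the assumption that the wrap-around rule described above is equivalent to taking register indices modulo n+1.

```python
# Ring-buffer indexing for the n+1 row registers P_0 .. P_n (a sketch, not RTL).
def row_registers(i, n):
    """Registers holding the n rows of sub-matrix i (input rows i .. i+n-1)."""
    return [(i - 1 + x) % (n + 1) for x in range(n)]

def prefetch_register(i, n):
    """Register refilled with input row i+n while sub-matrix i is convolved."""
    return (i - 1 + n) % (n + 1)

n = 5
for i in (1, 2, 3):
    print(i, row_registers(i, n), prefetch_register(i, n))
# i=1: [0, 1, 2, 3, 4] 5  -> P_5 is loaded with row 6 during the convolution
# i=2: [1, 2, 3, 4, 5] 0  -> P_0 is recycled to hold row 7
# i=3: [2, 3, 4, 5, 0] 1
```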
(2) h-th convolutional layer computation
The convolution of the t-th input data matrix with the k-th convolution kernel of the h-th convolutional layer of the network is completed, and the result is activated with the Sigmoid function.
Specifically, while each convolution computation is performed, the data cache register P_((i-1)%(n+1)+n), holding the (i+n)-th row, is initialized and serves as the buffered input data for the convolution of the (i+1)-th sub-matrix, realizing the cyclic convolution.
The Sigmoid activation of the convolution results is realized at the FPGA end by constructing the Sigmoid function from Floating-point IP cores; the expression of the Sigmoid function is:
f(x) = 1 / (1 + e^(-x))
The specific steps are as follows:
as described above, the input data is an m × m floating point matrix, the convolution kernel is an n × n floating point matrix, the sliding window scale is n × n, the horizontal sliding step is 1, and the vertical sliding step is 1, then the convolution result is an (m-n +1) x (m-n +1) floating point matrix, offset b11 (offline training model parameter) is added to each element of the matrix, and after activation by using a Sigmoid function, the result is an (m-n +1) x (m-n +1) floating point matrix, which is stored in the Block RAM.
After each convolution computation, the convolution kernel matrix cache register W is re-initialized and the next convolution is performed; repeating this cyclic convolution yields S (m-n+1)×(m-n+1) floating-point matrices, which are stored in Block RAM.
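The convolution-bias-activation step described above can be modeled in software as follows; this is a minimal NumPy sketch of the arithmetic (valid convolution, unit strides), not the FPGA pipeline itself, and the argument names are placeholders.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), the activation built from Floating-point IP cores
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(x, w):
    """Valid convolution of an m x m input with an n x n kernel, stride 1."""
    m, n = x.shape[0], w.shape[0]
    out = np.empty((m - n + 1, m - n + 1), dtype=np.float32)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + n, c:c + n] * w)
    return out

def conv_layer(x, w, b):
    # Convolve, add the offline-trained offset b, then apply the Sigmoid
    return sigmoid(conv2d_valid(x, w) + b)
```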
(3) h-th pooling layer computation
The pooling of the h-th convolutional layer results is computed; the result is S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices, which are stored in Block RAM. The specific steps are as follows: the sliding window over the convolution results is 2×2 with stride 2, and pooling uses average down-sampling, i.e. the elements of each 2×2 block are summed and averaged, yielding S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices that serve as the input matrices for the (h+1)-th convolutional layer computation.
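The average down-sampling just described reduces each 2×2 block to its mean; a minimal NumPy sketch, assuming the input sides are even:

```python
import numpy as np

def avg_pool_2x2(x):
    """2 x 2 average down-sampling with stride 2."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# A 24 x 24 convolution result pools down to 12 x 12
y = avg_pool_2x2(np.zeros((24, 24), dtype=np.float32))
print(y.shape)  # (12, 12)
```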
Step 4, classification calculation
The convolution and pooling results are transmitted back to the ARM end for the classification operation. The specific steps are as follows: the FPGA end transmits the convolution-pooling result matrices in Block RAM to the ARM end through a FIFO buffer and the APB bus; the ARM end completes the data classification computation using a Softmax operation, obtaining and outputting the classification result for the input data.
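A software model of the ARM-end classification step is sketched below. The patent only states that a Softmax operation completes the classification; the fully-connected weights W_fc and b_fc here are hypothetical placeholders for the trained classifier parameters.

```python
import numpy as np

def softmax(z):
    # Numerically stable Softmax
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(feature_maps, W_fc, b_fc):
    # Flatten the feature matrices returned by the FPGA end and classify
    v = np.concatenate([f.ravel() for f in feature_maps])
    return int(np.argmax(softmax(W_fc @ v + b_fc)))
```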
The method of the invention has the following main characteristics:
(1) a deep convolutional neural network model is realized on a low-end FPGA;
(2) the convolution calculation in the deep convolution neural network model is accelerated by utilizing a pipeline calculation mode;
(3) the control is implemented with the SoC's embedded ARM processor, which is compact, low-power and efficient and can be widely applied in embedded systems.
By exploiting the FPGA's fast parallel processing and highly efficient, extremely low-power computation, the invention implements the convolution computation, the most complex part of the deep convolutional neural network model, and greatly improves algorithm efficiency while preserving algorithm accuracy. Compared with traditional CPU- or GPU-based implementations of deep convolutional neural networks, the method effectively increases the computation speed while greatly reducing power consumption, solving the problems of long run times and high power consumption that arise when a CPU or GPU is used to implement a deep convolutional neural network.
Drawings
FIG. 1 is a flow diagram of an FPGA-based deep convolutional neural network implementation.
FIG. 2 shows a portion of the MNIST database.
Fig. 3 is a schematic diagram of matrix transposition.
FIG. 4 is a schematic diagram of a pipeline computation.
FIG. 5 is a schematic diagram of convolution calculations.
FIG. 6 is a diagram of a deep convolutional neural network architecture.
Fig. 7 is a schematic view of the downsampling calculation.
FIG. 8 shows simulation results of a deep convolutional neural network model based on FPGA.
FIG. 9 shows the measured classification result for the digit "7" (MNIST database).
Detailed Description
The following describes, with reference to the drawings, a concrete implementation of a handwritten-character recognition algorithm using a deep convolutional neural network model on an FPGA hardware platform according to the method of the invention. (The deep convolutional neural network model consists of an input layer I, a first convolutional layer C1, a first down-sampling layer S1, a second convolutional layer C2, a second down-sampling layer S2, and a fully-connected Softmax layer. The input picture size is 28×28; the first convolutional layer contains 1 convolution kernel of size 5×5, and the second convolutional layer contains 3 convolution kernels of size 5×5.)
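The layer sizes implied by this architecture can be verified arithmetically; a short sketch (valid convolution with stride 1, 2×2 average pooling with stride 2):

```python
# Shape walkthrough for the I -> C1 -> S1 -> C2 -> S2 -> Softmax network above
m = 28                             # input picture size
c1 = m - 5 + 1;  assert c1 == 24   # C1: one 5x5 kernel -> one 24x24 map
s1 = c1 // 2;    assert s1 == 12   # S1: 2x2 average pooling -> 12x12
c2 = s1 - 5 + 1; assert c2 == 8    # C2: three 5x5 kernels -> three 8x8 maps
s2 = c2 // 2;    assert s2 == 4    # S2: 2x2 average pooling -> three 4x4 maps
print(3 * s2 * s2)                 # 48 features enter the Softmax layer
```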
The specific steps of implementing the handwritten-character recognition algorithm of the deep convolutional neural network model on the FPGA are shown in FIG. 1.
1. Loading trained model parameters
First, the CNN functions in DeepLearnToolbox-master are taken as a reference and modified to some extent (the convolution function is rewritten; the neural network is changed to 5 layers: one input layer, two convolutional layers and two down-sampling layers; the first convolutional layer has 1 convolution kernel of size 5×5 and the second convolutional layer has 3 convolution kernels of size 5×5; the sliding stride of both down-sampling layers is 2 with a 2×2 sliding window; and the number of training passes is set to 10), and the deep convolutional neural network is trained with Matlab. The trained weight and offset parameters are then loaded at the ARM end and finally transmitted to the FPGA end, where the model parameters are buffered through a FIFO and stored in Block RAM.
2. Preprocessing
The MNIST handwriting image shown in FIG. 2 is read into memory, normalized by dividing each pixel by 255, and transposed as shown in FIG. 3.
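This preprocessing amounts to a scale to [0, 1] followed by a transpose; a minimal sketch (the random image is a stand-in for the MNIST data of FIG. 2):

```python
import numpy as np

def preprocess(img):
    """Normalize a 28 x 28 uint8 image by 255 and transpose it (cf. FIG. 3)."""
    return (img.astype(np.float32) / 255.0).T

img = np.random.randint(0, 256, (28, 28), dtype=np.uint8)  # stand-in input
x = preprocess(img)
print(x.shape, float(x.min()), float(x.max()))
```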
3. Transmitting the pre-processing result to the FPGA
The preprocessing result is transmitted to the FPGA end through the APB bus on the ZYNQ-7030 SoC and, after FIFO buffering, stored in Block RAM.
4. Initializing a convolution operation pipeline
As shown in FIG. 4, 6 data cache registers P_0, P_1, P_2, P_3, P_4, P_5 are defined, each holding 28 floating-point values. Of these, 5 registers (P_((i-1)%(5+1)+0), P_((i-1)%(5+1)+1), …, P_((i-1)%(5+1)+4)) store the data of the i-th (i = 1,2,…,24) 5×28 sub-matrix of the input image matrix, where % denotes the remainder operation; if (i-1)%(5+1)+x > 5, the indices wrap around, i.e. (i-1)%(5+1)+x = 0, (i-1)%(5+1)+x+1 = 1, …, where x = 0,1,…,4. Register P_((i-1)%(5+1)+5) stores the (i+5)-th row of the input image matrix.
Define 1 convolution kernel matrix cache register W, which stores the weight data of the 1st 5×5 convolution kernel matrix of the 1st convolutional layer.
5. Performing the 1st convolutional layer calculation
The convolution of the network's 1st convolutional layer input image matrix with the 1st convolution kernel of the 1st convolutional layer is completed, and the result is activated with the Sigmoid function.
While the convolution computation is performed, the data cache register P_((i-1)%(5+1)+5), holding the (i+5)-th row, is initialized and serves as the buffered input data for the convolution of the (i+1)-th sub-matrix, realizing the cyclic convolution, as shown in FIG. 5.
The Sigmoid activation of the convolution results is realized at the FPGA end by constructing the Sigmoid function from Floating-point IP cores. The Sigmoid function is expressed as:
f(x) = 1 / (1 + e^(-x))
The specific steps are as follows:
as described above, the input image is a 28 × 28 floating point matrix, the convolution kernel is a 5 × 5 floating point matrix, the sliding window scale is 5 × 5, the horizontal sliding step is 1, and the vertical sliding step is 1, so that the convolution result is a 24 × 24 floating point matrix, each element of the matrix is added with an offset b11 (offline training model parameter), and after activation by using a Sigmoid function, the result is a 24 × 24 floating point matrix, and the floating point matrix is stored in the Block RAM.
After 1 convolution calculation, the calculation result is 1 matrix of 24 × 24 floating point numbers, and the matrix is stored in the Block RAM.
6. Performing the 1st pooling layer calculation
The pooling of the 1st convolutional layer result is computed as shown in FIG. 6; the result is 1 matrix of 12×12 floating-point numbers, stored in Block RAM. The specific steps are as follows: the sliding window over the convolution result is 2×2 with stride 2, and pooling uses average down-sampling, i.e. the elements of each 2×2 block are summed and averaged, yielding 1 matrix of 12×12 floating-point numbers that serves as the input matrix for the 2nd convolutional layer computation, as shown in FIG. 7.
7. Reinitializing a convolution pipeline
As shown in FIG. 4, the 6 data cache registers P_0, P_1, P_2, P_3, P_4, P_5 are re-initialized, each now holding 12 floating-point values. Of these, 5 registers (P_((i-1)%(5+1)+0), P_((i-1)%(5+1)+1), …, P_((i-1)%(5+1)+4)) store the data of the i-th (i = 1,2,…,8) 5×12 sub-matrix of the input matrix, where % denotes the remainder operation; if (i-1)%(5+1)+x > 5, the indices wrap around, i.e. (i-1)%(5+1)+x = 0, (i-1)%(5+1)+x+1 = 1, …, where x = 0,1,…,4. Register P_((i-1)%(5+1)+5) stores the (i+5)-th row of the input matrix.
The convolution kernel matrix cache register W is re-initialized to store the weight data of the 1st 5×5 convolution kernel matrix of the 2nd convolutional layer.
8. Performing the 2nd convolutional layer calculation
The convolution of the network's 2nd convolutional layer input data matrix with the 1st convolution kernel of the 2nd convolutional layer is completed, and the result is activated with the Sigmoid function.
The convolution kernel matrix cache register W is then re-initialized to store the weight data of the 2nd 5×5 convolution kernel matrix of the 2nd convolutional layer; the convolution of the input data matrix with the 2nd convolution kernel of the 2nd convolutional layer is completed, and the result is activated with the Sigmoid function.
The register W is re-initialized again to store the weight data of the 3rd 5×5 convolution kernel matrix of the 2nd convolutional layer; the convolution of the input data matrix with the 3rd convolution kernel of the 2nd convolutional layer is completed, and the result is activated with the Sigmoid function.
While each convolution computation is performed, the data cache register P_((i-1)%(5+1)+5), holding the (i+5)-th row, is initialized and serves as the buffered input data for the convolution of the (i+1)-th sub-matrix, realizing the cyclic convolution, as shown in FIG. 5.
The specific steps are as follows: as described above, the input is a 12×12 floating-point matrix and the convolution kernels are 3 matrices of 5×5 floating-point numbers; the sliding window is 5×5 with horizontal sliding stride 1 and vertical sliding stride 1, so the convolution results are 3 matrices of 8×8 floating-point numbers. The offsets b21, b22 and b23 (offline-trained model parameters) are added element-wise to the 3 matrices respectively, and after activation with the Sigmoid function the results are 3 matrices of 8×8 floating-point numbers, which are stored in Block RAM.
After the 2nd convolutional layer's convolution computations, the result is 3 matrices of 8×8 floating-point numbers, stored in Block RAM.
9. Performing the 2nd pooling layer calculation
The pooling of the 2nd convolutional layer results is computed as shown in FIG. 6; the result is 3 matrices of 4×4 floating-point numbers, stored in Block RAM. The specific steps are as follows: the sliding window over the convolution results is 2×2 with stride 2, and pooling uses average down-sampling, i.e. the elements of each 2×2 block are summed and averaged, yielding 3 matrices of 4×4 floating-point numbers that serve as the input to the Softmax layer, as shown in FIG. 7.
10. Classification calculation
The convolution and pooling results are transmitted back to the ARM end for the classification operation. The specific steps are as follows: the FPGA end transmits the convolution-pooling result matrices in Block RAM to the ARM end through a FIFO buffer and the APB bus; the ARM end completes the data classification computation using a Softmax operation, obtaining and outputting the classification result for the input picture.
The simulation results of processing the digit picture "7" from the MNIST database with the above method are shown in FIG. 8.
The measured classification results of processing the digit picture "7" from the MNIST database with the above method are shown in FIG. 9.
References
[1] Cong J, Xiao B. Minimizing Computation in Convolutional Neural Networks[M]// Artificial Neural Networks and Machine Learning – ICANN 2014. Springer International Publishing, 2014: 33-7.
[2] Farabet C, Poulet C, Han J Y, et al. CNP: An FPGA-based processor for Convolutional Networks[J]. International Conference on Field Programmable Logic & Applications, 2009: 32-37.
[3] Gokhale V, Jin J, Dundar A, et al. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks[C]// IEEE Embedded Vision Workshop, 2014: 696-701.
[4] Zhang C, Li P, Sun G, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks[C]// ACM/SIGDA International Symposium, 2015: 161-170.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks[J]. Advances in Neural Information Processing Systems, 2012, 25(2): 2012.
[6] Farabet C, Martini B, Corda B, et al. NeuFlow: A runtime reconfigurable dataflow processor for vision[J]. 2011, 9(6): 109-116.
[7] Matai J, Irturk A, Kastner R. Design and Implementation of an FPGA-Based Real-Time Face Recognition System[C]// IEEE International Symposium on Field-Programmable Custom Computing Machines, 2011: 97-100.
[8] Sankaradas M, Jakkula V, Cadambi S, et al. A Massively Parallel Coprocessor for Convolutional Neural Networks[C]// IEEE International Conference on Application-Specific Systems, Architectures and Processors. IEEE Computer Society, 2009: 53-60.

Claims (1)

1. A deep convolution neural network implementation method based on FPGA is characterized by comprising the following specific steps:
step 1, loading training model parameters
(1) Loading parameters of a deep convolutional neural network model trained offline at an ARM end;
(2) transmitting the parameters of the training model to an FPGA end;
(3) the FPGA end buffers the parameters through a FIFO and then stores them in the block random access memory;
step 2, preprocessing a deep convolution neural network model
(1) Normalizing the input data to meet the requirement of model convolution operation;
(2) transmitting the ARM end normalized data to the FPGA end by using an APB bus;
(3) the FPGA end stores the normalized data into the block random access memory after FIFO buffering;
step 3, convolution and down-sampling calculation
setting the network model to have H convolutional layers and H pooling layers, wherein the input of the h-th convolutional layer is T m×m 32-bit floating-point matrices, h = 1,2,…,H; the output is S (m-n+1)×(m-n+1) 32-bit floating-point matrices; the convolution kernels are K n×n 32-bit floating-point matrices, n ≤ m; the input-data sliding window is n×n, the horizontal sliding stride is 1, and the vertical sliding stride is 1;
(1) initializing a convolution operation pipeline
defining n+1 data cache registers P_0, P_1, …, P_(n-1), P_n, each register storing m data values, wherein n registers P_((i-1)%(n+1)+0), P_((i-1)%(n+1)+1), …, P_((i-1)%(n+1)+n-1) store the n×m data of the i-th sub-matrix of the t-th input data matrix, t = 1,2,…,T, i = 1,2,…,m-n+1; % denotes the remainder operation; if (i-1)%(n+1)+x > n, then (i-1)%(n+1)+x = 0, (i-1)%(n+1)+x+1 = 1, …, where x = 0,1,…,n-1; if n < m, register P_((i-1)%(n+1)+n) stores the (i+n)-th row of the input data matrix and is initialized in parallel during the convolution computation, reducing FPGA idle cycles and improving computational efficiency;
defining 1 convolution kernel matrix cache register W, which stores the weight data of the k-th n×n convolution kernel matrix, k = 1,2,…,K;
(2) h-th convolutional layer computation
completing the convolution of the t-th input data matrix with the k-th convolution kernel of the h-th convolutional layer of the network, and activating the result with the Sigmoid function;
while each convolution computation is performed, initializing the data cache register P_((i-1)%(n+1)+n), which serves as the buffered input data for the convolution of the (i+1)-th sub-matrix, realizing the cyclic convolution;
constructing the Sigmoid function at the FPGA end from floating-point IP cores to realize the activation of the convolution results, the expression of the Sigmoid function being:
f(x) = 1 / (1 + e^(-x))
the method comprises the following specific steps:
as described above, the input data is an m×m floating-point matrix, the convolution kernel is an n×n floating-point matrix, the sliding window is n×n, the horizontal sliding stride is 1, and the vertical sliding stride is 1, so the convolution result is an (m-n+1)×(m-n+1) floating-point matrix; the offset b11, an offline-trained model parameter, is added to each element of the matrix, and after activation with the Sigmoid function the result is an (m-n+1)×(m-n+1) floating-point matrix, which is stored in Block RAM;
after each convolution computation, re-initializing the convolution kernel matrix cache register W and performing the next convolution computation; repeating this cyclic convolution yields S (m-n+1)×(m-n+1) floating-point matrices, which are stored in Block RAM;
(3) h-th pooling layer computation
realizing the pooling of the h-th convolutional layer results, the result being S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices, which are stored in Block RAM; the specific steps are as follows: the sliding window over the convolution results is 2×2 with stride 2, and pooling uses average down-sampling, i.e. the elements of each 2×2 block are summed and averaged, yielding S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices as the input matrices for the (h+1)-th convolutional layer computation;
step 4, classified calculation
transmitting the convolution and pooling results back to the ARM end for the classification operation; the specific steps are as follows: the FPGA end transmits the convolution-pooling result matrices in Block RAM to the ARM end through a FIFO buffer and the APB bus, and the ARM end completes the data classification computation using a Softmax operation, obtaining and outputting the classification result for the input data.
CN201610615714.2A 2016-07-30 2016-07-30 Deep convolution neural network implementation method based on FPGA Active CN106228240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610615714.2A CN106228240B (en) 2016-07-30 2016-07-30 Deep convolution neural network implementation method based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610615714.2A CN106228240B (en) 2016-07-30 2016-07-30 Deep convolution neural network implementation method based on FPGA

Publications (2)

Publication Number Publication Date
CN106228240A CN106228240A (en) 2016-12-14
CN106228240B true CN106228240B (en) 2020-09-01

Family

ID=57536621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610615714.2A Active CN106228240B (en) 2016-07-30 2016-07-30 Deep convolution neural network implementation method based on FPGA

Country Status (1)

Country Link
CN (1) CN106228240B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11836608B2 (en) 2020-06-23 2023-12-05 Stmicroelectronics S.R.L. Convolution acceleration with embedded vector decompression
US11880759B2 (en) 2020-02-18 2024-01-23 Stmicroelectronics S.R.L. Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks

Families Citing this family (100)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018060268A (en) * 2016-10-03 2018-04-12 株式会社日立製作所 Recognition device and learning system
KR20180073314A (en) * 2016-12-22 2018-07-02 삼성전자주식회사 Convolutional neural network system and operation method thererof
CN106650691A (en) * 2016-12-30 2017-05-10 北京旷视科技有限公司 Image processing method and image processing device
CN106529517B (en) * 2016-12-30 2019-11-01 北京旷视科技有限公司 Image processing method and image processing equipment
US20180189229A1 (en) 2017-01-04 2018-07-05 Stmicroelectronics S.R.L. Deep convolutional network heterogeneous architecture
CN108269224B (en) 2017-01-04 2022-04-01 意法半导体股份有限公司 Reconfigurable interconnect
CN106909970B (en) * 2017-01-12 2020-04-21 南京风兴科技有限公司 Approximate calculation-based binary weight convolution neural network hardware accelerator calculation device
CN106875011B (en) * 2017-01-12 2020-04-17 南京风兴科技有限公司 Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof
CN106682702A (en) * 2017-01-12 2017-05-17 张亮 Deep learning method and system
CN108304922B (en) * 2017-01-13 2020-12-15 华为技术有限公司 Computing device and computing method for neural network computing
WO2018137177A1 (en) * 2017-01-25 2018-08-02 北京大学 Method for convolution operation based on nor flash array
CN106779060B (en) * 2017-02-09 2019-03-08 武汉魅瞳科技有限公司 A kind of calculation method for the depth convolutional neural networks realized suitable for hardware design
CN106875012B (en) * 2017-02-09 2019-09-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
TWI607389B (en) * 2017-02-10 2017-12-01 耐能股份有限公司 Pooling operation device and method for convolutional neural network
CN106991474B (en) * 2017-03-28 2019-09-24 华中科技大学 The parallel full articulamentum method for interchanging data of deep neural network model and system
CN106991999B (en) * 2017-03-29 2020-06-02 北京小米移动软件有限公司 Voice recognition method and device
CN108804974B (en) * 2017-04-27 2021-07-02 深圳鲲云信息科技有限公司 Method and system for estimating and configuring resources of hardware architecture of target detection algorithm
CN108229645B (en) * 2017-04-28 2021-08-06 北京市商汤科技开发有限公司 Convolution acceleration and calculation processing method and device, electronic equipment and storage medium
CN107229969A (en) * 2017-06-21 2017-10-03 郑州云海信息技术有限公司 A kind of convolutional neural networks implementation method and device based on FPGA
CN107451654B (en) * 2017-07-05 2021-05-18 深圳市自行科技有限公司 Acceleration operation method of convolutional neural network, server and storage medium
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN107451659B (en) * 2017-07-27 2020-04-10 清华大学 Neural network accelerator for bit width partition and implementation method thereof
CN107622305A (en) * 2017-08-24 2018-01-23 中国科学院计算技术研究所 Processor and processing method for neutral net
CN107689223A (en) * 2017-08-30 2018-02-13 北京嘉楠捷思信息技术有限公司 Audio identification method and device
CN110245751B (en) * 2017-08-31 2020-10-09 中科寒武纪科技股份有限公司 GEMM operation method and device
US10839286B2 (en) * 2017-09-14 2020-11-17 Xilinx, Inc. System and method for implementing neural networks in integrated circuits
CN107564522A (en) * 2017-09-18 2018-01-09 郑州云海信息技术有限公司 A kind of intelligent control method and device
CN107656899A (en) * 2017-09-27 2018-02-02 深圳大学 A kind of mask convolution method and system based on FPGA
CN107749044A (en) * 2017-10-19 2018-03-02 珠海格力电器股份有限公司 The pond method and device of image information
CN109754062B (en) * 2017-11-07 2024-05-14 上海寒武纪信息科技有限公司 Execution method of convolution expansion instruction and related product
CN107844833A (en) * 2017-11-28 2018-03-27 郑州云海信息技术有限公司 A kind of data processing method of convolutional neural networks, device and medium
CN108009631A (en) * 2017-11-30 2018-05-08 睿视智觉(深圳)算法技术有限公司 A kind of VGG-16 general purpose processing blocks and its control method based on FPGA
CN110574371B (en) * 2017-12-08 2021-12-21 百度时代网络技术(北京)有限公司 Stereo camera depth determination using hardware accelerators
CN109961133B (en) * 2017-12-14 2020-04-24 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN108416422B (en) * 2017-12-29 2024-03-01 国民技术股份有限公司 FPGA-based convolutional neural network implementation method and device
CN108388943B (en) * 2018-01-08 2020-12-29 中国科学院计算技术研究所 Pooling device and method suitable for neural network
CN108154229B (en) * 2018-01-10 2022-04-08 西安电子科技大学 Image processing method based on FPGA (field programmable Gate array) accelerated convolutional neural network framework
CN108362628A (en) * 2018-01-11 2018-08-03 天津大学 The n cell flow-sorting methods of flow cytometer are imaged based on polarizing diffraction
CN109643336A (en) * 2018-01-15 2019-04-16 深圳鲲云信息科技有限公司 Artificial intelligence process device designs a model method for building up, system, storage medium, terminal
CN109313723B (en) * 2018-01-15 2022-03-15 深圳鲲云信息科技有限公司 Artificial intelligence convolution processing method and device, readable storage medium and terminal
WO2019136764A1 (en) * 2018-01-15 2019-07-18 深圳鲲云信息科技有限公司 Convolutor and artificial intelligent processing device applied thereto
CN110178146B (en) * 2018-01-15 2023-05-12 深圳鲲云信息科技有限公司 Deconvolutor and artificial intelligence processing device applied by deconvolutor
US11874898B2 (en) 2018-01-15 2024-01-16 Shenzhen Corerain Technologies Co., Ltd. Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal
US11568232B2 (en) * 2018-02-08 2023-01-31 Quanta Computer Inc. Deep learning FPGA converter
CN108108809B (en) * 2018-03-05 2021-03-02 山东领能电子科技有限公司 Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
CN108537330B (en) * 2018-03-09 2020-09-01 中国科学院自动化研究所 Convolution computing device and method applied to neural network
CN108256636A (en) * 2018-03-16 2018-07-06 成都理工大学 A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing
CN108710892B (en) * 2018-04-04 2020-09-01 浙江工业大学 Cooperative immune defense method for multiple anti-picture attacks
CN108615076B (en) * 2018-04-08 2020-09-11 瑞芯微电子股份有限公司 Deep learning chip-based data storage optimization method and device
CN108470211B (en) * 2018-04-09 2022-07-12 郑州云海信息技术有限公司 Method and device for realizing convolution calculation and computer storage medium
CN108520300A (en) * 2018-04-09 2018-09-11 郑州云海信息技术有限公司 A kind of implementation method and device of deep learning network
CN110399976B (en) * 2018-04-25 2022-04-05 华为技术有限公司 Computing device and computing method
CN108549935B (en) * 2018-05-03 2021-09-10 山东浪潮科学研究院有限公司 Device and method for realizing neural network model
CN108595379A (en) * 2018-05-08 2018-09-28 济南浪潮高新科技投资发展有限公司 A kind of parallelization convolution algorithm method and system based on multi-level buffer
CN108805270B (en) * 2018-05-08 2021-02-12 华中科技大学 Convolutional neural network system based on memory
CN108805267B (en) * 2018-05-28 2021-09-10 重庆大学 Data processing method for hardware acceleration of convolutional neural network
CN108764182B (en) * 2018-06-01 2020-12-08 阿依瓦(北京)技术有限公司 Optimized acceleration method and device for artificial intelligence
CN108711429B (en) * 2018-06-08 2021-04-02 Oppo广东移动通信有限公司 Electronic device and device control method
CN109086879B (en) * 2018-07-05 2020-06-16 东南大学 Method for realizing dense connection neural network based on FPGA
CN109032781A (en) * 2018-07-13 2018-12-18 重庆邮电大学 A kind of FPGA parallel system of convolutional neural networks algorithm
CN109117949A (en) * 2018-08-01 2019-01-01 南京天数智芯科技有限公司 Flexible data stream handle and processing method for artificial intelligence equipment
CN109036459B (en) * 2018-08-22 2019-12-27 百度在线网络技术(北京)有限公司 Voice endpoint detection method and device, computer equipment and computer storage medium
CN109102070B (en) * 2018-08-22 2020-11-24 地平线(上海)人工智能技术有限公司 Preprocessing method and device for convolutional neural network data
CN109214506B (en) * 2018-09-13 2022-04-15 深思考人工智能机器人科技(北京)有限公司 Convolutional neural network establishing device and method based on pixels
US20200090046A1 (en) * 2018-09-14 2020-03-19 Huawei Technologies Co., Ltd. System and method for cascaded dynamic max pooling in neural networks
US20200090023A1 (en) * 2018-09-14 2020-03-19 Huawei Technologies Co., Ltd. System and method for cascaded max pooling in neural networks
CN109359732B (en) * 2018-09-30 2020-06-09 阿里巴巴集团控股有限公司 Chip and data processing method based on chip
CN109376843B (en) * 2018-10-12 2021-01-08 山东师范大学 FPGA-based electroencephalogram signal rapid classification method, implementation method and device
CN109146067B (en) * 2018-11-19 2021-11-05 东北大学 Policy convolution neural network accelerator based on FPGA
CN109670578A (en) * 2018-12-14 2019-04-23 北京中科寒武纪科技有限公司 Neural network first floor convolution layer data processing method, device and computer equipment
CN109711539B (en) * 2018-12-17 2020-05-29 中科寒武纪科技股份有限公司 Operation method, device and related product
CN109800867B (en) * 2018-12-17 2020-09-29 北京理工大学 Data calling method based on FPGA off-chip memory
CN109740748B (en) * 2019-01-08 2021-01-08 西安邮电大学 Convolutional neural network accelerator based on FPGA
CN109784483B (en) * 2019-01-24 2022-09-09 电子科技大学 FD-SOI (field-programmable gate array-silicon on insulator) process-based binary convolution neural network in-memory computing accelerator
CN109871939B (en) * 2019-01-29 2021-06-15 深兰人工智能芯片研究院(江苏)有限公司 Image processing method and image processing device
CN109615067B (en) * 2019-03-05 2019-05-21 深兰人工智能芯片研究院(江苏)有限公司 A kind of data dispatching method and device of convolutional neural networks
TWI696129B (en) * 2019-03-15 2020-06-11 華邦電子股份有限公司 Memory chip capable of performing artificial intelligence operation and operation method thereof
CN110032374B (en) * 2019-03-21 2023-04-07 深兰科技(上海)有限公司 Parameter extraction method, device, equipment and medium
CN110084363B (en) * 2019-05-15 2023-04-25 电科瑞达(成都)科技有限公司 Deep learning model acceleration method based on FPGA platform
CN110223687B (en) * 2019-06-03 2021-09-28 Oppo广东移动通信有限公司 Instruction execution method and device, storage medium and electronic equipment
CN110209627A (en) * 2019-06-03 2019-09-06 山东浪潮人工智能研究院有限公司 A kind of hardware-accelerated method of SSD towards intelligent terminal
CN110727634B (en) * 2019-07-05 2021-10-29 中国科学院计算技术研究所 Embedded intelligent computer system for object-side data processing
CN110458279B (en) * 2019-07-15 2022-05-20 武汉魅瞳科技有限公司 FPGA-based binary neural network acceleration method and system
CN110472442A (en) * 2019-08-20 2019-11-19 厦门理工学院 A kind of automatic detection hardware Trojan horse IP kernel
TWI724515B (en) * 2019-08-27 2021-04-11 聯智科創有限公司 Machine learning service delivery method
CN110619387B (en) * 2019-09-12 2023-06-20 复旦大学 Channel expansion method based on convolutional neural network
CN110689088A (en) * 2019-10-09 2020-01-14 山东大学 CNN-based LIBS ore spectral data classification method and device
CN110910434B (en) * 2019-11-05 2023-05-12 东南大学 Method for realizing deep learning parallax estimation algorithm based on FPGA (field programmable Gate array) high energy efficiency
CN110991632B (en) * 2019-11-29 2023-05-23 电子科技大学 Heterogeneous neural network calculation accelerator design method based on FPGA
CN110880038B (en) * 2019-11-29 2022-07-01 中国科学院自动化研究所 System for accelerating convolution calculation based on FPGA and convolution neural network
CN111008629A (en) * 2019-12-07 2020-04-14 怀化学院 Cortex-M3-based method for identifying number of tip
CN111310921B (en) * 2020-03-27 2022-04-19 西安电子科技大学 FPGA implementation method of lightweight deep convolutional neural network
CN111667053B (en) * 2020-06-01 2023-05-09 重庆邮电大学 Forward propagation calculation acceleration method of convolutional neural network accelerator
CN111832718B (en) * 2020-06-24 2021-08-03 上海西井信息科技有限公司 Chip architecture
CN111860773B (en) * 2020-06-30 2023-07-28 北京百度网讯科技有限公司 Processing apparatus and method for information processing
CN112508184B (en) * 2020-12-16 2022-04-29 重庆邮电大学 Design method of fast image recognition accelerator based on convolutional neural network
CN113012689B (en) * 2021-04-15 2023-04-07 成都爱旗科技有限公司 Electronic equipment and deep learning hardware acceleration method
CN113762491B (en) * 2021-08-10 2023-06-30 南京工业大学 Convolutional neural network accelerator based on FPGA
CN114546484A (en) * 2022-02-21 2022-05-27 山东浪潮科学研究院有限公司 Deep convolution optimization method, system and device based on micro-architecture processor
CN116718894B (en) * 2023-06-19 2024-03-29 上饶市广强电子科技有限公司 Circuit stability test method and system for corn lamp

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7882164B1 (en) * 2004-09-24 2011-02-01 University Of Southern California Image convolution engine optimized for use in programmable gate arrays
CN104035750A (en) * 2014-06-11 2014-09-10 西安电子科技大学 Field programmable gate array (FPGA)-based real-time template convolution implementing method
CN105046681A (en) * 2015-05-14 2015-11-11 江南大学 Image salient region detecting method based on SoC
CN105469039A (en) * 2015-11-19 2016-04-06 天津大学 Target identification system based on AER image sensor
CN105491269A (en) * 2015-11-24 2016-04-13 长春乙天科技有限公司 High-fidelity video amplification method based on deconvolution image restoration
CN105678379A (en) * 2016-01-12 2016-06-15 腾讯科技(深圳)有限公司 CNN processing method and device
CN105678378A (en) * 2014-12-04 2016-06-15 辉达公司 Indirectly accessing sample data to perform multi-convolution operations in parallel processing system
CN105740773A (en) * 2016-01-25 2016-07-06 重庆理工大学 Deep learning and multi-scale information based behavior identification method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9622041B2 (en) * 2013-03-15 2017-04-11 DGS Global Systems, Inc. Systems, methods, and devices for electronic spectrum management

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7882164B1 (en) * 2004-09-24 2011-02-01 University Of Southern California Image convolution engine optimized for use in programmable gate arrays
CN104035750A (en) * 2014-06-11 2014-09-10 西安电子科技大学 Field programmable gate array (FPGA)-based real-time template convolution implementing method
CN105678378A (en) * 2014-12-04 2016-06-15 辉达公司 Indirectly accessing sample data to perform multi-convolution operations in parallel processing system
CN105046681A (en) * 2015-05-14 2015-11-11 江南大学 Image salient region detecting method based on SoC
CN105469039A (en) * 2015-11-19 2016-04-06 天津大学 Target identification system based on AER image sensor
CN105491269A (en) * 2015-11-24 2016-04-13 长春乙天科技有限公司 High-fidelity video amplification method based on deconvolution image restoration
CN105678379A (en) * 2016-01-12 2016-06-15 腾讯科技(深圳)有限公司 CNN processing method and device
CN105740773A (en) * 2016-01-25 2016-07-06 重庆理工大学 Deep learning and multi-scale information based behavior identification method

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A Multistage Dataflow Implementation of a Deep Convolutional Neural Network Based on FPGA For High-Speed Object Recognition; Li, Ning et al.; 2016 IEEE Southwest Symposium on Image Analysis and Interpretation; 20160308; pp. 165-168 *
Design space exploration of FPGA-based Deep Convolutional Neural Networks; Mohammad Motamedi et al.; 2016 21st Asia and South Pacific Design Automation Conference; 20160310; full text *
Fast Pipeline 128×128 pixel spiking convolution core for event-driven vision processing in FPGAs; Yousefzadeh, A et al.; 2015 First International Conference on Event-Based Control, Communication and Signal Processing; 20151231; full text *
FPGA implementation of a novel 2-D convolver; Sang Hongshi et al.; Microelectronics & Computer; 20110930; full text *
Design and implementation of an FPGA-based image convolution IP core; Zhu Xueliang et al.; Microelectronics & Computer; 20160630; pp. 188-192 *
A new FPGA implementation method for spatial template convolution filtering algorithms; Li Ming et al.; Computer Applications and Software; 20100831; pp. 17-18 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11880759B2 (en) 2020-02-18 2024-01-23 Stmicroelectronics S.R.L. Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks
US11836608B2 (en) 2020-06-23 2023-12-05 Stmicroelectronics S.R.L. Convolution acceleration with embedded vector decompression

Also Published As

Publication number Publication date
CN106228240A (en) 2016-12-14

Similar Documents

Publication Publication Date Title
CN106228240B (en) Deep convolution neural network implementation method based on FPGA
US11574195B2 (en) Operation method
Chaurasia et al. Linknet: Exploiting encoder representations for efficient semantic segmentation
EP3407266B1 (en) Artificial neural network calculating device and method for sparse connection
CN109934331B (en) Apparatus and method for performing artificial neural network forward operations
CN111459877B (en) Winograd YOLOv2 target detection model method based on FPGA acceleration
CN106991477B (en) Artificial neural network compression coding device and method
CN107340993B (en) Arithmetic device and method
CN108108809B (en) Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
US20180018555A1 (en) System and method for building artificial neural network architectures
CN108763159A (en) To arithmetic accelerator before a kind of LSTM based on FPGA
CN110991631A (en) Neural network acceleration system based on FPGA
US11775832B2 (en) Device and method for artificial neural network operation
Gupta et al. FPGA implementation of simplified spiking neural network
Liau et al. Fire SSD: Wide fire modules based single shot detector on edge device
CN111583094A (en) Image pulse coding method and system based on FPGA
CN113762493A (en) Neural network model compression method and device, acceleration unit and computing system
CN109145107A (en) Subject distillation method, apparatus, medium and equipment based on convolutional neural networks
CN109685208B (en) Method and device for thinning and combing acceleration of data of neural network processor
Sommer et al. Efficient hardware acceleration of sparsely active convolutional spiking neural networks
WO2022028232A1 (en) Device and method for executing lstm neural network operation
Al Maashri et al. A hardware architecture for accelerating neuromorphic vision algorithms
Nathan et al. Skeletonnetv2: A dense channel attention blocks for skeleton extraction
CN112988229B (en) Convolutional neural network resource optimization configuration method based on heterogeneous computation
JP2022541712A (en) Neural network training method, video recognition method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant