CN109086867A - An FPGA-based convolutional neural network acceleration system - Google Patents
An FPGA-based convolutional neural network acceleration system
- Publication number: CN109086867A
- Application number: CN201810710069.1A
- Authority: CN (China)
- Prior art keywords: module, convolutional neural network, submodule, data
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 — Combinations of networks (G—Physics › G06—Computing; calculating or counting › G06N—Computing arrangements based on specific computational models › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/048 — Activation functions (same parent hierarchy, under G06N3/04)
- G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means (under G06N3/06—Physical realisation)
Abstract
The invention discloses an FPGA-based convolutional neural network acceleration system that accelerates convolutional neural networks on an FPGA using the OpenCL programming framework. The system comprises a data preprocessing module, a data post-processing module, a convolutional neural network computing module, a data storage module and a network model configuration module; the convolutional neural network computing module in turn comprises a convolution computation submodule, an activation function computation submodule, a pooling computation submodule and a fully connected computation submodule. The computation parallelism of the system can be configured to match the hardware resources of a given FPGA, so the system adapts to different FPGAs and different convolutional neural networks and runs them on the FPGA in an efficient, parallel, pipelined manner. It effectively reduces system power consumption, greatly improves the processing speed of convolutional neural networks, and meets real-time requirements.
Description
Technical field
The invention belongs to the technical field of neural network computing, and in particular relates to an FPGA-based convolutional neural network acceleration system.
Background technique
With the continuous maturation of deep learning technology, convolutional neural networks have been widely applied in fields such as computer vision, speech recognition and natural language processing, and have achieved good results in practical applications such as face detection and speech recognition. In recent years, thanks to ever-larger training datasets and continually innovated network structures, the accuracy and performance of convolutional neural networks have improved significantly. However, as network structures become increasingly complex, practical applications impose ever higher demands for real-time performance and low cost, and therefore ever higher demands on the computing power and energy consumption of the hardware that runs these networks.
FPGAs offer abundant computing resources, high flexibility and high energy efficiency. Compared with conventional digital circuit systems, they have the advantages of programmability, high integration, high speed and high reliability, and there have been continuous attempts to use them to accelerate neural networks. OpenCL is a heterogeneous computing language based on traditional C. It runs on accelerator processors such as CPUs, GPUs, FPGAs and DSPs, and its high level of language abstraction lets programmers develop high-performance applications without understanding hardware circuits and low-level details, greatly reducing the complexity of programming.
In November 2012, Altera formally released a software development kit (SDK) for OpenCL development on FPGAs, combining the powerful parallel architecture of the FPGA with the OpenCL parallel programming model. With this SDK, programmers familiar with C can quickly adapt to the high-level OpenCL environment and use it to develop high-performance, low-power FPGA applications efficiently. Using the Altera OpenCL SDK to accelerate convolutional neural network computation on an FPGA, with the FPGA acting as an external accelerator of a host, enables the host and the external FPGA accelerator to work cooperatively.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides an FPGA-based convolutional neural network acceleration system. Its purpose is to readjust the computation structure of existing convolutional neural networks so as to fully exploit the parallelism within each computation layer and the pipelining between layers, thereby improving the processing speed of convolutional neural networks.
To achieve the above object, according to one aspect of the present invention, an FPGA-based convolutional neural network acceleration system is provided, comprising a data preprocessing module, a convolutional neural network computing module, a data post-processing module, a data storage module and a network model configuration module. The data preprocessing module, the convolutional neural network computing module and the data post-processing module are implemented on the FPGA; the data storage module is implemented in the FPGA's off-chip memory; and the network model configuration module is implemented in the FPGA's on-chip memory.
The data preprocessing module reads the convolution kernel parameters and input feature maps corresponding to the current computation stage from the data storage module and preprocesses them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature maps are expanded and replicated with a sliding window so that the local feature maps within the sliding windows correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature map sequence that can be computed directly. After preprocessing, the prepared convolution kernel parameters and input feature maps are sent to the convolutional neural network computing module.
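The sliding-window expansion described above is essentially an im2col transform: each window of the input is laid out as a flat row so that convolution reduces to row-by-row dot products with the flattened kernel. A minimal single-channel sketch in Python (the function name and list-based layout are illustrative, not taken from the patent):

```python
def im2col_single_channel(fmap, k, stride):
    """Expand one channel of a feature map into per-window rows.

    fmap: 2-D list (H x W). Each output row holds the k*k values of
    one sliding-window position, so a convolution becomes a plain
    dot product between a row and the flattened kernel.
    """
    h, w = len(fmap), len(fmap[0])
    rows = []
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            rows.append([fmap[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows
```

Applying this to a 4x4 map with k=2, stride=2 yields 4 rows of 4 values each, matching the window count ((W-k)/stride+1) * ((H-k)/stride+1) used later in the description.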
The network model configuration module performs parameter configuration of the convolutional neural network computing module. The computing module implements the convolutional layers, activation function layers, pooling layers and fully connected layers of the convolutional neural network as independent units, so that a variety of different network structures can be constructed through parameter configuration. According to the configuration parameters, it performs the convolution, activation, pooling and fully connected computations on the kernel parameters and input feature maps received from the data preprocessing module, pipelined between layers and parallel within each layer, and sends the processing results to the data post-processing module.
The data post-processing module writes the output data of the convolutional neural network computing module into the data storage module.
The data storage module stores the model parameters (caffemodel) of the convolutional neural network, intermediate feature map results and final results, and exchanges data with an external host through a PCIe interface.
Preferably, in the above acceleration system, the convolutional neural network computing module comprises a convolution computation submodule, an activation function computation submodule, a pooling computation submodule and a fully connected computation submodule, connected inside the computing module according to the network model configuration parameters predefined by the network model configuration module.
After the convolutional neural network computing module receives the convolution kernel parameters and feature maps sent by the data preprocessing module, the submodules organized by the configuration parameters begin processing, and the results are sent to the data post-processing module when processing completes.
Specifically, the convolution computation submodule performs convolution on the input kernel parameters and feature maps and sends the result to the activation function computation submodule. The activation function computation submodule selects an activation function according to the activation configuration parameter predefined by the network model configuration module, applies it to the feature map, and then, according to the configuration, sends the result to either the pooling computation submodule or the fully connected computation submodule.
The pooling computation submodule performs pooling on the received feature maps and, according to the configuration parameters predefined by the network model configuration module, sends the pooling result either to the fully connected computation submodule or directly to the data post-processing module. The fully connected computation submodule performs the fully connected computation on the received feature maps and sends the result to the data post-processing module.
Preferably, in the above acceleration system, the data preprocessing module comprises a data transmission submodule, a convolution kernel parameter preprocessing submodule and a feature map preprocessing submodule. The data transmission submodule controls the transfer of feature maps and kernel parameters between the data storage module and the convolutional neural network computing module; the kernel parameter preprocessing submodule rearranges and orders the kernel parameters; and the feature map preprocessing submodule expands, replicates and arranges the feature maps.
Preferably, in the above acceleration system, the data storage module comprises a convolution kernel parameter storage submodule, which stores the kernel parameters, and a feature map storage submodule, which stores the input feature maps and the temporary feature maps produced during computation. These storage submodules are preferably partitioned from DDR memory connected to the FPGA; in the OpenCL programming framework the data storage module is used as global memory.
Preferably, in the above acceleration system, the data transmission submodule comprises a DDR controller, a data transfer bus and a memory buffer. The DDR controller controls data transfers between the DDR and the FPGA; the data transfer bus connects the DDR and the FPGA and serves as the channel over which data moves; and the memory buffer holds temporary data, reducing FPGA accesses to the DDR and improving transfer speed.
Preferably, in the above acceleration system, the convolution computation submodule comprises one or more matrix multiplication submodules, whose number is set by the configuration parameters predefined by the network model configuration module; the matrix multiplication submodules execute in parallel. Each matrix multiplication submodule uses the Winograd minimal filtering algorithm to accelerate the matrix multiplication between a single convolution kernel and the corresponding local feature map.
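The Winograd minimal filtering algorithm trades multiplications for additions. A minimal sketch of the 1-D case F(2,3) — two outputs of a 3-tap filter in 4 multiplications instead of 6 — illustrates the idea; the patent applies the 2-D analogue to its k x k kernels:

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap FIR filter with 4 multiplies.

    d: 4 input samples, g: 3 filter taps. Direct computation needs
    6 multiplies; the Winograd transform needs only m1..m4.
    """
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

The filter-dependent sums (g[0]+g[1]+g[2])/2 and (g[0]-g[1]+g[2])/2 can be precomputed once per kernel, which is exactly why the transform suits fixed convolution weights.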
Preferably, in the above acceleration system, the activation function computation submodule comprises an activation function selection submodule, a Sigmoid computation submodule, a Tanh computation submodule and a ReLU computation submodule. The selection submodule is connected to the Sigmoid, Tanh and ReLU computation submodules and routes the feature map data to one of the three. The selection submodule sets the activation computation applied to feature maps in the convolutional neural network; the Sigmoid, Tanh and ReLU computation submodules compute the respective functions.
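The routing performed by the selection submodule can be sketched in a few lines; the string-keyed dispatch below is an illustration only (the hardware selects among fixed function units via a configuration parameter):

```python
import math

def activate(x, kind):
    """Apply the activation selected by the configuration parameter."""
    if kind == "sigmoid":
        return 1.0 / (1.0 + math.exp(-x))
    if kind == "tanh":
        return math.tanh(x)
    if kind == "relu":
        return max(0.0, x)
    raise ValueError("unknown activation: " + kind)
```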
Preferably, in the above acceleration system, the pooling computation submodule comprises a double buffer built from two FPGA on-chip memories, which stores the temporary feature map data produced during pooling. The buffer size is set by the network configuration parameters predefined by the network model configuration module and differs between pooling layers. The double-buffer structure implements ping-pong read/write operation, which enables pipelined pooling computation.
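The ping-pong discipline is: while the producer fills one buffer, the consumer drains the other, and the roles flip each iteration. A minimal software sketch, assuming same-sized writes (class and method names are illustrative):

```python
class PingPongBuffer:
    """Two same-sized buffers with swapped read/write roles.

    While one buffer is being written, the other can be read;
    swap() flips the roles, mimicking the on-chip double buffer
    used for pipelined pooling.
    """
    def __init__(self, size):
        self.bufs = [[0] * size, [0] * size]
        self.wr = 0  # index of the buffer currently being written

    def write(self, data):
        self.bufs[self.wr][:] = data

    def swap(self):
        self.wr ^= 1  # written buffer becomes readable, and vice versa

    def read(self):
        return list(self.bufs[self.wr ^ 1])
```

In hardware the same structure lets the pooling stage consume window n while the convolution stage deposits window n+1, so neither stage stalls on the other.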
Preferably, in the above acceleration system, the network model configuration module is implemented in FPGA on-chip memory and stores the network model configuration parameters, including the size of the network input feature maps, the size and number of the convolution kernel parameters in the convolution computation submodule, the pooling window size in the pooling computation submodule, the parameter scale of the fully connected computation submodule, and the computation parallelism. The data in the network model configuration module are preferably written before the system starts.
Preferably, in the above acceleration system, the convolutional neural network computing module cascades the convolution, activation function, pooling and fully connected computation submodules according to the network model configuration parameters. Data are transferred between these submodules through OpenCL channels; the computations within each submodule execute in parallel, and the computations across submodules proceed in pipeline fashion.
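OpenCL channels pass data between kernels without a round trip to global memory. A rough software analogue — stages as threads connected by queues, with a None sentinel ending the stream — conveys the pipelined execution (this is an illustration, not the patent's OpenCL code):

```python
import threading
from queue import Queue

def stage(fn, inq, outq):
    """One pipeline stage: apply fn to each item until the sentinel."""
    while True:
        item = inq.get()
        if item is None:      # end-of-stream sentinel
            outq.put(None)
            return
        outq.put(fn(item))

def run_pipeline(items, fns):
    """Cascade the stage functions, queue-connected like OpenCL channels."""
    qs = [Queue() for _ in range(len(fns) + 1)]
    workers = [threading.Thread(target=stage, args=(f, qs[i], qs[i + 1]))
               for i, f in enumerate(fns)]
    for w in workers:
        w.start()
    for x in items:
        qs[0].put(x)
    qs[0].put(None)
    results = []
    while (y := qs[-1].get()) is not None:
        results.append(y)
    for w in workers:
        w.join()
    return results
```

As in the hardware pipeline, a later stage can begin on item n while an earlier stage is already working on item n+1.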
The FPGA-based convolutional neural network acceleration system provided by the invention combines the structural features of convolutional neural network models with the characteristics of FPGA chips and the advantages of the OpenCL programming framework. By readjusting the computation structure of existing convolutional neural networks and designing the corresponding modules, it fully exploits the parallelism within the computation and the pipelining between computation layers, matching the computation more closely to the structural features of the FPGA and making rational, efficient use of the FPGA's computing resources, thereby improving the processing speed of convolutional neural networks. In general, compared with the prior art, the above technical scheme achieves the following beneficial effects:
(1) The system exploits the expected behavior of each layer of the convolutional neural network to design a system architecture suited to pipelined, parallel computation. The data preprocessing module, the convolutional neural network computing module and the data post-processing module form a pipeline: the preprocessing and post-processing modules control the data transfers between the storage module and the computing module, and the kernel parameters and feature maps flow through the three major pipeline stages of data reading, data computation and data storage. The convolutional, activation function, pooling and fully connected layers are each designed as individual computing units, and a variety of network structures can be built through parameter configuration. The processing of each submodule is split into many small steps, and the data in each layer's submodule pass through the distinct stages of reading, processing and storing, forming a structure analogous to a computer instruction pipeline. Computation within a network layer thus executes in parallel while computation across layers is pipelined, which effectively improves the processing speed of the convolutional neural network.
(2) The system exploits the low data dependence between the convolution kernel parameters and the local feature maps in convolutional neural network computation. In the parallel computation structure of the convolution submodule, each computation operates on a convolution kernel and the corresponding window of the input feature map; because there is no data dependence between convolution kernels, multiple computations can proceed in parallel. In a conventional convolution process the input feature map data are obtained through a sliding window that moves across the feature map to collect the values in each convolution window. The parallel structure of the invention removes the sliding window: the data inside the original windows are laid out in advance into multiple data blocks, and each computation directly reads the corresponding block. In this way multiple data blocks are computed against the convolution kernels simultaneously, further improving processing speed.
(3) With the system, partial pooling can begin as soon as data enter the pooling computation submodule during network computation. Because the convolution kernels compute in parallel, partial results on some channels are produced simultaneously, i.e. partial inputs to the pooling submodule become available early. Like the convolution submodule, the pooling submodule computes per sliding window, so pooling of a window can begin as soon as all the data in that window arrive, without waiting for the convolution submodule to finish all of its computation. The convolution submodule produces data on multiple channels simultaneously, and pooling has no dependence between channels, so pooling on each channel proceeds in parallel, greatly improving the processing speed of the convolutional neural network.
(4) The network model parameters of the system are configurable: a configuration file sets the structure of the network model and the parallelism of the network computation, so that different types of network model and FPGAs of different computing capability can run convolutional neural networks simply through parameter configuration.
(5) In preferred embodiments, the Winograd minimal filtering algorithm is used in the computation of the convolutional layers, accelerating convolution. Ping-pong buffering is used in the computation of the pooling layers, accelerating pooling and reducing the memory footprint. Batch computation is used in the fully connected layers, reducing accesses to external memory in the course of computation, and blocked (segmented) computation simplifies the high-dimensional matrix multiplication, improving processing speed and lowering the demands on the FPGA's arithmetic capability. Each computing module of the convolutional neural network is implemented as an OpenCL kernel program, reducing development difficulty.
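The blocked fully connected computation mentioned above keeps only one segment of the input in flight at a time, bounding the working set. A minimal sketch, assuming a plain matrix-vector form (function name, block size and layout are illustrative, not from the patent):

```python
def blocked_matvec(weights, x, block):
    """y = W @ x accumulated segment-by-segment along the input
    dimension, limiting the working set resident at any one time."""
    rows, cols = len(weights), len(x)
    y = [0.0] * rows
    for j0 in range(0, cols, block):
        j1 = min(j0 + block, cols)       # current input segment
        for i in range(rows):
            y[i] += sum(weights[i][j] * x[j] for j in range(j0, j1))
    return y
```

The result is independent of the block size; only the memory access pattern changes, which is the point of the optimization.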
Brief description of the drawings
Fig. 1 is an architecture diagram of one embodiment of the FPGA-based convolutional neural network acceleration system provided by the invention;
Fig. 2 is a processing diagram of the data preprocessing module in the embodiment;
Fig. 3 is a processing diagram of the convolution computation submodule in the embodiment;
Fig. 4 is a processing diagram of the activation function computation submodule in the embodiment;
Fig. 5 is a processing diagram of the pooling computation submodule in the embodiment;
Fig. 6 is a processing diagram of the fully connected computation submodule in the embodiment;
Fig. 7 is a processing diagram of the data post-processing module in the embodiment;
Fig. 8 is a processing flowchart of the acceleration system in the embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the various embodiments described below may be combined with each other as long as they do not conflict.
Referring to Fig. 1, one embodiment of the FPGA-based convolutional neural network acceleration system comprises a data preprocessing module, a convolutional neural network computing module, a data post-processing module, a data storage module and a network model configuration module. The input of the data preprocessing module is connected to the data storage module; the input of the computing module is connected to the output of the preprocessing module; the input of the data post-processing module is connected to the output of the computing module; and the input of the data storage module is connected to the output of the post-processing module. The computing module is also connected to the network model configuration module.
The data preprocessing module reads the convolution kernel parameters and input feature maps for the current computation stage from the data storage module and preprocesses them: the 4-dimensional kernel parameters are rearranged into 3 dimensions, and the input feature maps are spread out and replicated with a sliding window so that the local feature maps in the windows correspond one-to-one with the kernel parameters, yielding a kernel parameter sequence and a local feature map sequence that can be computed directly. After preprocessing, the prepared kernel parameters and input feature maps are sent to the convolutional neural network computing module.
The network model configuration module performs parameter configuration of the computing module. The computing module rearranges and processes the kernel parameters and input feature maps received from the preprocessing module according to the configuration parameters and sends the results to the data post-processing module. The computing module comprises a convolution computation submodule, an activation function computation submodule, a pooling computation submodule and a fully connected computation submodule, connected according to the network model configuration parameters predefined by the network model configuration module.
The data post-processing module writes the output of the computing module into the data storage module. The data storage module stores the model parameters (caffemodel), intermediate feature map results and final results of the convolutional neural network, and exchanges data with an external host through a PCIe interface.
Referring to Fig. 2, the data preprocessing module reads the convolution kernel parameters and input feature maps from the data storage module. When reading convolution kernels, it reads PARALLEL_KERNEL kernels of size k*k*C_i according to the parameter predefined in the network model configuration module, where C_i denotes the number of channels of the input feature map. After reading, the kernels are serialized: the 4-dimensional kernel tensor of size k*k*C_i*PARALLEL_KERNEL is arranged into a 3-dimensional form of size k*k*(C_i*PARALLEL_KERNEL).
When processing the input feature map, the feature map of size H*W*Ci is first read in; the feature map is then unfolded according to the size of the sliding window and the moving stride. The size of the expanded feature map is ((W-k)/stride+1)*((H-k)/stride+1)*Ci.
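The unfolding step is the familiar sliding-window (im2col-style) expansion. A sketch for illustration only, with assumed sizes and a patch-per-window layout:

```python
import numpy as np

H, W, Ci, k, stride = 8, 8, 3, 2, 2  # illustrative configuration values

fmap = np.random.rand(H, W, Ci)  # input feature map of size H*W*Ci

# Unfold: extract one k*k*Ci patch per sliding-window position
out_h = (H - k) // stride + 1
out_w = (W - k) // stride + 1
patches = np.stack([
    fmap[i * stride:i * stride + k, j * stride:j * stride + k, :]
    for i in range(out_h) for j in range(out_w)
])

print(patches.shape)  # one patch per window position
```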
After the input feature map has been expanded, a partial feature map of size (PARALLEL_FEATURE_W)*(PARALLEL_FEATURE_H)*Ci is extracted according to the configuration parameters. The extracted feature map is replicated so that its count equals the number of convolution kernels, finally yielding a feature map of size (PARALLEL_FEATURE_W)*(PARALLEL_FEATURE_H)*(Ci*PARALLEL_KERNEL), so that multiple convolution kernels can be computed against the feature map in parallel. After the convolution kernels and feature map have been processed, the processed convolution kernel parameters and feature map are sent to the convolutional neural network computing module for processing.
The processing flow of the convolution calculation submodule of the convolutional neural network computing module is shown in Fig. 3. The inputs of this submodule are the convolution kernel parameters and feature maps generated by the data preprocessing module, together with the relevant configuration parameters predefined in the network model configuration module. The preprocessed convolution kernels and feature maps are three-dimensional matrices whose channel count is PARALLEL_KERNEL*Ci. The convolution kernel and feature map on each channel are fed into different OpenCL compute units, where a Winograd matrix multiplication module performs the two-dimensional matrix multiplication; the computations of the OpenCL compute units run in parallel. The calculation result is a feature map of length PARALLEL_FEATURE_W/k, width PARALLEL_FEATURE_H/k, and channel count Ci*PARALLEL_KERNEL. After processing by the convolution calculation submodule, the input feature map yields a partial output feature map of the convolutional layer, which is handled differently depending on the type of the next layer. If the next layer predefined in the network model configuration is a convolutional layer or a fully connected layer, the output feature map skips the pooling layer and the data post-processing module writes the result back to external storage to await processing; if the next layer predefined in the network model configuration is a pooling layer, the feature map is sent to the pooling calculation submodule for pooling.
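The Winograd minimal filtering algorithm referenced here trades multiplications for additions; the one-dimensional case F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six, illustrates the idea (a sketch, not the patent's 2-D implementation):

```python
import numpy as np

def winograd_f2_3(d, g):
    """Winograd minimal filtering F(2,3): 2 outputs of a 3-tap filter
    over 4 inputs d, using 4 multiplications instead of 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])   # input segment
g = np.array([0.5, 1.0, -1.0])       # filter taps
fast = winograd_f2_3(d, g)
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # naive sliding dot product
print(np.allclose(fast, direct))  # True
```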
Referring to Fig. 4, the activation function calculation submodule in the embodiment comprises one activation function selection submodule and three function calculation submodules. The selector in the activation function selection submodule is determined by the configuration parameters in the model configuration module, and the three function calculation submodules correspond to the Sigmoid, Tanh, and ReLU activation functions respectively. The input feature map is sent, along the path determined by the activation function selection submodule, to the corresponding function calculation submodule for activation processing; after processing it is sent to the data storage module or the pooling calculation submodule according to the configuration parameters.
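The selection logic can be sketched as a dispatch on the Activate func configuration codes the system uses (0: none, 1: ReLU, 2: Sigmoid, 3: Tanh); this is an illustration only, not the hardware selector:

```python
import numpy as np

def activate(x, func_code):
    """Dispatch on the Activate func configuration code:
    0 = no activation, 1 = ReLU, 2 = Sigmoid, 3 = Tanh."""
    if func_code == 0:
        return x
    if func_code == 1:
        return np.maximum(x, 0.0)        # ReLU
    if func_code == 2:
        return 1.0 / (1.0 + np.exp(-x))  # Sigmoid
    if func_code == 3:
        return np.tanh(x)                # Tanh
    raise ValueError("unknown activation code")

fmap = np.array([-1.0, 0.0, 2.0])
print(activate(fmap, 1))  # ReLU: [0. 0. 2.]
```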
Referring to Fig. 5, the pooling calculation submodule uses two ping-pong buffers of size pool_size*W to hold the calculation results from the activation function calculation submodule, where pool_size and W are configuration parameters. The results of the convolution calculation submodule first fill buffer1; partial pooling over the data in that buffer can already be performed during filling. Once buffer1 is full, the results of the convolution calculation module are filled into buffer2; while buffer2 is being filled, pooling is performed over the data in buffer2, and pooling over the data spanning buffer1 and buffer2 can proceed at the same time. When buffer2 is full, the results of the convolution calculation module are filled into buffer1 again; the two buffers alternate in this way until the entire pooling computation is complete. A pooling window also lies between the two buffers, drawing its data from both; while one buffer is being computed on and the other is being filled, the pooling window spanning the two buffers can be computed. Since there is no data dependence between pooling windows, loop unrolling is used to compute the different windows simultaneously.
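The alternating-buffer scheme can be sketched in software as follows; the row-streaming interface and 2x2 max pooling are illustrative assumptions, not the patent's hardware design:

```python
import numpy as np

pool_size, W = 2, 8  # configuration parameters

def pingpong_pool(rows):
    """Stream rows through two pool_size*W buffers; whenever one
    buffer fills, max-pool it while the other buffer is being filled."""
    buffers = [np.empty((pool_size, W)), np.empty((pool_size, W))]
    active, fill, out = 0, 0, []
    for row in rows:
        buffers[active][fill] = row
        fill += 1
        if fill == pool_size:  # buffer full: pool over pool_size*pool_size windows
            buf = buffers[active]
            pooled = buf.reshape(pool_size, W // pool_size, pool_size).max(axis=(0, 2))
            out.append(pooled)
            active, fill = 1 - active, 0  # switch to the other buffer
    return np.array(out)

rows = np.arange(4 * 8, dtype=float).reshape(4, 8)
print(pingpong_pool(rows).shape)  # (2, 4)
```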
Referring to Fig. 6, during processing the fully connected calculation submodule laterally divides the input matrix formed by N input vectors into dim1/m segments, where N denotes the number of input feature vectors, dim1 denotes the dimension of an input feature vector, and m denotes the segment length. Each segment forms a submatrix of size m*N; multiplying each submatrix by the corresponding part of the weight matrix yields partial results formed by submatrices of size n*N, and merging the dim1/m partial results gives the final calculation result composed of the N output vectors. When computing the matrix multiplication between a submatrix and the corresponding part of the weight matrix, the Winograd minimal filtering matrix multiplication is used for acceleration.
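The segmented multiply-and-merge is equivalent to the full matrix product; a sketch with illustrative sizes (names are assumptions for illustration):

```python
import numpy as np

N, dim1, n, m = 4, 6, 5, 2  # N inputs of dimension dim1, n outputs, segment length m

X = np.random.rand(dim1, N)   # input matrix: N column vectors of dimension dim1
Wt = np.random.rand(n, dim1)  # weight matrix of the fully connected layer

# Split X into dim1/m horizontal segments; multiply each m*N segment by the
# corresponding n*m slice of the weights and merge (accumulate) the partial results
Y = np.zeros((n, N))
for s in range(dim1 // m):
    seg = X[s * m:(s + 1) * m, :]        # m*N submatrix
    Y += Wt[:, s * m:(s + 1) * m] @ seg  # n*N partial result

print(np.allclose(Y, Wt @ X))  # True
```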
Referring to Fig. 7, when the pooling calculation submodule or the fully connected calculation submodule in the convolutional neural network computing module has finished processing, the data post-processing module starts writing the data output by the pooling or fully connected calculation submodule back into the data storage module. In this process, the barrier operation of the OpenCL framework is used to guarantee that transmission starts only after all calculation results have been obtained, and that the next processing step starts only after all data have been transmitted.
Referring to Fig. 8, the processing flow of the above acceleration system provided by the embodiment mainly comprises three parts. The first part is the kernel program compilation process: in order to maximize the use of the computing and storage resources on the FPGA, suitable network computation parallelism parameters must be set. In the embodiment, setting the parallelism parameters is completed automatically by a program: first the initial values of PARALLEL_FEATURE and PARALLEL_KERNEL in the convolutional neural network kernel program are set, then the kernel program is compiled with the Altera OpenCL SDK; after compilation, the resource utilization (including storage resources, logic resources, computing resources, etc.) is obtained from the compilation report. If the resource utilization has not reached its maximum, the values of PARALLEL_FEATURE and PARALLEL_KERNEL are updated and the kernel is recompiled, until the maximum hardware resource utilization is obtained; compilation then yields a hardware program that can run on the FPGA.
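The search loop described above can be sketched as follows. The `compile_and_report` helper and the toy resource model are hypothetical stand-ins for the Altera OpenCL SDK compile step and its report; the doubling update rule is also an illustrative assumption:

```python
def tune_parallelism(compile_and_report, max_iters=16):
    """Sketch of the automatic parallelism search: grow PARALLEL_FEATURE /
    PARALLEL_KERNEL until the compile report shows resources are maxed out.
    compile_and_report(pf, pk) is a hypothetical helper returning the peak
    resource-utilization fraction reported for that configuration."""
    pf, pk = 1, 1              # initial values
    best = (pf, pk)
    for _ in range(max_iters):
        util = compile_and_report(pf, pk)
        if util > 1.0:         # design no longer fits: keep the last fit
            break
        best = (pf, pk)
        if util == 1.0:        # resources fully utilized: stop
            break
        pf, pk = pf * 2, pk * 2  # update the parallelism and recompile
    return best

# Toy resource model standing in for the real compile report
model = lambda pf, pk: (pf * pk) / 64.0
print(tune_parallelism(model))  # (8, 8)
```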
The second part is the parameter configuration process, covering the network model calculation parameters and the model configuration parameters. The network model calculation parameters are read directly from the Caffe model file (caffemodel); the model configuration parameters include the input feature map size of each layer, the size of the convolution kernels, the pooling window size, and so on. The configuration is completed with the clSetKernelArg() function in OpenCL. Table 1 below illustrates the types and values of the model configuration parameters, taking VGG16 as an example.
Table 1. Types and example values of the model configuration parameters
In the table above, in the Activate func column, 0 indicates no activation function, 1 indicates the ReLU activation function, 2 indicates the Sigmoid activation function, and 3 indicates the Tanh activation function. In the Output dst column, 1 indicates output to the data storage module, 2 indicates output to the pooling calculation submodule, and 3 indicates output to the convolution calculation submodule.
The third part is the neural network operation process: after the host transfers a picture into the data storage module, the system on the FPGA starts running; after the run completes, the calculation results are returned to the host through the data storage module, and the system stops running when there is no further picture input.
The FPGA-based convolutional neural network acceleration system provided by the embodiment implements the VGG16 and AlexNet network models on a DE5a-Net development board, and performance was tested with image data of size 224*224*3. The experimental statistics show that the processing speed of VGG16 is 160 ms/image and that of AlexNet is 12 ms/image, better than other FPGA implementations.
It will be readily appreciated by those skilled in the art that the foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. An FPGA-based convolutional neural network acceleration system, characterized by comprising a data preprocessing module, a convolutional neural network computing module, a data post-processing module, a data storage module, and a network model configuration module; the data preprocessing module, convolutional neural network computing module, and data post-processing module are implemented on the FPGA, the data storage module is implemented on off-chip storage of the FPGA, and the network model configuration module is implemented on on-chip storage of the FPGA;
the data preprocessing module is configured to read the corresponding convolution kernel parameters and input feature map from the data storage module according to the current calculation stage and to preprocess them: the 4-dimensional convolution kernel model parameters are rearranged into 3 dimensions, and the input feature map is unfolded and replicated using a sliding window, so that the local feature maps within the sliding windows correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature map sequence convenient for direct calculation; after preprocessing, the processed convolution kernel parameters and input feature map are sent to the convolutional neural network computing module;
the network model configuration module is configured to perform parameter configuration for the convolutional neural network computing module; the convolutional neural network computing module arranges the convolutional layers, activation function layers, pooling layers, and fully connected layers of the convolutional neural network independently, so that a variety of different network structures can be constructed through parameter configuration; according to the configuration parameters, it performs inter-layer pipelined processing of convolution, activation, pooling, and fully connected calculation on the convolution kernel parameters and input feature map received from the data preprocessing module, and sends the processing results to the data post-processing module;
the data post-processing module is configured to write the output data of the convolutional neural network computing module into the data storage module;
the data storage module is configured to store the model parameters (caffemodel) of the convolutional neural network, the intermediate feature map calculation results, and the final calculation results, and exchanges data with an external host through a PCIe interface.
2. The convolutional neural network acceleration system according to claim 1, characterized in that the convolutional neural network computing module comprises a convolution calculation submodule, an activation function calculation submodule, a pooling calculation submodule, and a fully connected calculation submodule; these submodules inside the convolutional neural network computing module are connected according to the network model configuration parameters predefined by the network model configuration module;
the convolution calculation submodule performs convolution calculation using the input convolution kernel parameters and feature map, and upon completion sends the result to the activation function calculation submodule;
the activation function calculation submodule selects an activation function according to the activation function configuration parameter predefined by the network model parameter configuration module, performs activation calculation on the feature map using the selected activation function, and upon completion sends the result to the pooling calculation submodule or the fully connected calculation submodule according to the parameter configuration;
the pooling calculation submodule is configured to perform pooling calculation on the received feature map and, according to the configuration parameters predefined by the network model configuration module, send the pooling result to the fully connected computing module or directly to the data post-processing module;
the fully connected calculation submodule is configured to perform fully connected calculation on the received feature map and send the fully connected result to the data post-processing module.
3. The convolutional neural network acceleration system according to claim 2, characterized in that the convolutional neural network computing module cascades the convolution calculation submodule, activation function calculation submodule, pooling calculation submodule, and fully connected calculation submodule according to the network model configuration parameters; data are transmitted between these submodules using OpenCL Channels; the calculations within these submodules execute in parallel, and the calculations between these submodules proceed in a pipelined manner.
4. The convolutional neural network acceleration system according to claim 2 or 3, characterized in that the convolution calculation submodule comprises one or more matrix multiplication calculation submodules; the number of matrix multiplication calculation submodules is set by the configuration parameters predefined by the network model configuration module; the processing of the matrix multiplication calculation submodules executes in parallel;
each matrix multiplication calculation submodule uses the Winograd minimal filtering algorithm to accelerate the operation, and is configured to calculate the matrix multiplication between a single convolution kernel and the corresponding local feature map.
5. The convolutional neural network acceleration system according to claim 2 or 3, characterized in that the activation function calculation submodule comprises an activation function selection submodule, a Sigmoid function calculation submodule, a Tanh function calculation submodule, and a ReLU function calculation submodule;
the activation function selection submodule is connected to the Sigmoid function calculation submodule, the Tanh function calculation submodule, and the ReLU function calculation submodule respectively, and sends the feature map data to one of these three calculation submodules;
the activation function selection submodule is configured to set the activation calculation mode of the feature maps in the convolutional neural network; the Sigmoid function calculation submodule performs the Sigmoid function calculation; the Tanh function calculation submodule performs the Tanh function calculation; the ReLU function calculation submodule performs the ReLU function calculation.
6. The convolutional neural network acceleration system according to any one of claims 1 to 5, characterized in that the pooling calculation submodule comprises a double buffer formed by two FPGA on-chip storages, for storing temporary feature map data during the pooling calculation process; the buffer size is set by the network configuration parameters predefined by the network model parameter configuration module, and the buffers of different pooling layers differ in size; ping-pong read-write operations are realized through this double-buffer structure, realizing pipelined processing of the pooling calculation.
7. The convolutional neural network acceleration system according to claim 1 or 2, characterized in that the data preprocessing module comprises a data transmission submodule, a convolution kernel parameter preprocessing submodule, and a feature map preprocessing submodule;
the data transmission submodule controls the transmission of feature maps and convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing submodule performs rearrangement and arrangement processing of the convolution kernel parameters; the feature map preprocessing submodule performs unfolding, replication, and arrangement processing of the feature maps.
8. The convolutional neural network acceleration system according to claim 7, characterized in that the data transmission submodule comprises a DDR controller, a data transmission bus, and a memory cache;
the DDR controller controls the data transmission between the DDR and the FPGA; the data transmission bus connects the DDR and the FPGA and serves as the channel for data transmission; the memory cache temporarily caches data, reducing the FPGA's reads of the DDR and improving the data transfer speed.
9. The convolutional neural network acceleration system according to claim 1 or 2, characterized in that the data storage module comprises a convolution kernel parameter storage submodule and a feature map storage submodule; the convolution kernel parameter storage submodule stores the convolution kernel parameters, and the feature map storage submodule stores the input feature maps and the temporary feature maps generated during calculation; these storage submodules are preferably partitioned from the DDR memory connected to the FPGA.
10. The convolutional neural network acceleration system according to claim 1 or 2, characterized in that the network model parameter configuration module stores the network model configuration parameters, including the size of the network input feature map, the size and number of the convolution kernel parameters in the convolution calculation submodule, the pooling window size in the pooling calculation submodule, the parameter scale of the fully connected calculation submodule, and the calculation parallelism; the data in the network model parameter configuration module are preferably written in advance before the system starts.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810710069.1A CN109086867B (en) | 2018-07-02 | 2018-07-02 | Convolutional neural network acceleration system based on FPGA |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810710069.1A CN109086867B (en) | 2018-07-02 | 2018-07-02 | Convolutional neural network acceleration system based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109086867A true CN109086867A (en) | 2018-12-25 |
CN109086867B CN109086867B (en) | 2021-06-08 |
Family
ID=64836906
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810710069.1A Expired - Fee Related CN109086867B (en) | 2018-07-02 | 2018-07-02 | Convolutional neural network acceleration system based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086867B (en) |
Cited By (64)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109656721A (en) * | 2018-12-28 | 2019-04-19 | 上海新储集成电路有限公司 | A kind of efficient intelligence system |
CN109685209A (en) * | 2018-12-29 | 2019-04-26 | 福州瑞芯微电子股份有限公司 | A kind of device and method for accelerating neural network computing speed |
CN109767002A (en) * | 2019-01-17 | 2019-05-17 | 济南浪潮高新科技投资发展有限公司 | A kind of neural network accelerated method based on muti-piece FPGA collaboration processing |
CN109784489A (en) * | 2019-01-16 | 2019-05-21 | 北京大学软件与微电子学院 | Convolutional neural networks IP kernel based on FPGA |
CN109799977A (en) * | 2019-01-25 | 2019-05-24 | 西安电子科技大学 | The method and system of instruction repertorie exploitation scheduling data |
CN109948784A (en) * | 2019-01-03 | 2019-06-28 | 重庆邮电大学 | A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm |
CN109961139A (en) * | 2019-01-08 | 2019-07-02 | 广东浪潮大数据研究有限公司 | A kind of accelerated method, device, equipment and the storage medium of residual error network |
CN109976903A (en) * | 2019-02-22 | 2019-07-05 | 华中科技大学 | A kind of deep learning Heterogeneous Computing method and system based on slice width Memory Allocation |
CN110188869A (en) * | 2019-05-05 | 2019-08-30 | 北京中科汇成科技有限公司 | A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating |
CN110263925A (en) * | 2019-06-04 | 2019-09-20 | 电子科技大学 | A kind of hardware-accelerated realization framework of the convolutional neural networks forward prediction based on FPGA |
CN110334801A (en) * | 2019-05-09 | 2019-10-15 | 苏州浪潮智能科技有限公司 | A kind of hardware-accelerated method, apparatus, equipment and the system of convolutional neural networks |
CN110390392A (en) * | 2019-08-01 | 2019-10-29 | 上海安路信息科技有限公司 | Deconvolution parameter accelerator, data read-write method based on FPGA |
CN110399883A (en) * | 2019-06-28 | 2019-11-01 | 苏州浪潮智能科技有限公司 | Image characteristic extracting method, device, equipment and computer readable storage medium |
CN110443357A (en) * | 2019-08-07 | 2019-11-12 | 上海燧原智能科技有限公司 | Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium |
CN110458279A (en) * | 2019-07-15 | 2019-11-15 | 武汉魅瞳科技有限公司 | A kind of binary neural network accelerated method and system based on FPGA |
CN110490311A (en) * | 2019-07-08 | 2019-11-22 | 华南理工大学 | Convolutional neural networks accelerator and its control method based on RISC-V framework |
CN110852930A (en) * | 2019-10-25 | 2020-02-28 | 华中科技大学 | FPGA graph processing acceleration method and system based on OpenCL |
CN111079923A (en) * | 2019-11-08 | 2020-04-28 | 中国科学院上海高等研究院 | Spark convolution neural network system suitable for edge computing platform and circuit thereof |
CN111105015A (en) * | 2019-12-06 | 2020-05-05 | 浪潮(北京)电子信息产业有限公司 | General CNN reasoning accelerator, control method thereof and readable storage medium |
CN111160544A (en) * | 2019-12-31 | 2020-05-15 | 上海安路信息科技有限公司 | Data activation method and FPGA data activation system |
CN111210019A (en) * | 2020-01-16 | 2020-05-29 | 电子科技大学 | Neural network inference method based on software and hardware cooperative acceleration |
CN111242289A (en) * | 2020-01-19 | 2020-06-05 | 清华大学 | Convolutional neural network acceleration system and method with expandable scale |
CN111325327A (en) * | 2020-03-06 | 2020-06-23 | 四川九洲电器集团有限责任公司 | Universal convolution neural network operation architecture based on embedded platform and use method |
CN111340198A (en) * | 2020-03-26 | 2020-06-26 | 上海大学 | Neural network accelerator with highly-multiplexed data based on FPGA (field programmable Gate array) |
CN111626403A (en) * | 2020-05-14 | 2020-09-04 | 北京航空航天大学 | Convolutional neural network accelerator based on CPU-FPGA memory sharing |
CN111736986A (en) * | 2020-05-29 | 2020-10-02 | 浪潮(北京)电子信息产业有限公司 | FPGA (field programmable Gate array) accelerated execution method of deep learning model and related device |
CN111860781A (en) * | 2020-07-10 | 2020-10-30 | 逢亿科技(上海)有限公司 | Convolutional neural network feature decoding system realized based on FPGA |
CN111931913A (en) * | 2020-08-10 | 2020-11-13 | 西安电子科技大学 | Caffe-based deployment method of convolutional neural network on FPGA |
CN112101284A (en) * | 2020-09-25 | 2020-12-18 | 北京百度网讯科技有限公司 | Image recognition method, training method, device and system of image recognition model |
CN112149814A (en) * | 2020-09-23 | 2020-12-29 | 哈尔滨理工大学 | Convolutional neural network acceleration system based on FPGA |
CN112732638A (en) * | 2021-01-22 | 2021-04-30 | 上海交通大学 | Heterogeneous acceleration system and method based on CTPN network |
CN112766478A (en) * | 2021-01-21 | 2021-05-07 | 中国电子科技集团公司信息科学研究院 | FPGA pipeline structure for convolutional neural network |
CN112819140A (en) * | 2021-02-02 | 2021-05-18 | 电子科技大学 | OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method |
CN112905526A (en) * | 2021-01-21 | 2021-06-04 | 北京理工大学 | FPGA implementation method for various types of convolution |
CN112949845A (en) * | 2021-03-08 | 2021-06-11 | 内蒙古大学 | Deep convolutional neural network accelerator based on FPGA |
CN113065647A (en) * | 2021-03-30 | 2021-07-02 | 西安电子科技大学 | Computing-storage communication system and communication method for accelerating neural network |
CN113467783A (en) * | 2021-07-19 | 2021-10-01 | 中科曙光国际信息产业有限公司 | Kernel function compiling method and device of artificial intelligent accelerator |
CN113517007A (en) * | 2021-04-29 | 2021-10-19 | 西安交通大学 | Flow processing method and system and memristor array |
WO2021259105A1 (en) * | 2020-06-22 | 2021-12-30 | 深圳鲲云信息科技有限公司 | Neural network accelerator |
CN113949592A (en) * | 2021-12-22 | 2022-01-18 | 湖南大学 | Anti-attack defense system and method based on FPGA |
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
CN114943635A (en) * | 2021-09-30 | 2022-08-26 | 太初(无锡)电子科技有限公司 | Fusion operator design and implementation method based on heterogeneous collaborative computing core |
CN114997392A (en) * | 2022-08-03 | 2022-09-02 | 成都图影视讯科技有限公司 | Architecture and architectural methods for neural network computing |
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US11580386B2 (en) | 2019-03-18 | 2023-02-14 | Electronics And Telecommunications Research Institute | Convolutional layer acceleration unit, embedded system having the same, and method for operating the embedded system |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
CN117195989A (en) * | 2023-11-06 | 2023-12-08 | 深圳市九天睿芯科技有限公司 | Vector processor, neural network accelerator, chip and electronic equipment |
US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data |
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
EP4156079A4 (en) * | 2020-05-22 | 2024-03-27 | Inspur Electronic Information Industry Co., Ltd | Image data storage method, image data processing method and system, and related apparatus |
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341127A (en) * | 2017-07-05 | 2017-11-10 | 西安电子科技大学 | Convolutional neural networks accelerated method based on OpenCL standards |
CN107657581A (en) * | 2017-09-28 | 2018-02-02 | 中国人民解放军国防科技大学 | Convolutional neural network CNN hardware accelerator and acceleration method |
US20180046900A1 (en) * | 2016-08-11 | 2018-02-15 | Nvidia Corporation | Sparse convolutional neural network accelerator |
CN108154229A (en) * | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Accelerate the image processing method of convolutional neural networks frame based on FPGA |
CN109961139A (en) * | 2019-01-08 | 2019-07-02 | 广东浪潮大数据研究有限公司 | A kind of accelerated method, device, equipment and the storage medium of residual error network |
CN109784489A (en) * | 2019-01-16 | 2019-05-21 | 北京大学软件与微电子学院 | Convolutional neural networks IP kernel based on FPGA |
CN109784489B (en) * | 2019-01-16 | 2021-07-30 | 北京大学软件与微电子学院 | Convolutional neural network IP core based on FPGA |
CN109767002A (en) * | 2019-01-17 | 2019-05-17 | 济南浪潮高新科技投资发展有限公司 | A kind of neural network accelerated method based on muti-piece FPGA collaboration processing |
CN109799977B (en) * | 2019-01-25 | 2021-07-27 | 西安电子科技大学 | Method and system for developing and scheduling data by instruction program |
CN109799977A (en) * | 2019-01-25 | 2019-05-24 | 西安电子科技大学 | The method and system of instruction repertorie exploitation scheduling data |
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data |
CN109976903A (en) * | 2019-02-22 | 2019-07-05 | 华中科技大学 | A kind of deep learning Heterogeneous Computing method and system based on slice width Memory Allocation |
US11568268B2 (en) | 2019-02-22 | 2023-01-31 | Huazhong University Of Science And Technology | Deep learning heterogeneous computing method based on layer-wide memory allocation and system thereof |
US11580386B2 (en) | 2019-03-18 | 2023-02-14 | Electronics And Telecommunications Research Institute | Convolutional layer acceleration unit, embedded system having the same, and method for operating the embedded system |
CN110188869B (en) * | 2019-05-05 | 2021-08-10 | 北京中科汇成科技有限公司 | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm |
CN110188869A (en) * | 2019-05-05 | 2019-08-30 | 北京中科汇成科技有限公司 | A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating |
CN110334801A (en) * | 2019-05-09 | 2019-10-15 | 苏州浪潮智能科技有限公司 | A kind of hardware-accelerated method, apparatus, equipment and the system of convolutional neural networks |
CN110263925A (en) * | 2019-06-04 | 2019-09-20 | 电子科技大学 | A kind of hardware-accelerated realization framework of the convolutional neural networks forward prediction based on FPGA |
CN110263925B (en) * | 2019-06-04 | 2022-03-15 | 电子科技大学 | Hardware acceleration implementation device for convolutional neural network forward prediction based on FPGA |
CN110399883A (en) * | 2019-06-28 | 2019-11-01 | 苏州浪潮智能科技有限公司 | Image characteristic extracting method, device, equipment and computer readable storage medium |
CN110490311A (en) * | 2019-07-08 | 2019-11-22 | 华南理工大学 | Convolutional neural networks accelerator and its control method based on RISC-V framework |
CN110458279A (en) * | 2019-07-15 | 2019-11-15 | 武汉魅瞳科技有限公司 | A kind of binary neural network accelerated method and system based on FPGA |
CN110458279B (en) * | 2019-07-15 | 2022-05-20 | 武汉魅瞳科技有限公司 | FPGA-based binary neural network acceleration method and system |
CN110390392A (en) * | 2019-08-01 | 2019-10-29 | 上海安路信息科技有限公司 | Deconvolution parameter accelerator, data read-write method based on FPGA |
CN110443357B (en) * | 2019-08-07 | 2020-09-15 | 上海燧原智能科技有限公司 | Convolutional neural network calculation optimization method and device, computer equipment and medium |
CN110443357A (en) * | 2019-08-07 | 2019-11-12 | 上海燧原智能科技有限公司 | Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium |
CN110852930A (en) * | 2019-10-25 | 2020-02-28 | 华中科技大学 | FPGA graph processing acceleration method and system based on OpenCL |
CN111079923A (en) * | 2019-11-08 | 2020-04-28 | 中国科学院上海高等研究院 | Spark convolution neural network system suitable for edge computing platform and circuit thereof |
CN111079923B (en) * | 2019-11-08 | 2023-10-13 | 中国科学院上海高等研究院 | Spark convolutional neural network system suitable for edge computing platform and circuit thereof |
CN111105015A (en) * | 2019-12-06 | 2020-05-05 | 浪潮(北京)电子信息产业有限公司 | General CNN reasoning accelerator, control method thereof and readable storage medium |
CN111160544A (en) * | 2019-12-31 | 2020-05-15 | 上海安路信息科技有限公司 | Data activation method and FPGA data activation system |
CN111160544B (en) * | 2019-12-31 | 2021-04-23 | 上海安路信息科技股份有限公司 | Data activation method and FPGA data activation system |
CN111210019B (en) * | 2020-01-16 | 2022-06-24 | 电子科技大学 | Neural network inference method based on software and hardware cooperative acceleration |
CN111210019A (en) * | 2020-01-16 | 2020-05-29 | 电子科技大学 | Neural network inference method based on software and hardware cooperative acceleration |
CN111242289A (en) * | 2020-01-19 | 2020-06-05 | 清华大学 | Convolutional neural network acceleration system and method with expandable scale |
CN111325327A (en) * | 2020-03-06 | 2020-06-23 | 四川九洲电器集团有限责任公司 | Universal convolution neural network operation architecture based on embedded platform and use method |
CN111340198A (en) * | 2020-03-26 | 2020-06-26 | 上海大学 | Neural network accelerator with highly-multiplexed data based on FPGA (field programmable Gate array) |
CN111340198B (en) * | 2020-03-26 | 2023-05-05 | 上海大学 | Neural network accelerator for data high multiplexing based on FPGA |
CN111626403A (en) * | 2020-05-14 | 2020-09-04 | 北京航空航天大学 | Convolutional neural network accelerator based on CPU-FPGA memory sharing |
EP4156079A4 (en) * | 2020-05-22 | 2024-03-27 | Inspur Electronic Information Industry Co., Ltd | Image data storage method, image data processing method and system, and related apparatus |
CN111736986A (en) * | 2020-05-29 | 2020-10-02 | 浪潮(北京)电子信息产业有限公司 | FPGA (field programmable Gate array) accelerated execution method of deep learning model and related device |
CN111736986B (en) * | 2020-05-29 | 2023-06-23 | 浪潮(北京)电子信息产业有限公司 | FPGA (field programmable Gate array) acceleration execution method and related device of deep learning model |
WO2021259105A1 (en) * | 2020-06-22 | 2021-12-30 | 深圳鲲云信息科技有限公司 | Neural network accelerator |
CN111860781B (en) * | 2020-07-10 | 2024-06-28 | 逢亿科技(上海)有限公司 | Convolutional neural network feature decoding system based on FPGA |
CN111860781A (en) * | 2020-07-10 | 2020-10-30 | 逢亿科技(上海)有限公司 | Convolutional neural network feature decoding system realized based on FPGA |
CN111931913B (en) * | 2020-08-10 | 2023-08-01 | 西安电子科技大学 | Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe |
CN111931913A (en) * | 2020-08-10 | 2020-11-13 | 西安电子科技大学 | Caffe-based deployment method of convolutional neural network on FPGA |
CN112149814A (en) * | 2020-09-23 | 2020-12-29 | 哈尔滨理工大学 | Convolutional neural network acceleration system based on FPGA |
CN112101284A (en) * | 2020-09-25 | 2020-12-18 | 北京百度网讯科技有限公司 | Image recognition method, training method, device and system of image recognition model |
CN112905526A (en) * | 2021-01-21 | 2021-06-04 | 北京理工大学 | FPGA implementation method for various types of convolution |
CN112766478B (en) * | 2021-01-21 | 2024-04-12 | 中国电子科技集团公司信息科学研究院 | FPGA (field programmable Gate array) pipeline structure oriented to convolutional neural network |
CN112766478A (en) * | 2021-01-21 | 2021-05-07 | 中国电子科技集团公司信息科学研究院 | FPGA pipeline structure for convolutional neural network |
CN112732638A (en) * | 2021-01-22 | 2021-04-30 | 上海交通大学 | Heterogeneous acceleration system and method based on CTPN network |
CN112732638B (en) * | 2021-01-22 | 2022-05-06 | 上海交通大学 | Heterogeneous acceleration system and method based on CTPN network |
CN112819140B (en) * | 2021-02-02 | 2022-06-24 | 电子科技大学 | OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method |
CN112819140A (en) * | 2021-02-02 | 2021-05-18 | 电子科技大学 | OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method |
CN112949845A (en) * | 2021-03-08 | 2021-06-11 | 内蒙古大学 | Deep convolutional neural network accelerator based on FPGA |
CN113065647A (en) * | 2021-03-30 | 2021-07-02 | 西安电子科技大学 | Computing-storage communication system and communication method for accelerating neural network |
CN113065647B (en) * | 2021-03-30 | 2023-04-25 | 西安电子科技大学 | Calculation-storage communication system and communication method for accelerating neural network |
CN113517007B (en) * | 2021-04-29 | 2023-07-25 | 西安交通大学 | Flowing water processing method and system and memristor array |
CN113517007A (en) * | 2021-04-29 | 2021-10-19 | 西安交通大学 | Flow processing method and system and memristor array |
CN113467783A (en) * | 2021-07-19 | 2021-10-01 | 中科曙光国际信息产业有限公司 | Kernel function compiling method and device of artificial intelligent accelerator |
CN113467783B (en) * | 2021-07-19 | 2023-09-12 | 中科曙光国际信息产业有限公司 | Nuclear function compiling method and device of artificial intelligent accelerator |
CN114943635B (en) * | 2021-09-30 | 2023-08-22 | 太初(无锡)电子科技有限公司 | Fusion operator design and implementation method based on heterogeneous collaborative computing core |
CN114943635A (en) * | 2021-09-30 | 2022-08-26 | 太初(无锡)电子科技有限公司 | Fusion operator design and implementation method based on heterogeneous collaborative computing core |
CN113949592A (en) * | 2021-12-22 | 2022-01-18 | 湖南大学 | Anti-attack defense system and method based on FPGA |
CN113949592B (en) * | 2021-12-22 | 2022-03-22 | 湖南大学 | Anti-attack defense system and method based on FPGA |
CN114997392B (en) * | 2022-08-03 | 2022-10-21 | 成都图影视讯科技有限公司 | Architecture and architectural methods for neural network computing |
CN114997392A (en) * | 2022-08-03 | 2022-09-02 | 成都图影视讯科技有限公司 | Architecture and architectural methods for neural network computing |
CN117195989A (en) * | 2023-11-06 | 2023-12-08 | 深圳市九天睿芯科技有限公司 | Vector processor, neural network accelerator, chip and electronic equipment |
CN117195989B (en) * | 2023-11-06 | 2024-06-04 | 深圳市九天睿芯科技有限公司 | Vector processor, neural network accelerator, chip and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109086867B (en) | 2021-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109086867A (en) | A kind of convolutional neural networks acceleration system based on FPGA | |
CN108241890B (en) | Reconfigurable neural network acceleration method and architecture | |
US20220012593A1 (en) | Neural network accelerator and neural network acceleration method based on structured pruning and low-bit quantization | |
CN110058883B (en) | CNN acceleration method and system based on OPU | |
CN107578095B (en) | Neural computing device and processor comprising the computing device | |
CN111967468B (en) | Implementation method of lightweight target detection neural network based on FPGA | |
CN108537331A (en) | A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic | |
CN107657581A (en) | Convolutional neural network CNN hardware accelerator and acceleration method | |
CN109784489A (en) | Convolutional neural networks IP kernel based on FPGA | |
CN110458279A (en) | A kind of binary neural network accelerated method and system based on FPGA | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN107066239A (en) | A kind of hardware configuration for realizing convolutional neural networks forward calculation | |
CN107239824A (en) | Apparatus and method for realizing sparse convolution neutral net accelerator | |
CN108647773A (en) | A kind of hardwired interconnections framework of restructural convolutional neural networks | |
CN110674927A (en) | Data recombination method for pulse array structure | |
CN109446996B (en) | Face recognition data processing device and method based on FPGA | |
CN113743599B (en) | Computing device and server of convolutional neural network | |
Que et al. | Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs | |
CN113792621A (en) | Target detection accelerator design method based on FPGA | |
Huang et al. | A high performance multi-bit-width booth vector systolic accelerator for NAS optimized deep learning neural networks | |
CN109739556A (en) | A kind of general deep learning processor that interaction is cached based on multiple parallel and is calculated | |
Duan et al. | Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights | |
CN115238863A (en) | Hardware acceleration method, system and application of convolutional neural network convolutional layer | |
CN110490308A (en) | Accelerate design method, terminal device and the storage medium in library | |
CN113157638B (en) | Low-power-consumption in-memory calculation processor and processing operation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20210608 |