CN109086867A - An FPGA-based convolutional neural network acceleration system - Google Patents

An FPGA-based convolutional neural network acceleration system

Info

Publication number
CN109086867A
CN109086867A (application CN201810710069.1A)
Authority
CN
China
Prior art keywords
module
convolutional neural networks
submodule
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810710069.1A
Other languages
Chinese (zh)
Other versions
CN109086867B (en)
Inventor
李开
邹复好
孙浩
李全
祁迪
贺坤坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Charm Pupil Technology Co Ltd
Original Assignee
Wuhan Charm Pupil Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Charm Pupil Technology Co Ltd filed Critical Wuhan Charm Pupil Technology Co Ltd
Priority to CN201810710069.1A priority Critical patent/CN109086867B/en
Publication of CN109086867A publication Critical patent/CN109086867A/en
Application granted granted Critical
Publication of CN109086867B publication Critical patent/CN109086867B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an FPGA-based convolutional neural network acceleration system that accelerates convolutional neural networks on an FPGA under the OpenCL programming framework. The system includes a data preprocessing module, a data post-processing module, a convolutional neural network computing module, a data storage module, and a network model configuration module; the convolutional neural network computing module in turn includes a convolution computation submodule, an activation function computation submodule, a pooling computation submodule, and a fully connected computation submodule. The computational parallelism of the acceleration system can be configured according to the hardware resources of the FPGA in use, so that the system adapts to different FPGAs and different convolutional neural networks and runs them on the FPGA in an efficient, parallel, pipelined manner. The system effectively reduces power consumption, greatly improves the processing speed of convolutional neural networks, and meets real-time requirements.

Description

An FPGA-based convolutional neural network acceleration system
Technical field
The invention belongs to the technical field of neural network computing, and in particular relates to an FPGA-based convolutional neural network acceleration system.
Background technique
With the continuing maturation of deep learning technology, convolutional neural networks are widely used in fields such as computer vision, speech recognition, and natural language processing, and have achieved good results in practical application scenarios such as face detection and speech recognition. In recent years, ever-larger training datasets and continually innovated network structures have significantly improved the accuracy and performance of convolutional neural networks. However, as network structures become increasingly complex, practical applications demand ever higher real-time performance at ever lower cost, placing correspondingly higher requirements on the computing capability and energy consumption of the hardware that runs the networks.
FPGAs offer abundant computing resources, high flexibility, and high energy efficiency. Compared with conventional digital circuit systems, they are programmable, highly integrated, fast, and highly reliable, and there have been continuing attempts to use them to accelerate neural networks. OpenCL is a heterogeneous computing language based on traditional C. It runs on accelerated processors such as CPUs, GPUs, FPGAs, and DSPs, and provides a high level of language abstraction, so that programmers can develop high-performance applications without understanding hardware circuits and low-level details, greatly reducing the complexity of programming.
In November 2012, Altera formally released a software development kit (SDK) for OpenCL development on FPGAs that combines the powerful parallel architecture of FPGAs with the OpenCL parallel programming model. With this SDK, programmers familiar with C can quickly adapt to the OpenCL high-level language environment and use it to develop high-performance, low-power FPGA applications efficiently. Using the Altera OpenCL SDK, the computation of convolutional neural networks can be accelerated on an FPGA, with the FPGA acting as an external accelerator that works cooperatively with the host.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides an FPGA-based convolutional neural network acceleration system. Its purpose is to restructure the computation of existing convolutional neural networks so as to fully exploit the parallelism within each computation layer and the pipelining between layers, thereby improving the processing speed of convolutional neural networks.
To achieve the above object, according to one aspect of the present invention, an FPGA-based convolutional neural network acceleration system is provided, including a data preprocessing module, a convolutional neural network computing module, a data post-processing module, a data storage module, and a network model configuration module. The data preprocessing module, convolutional neural network computing module, and data post-processing module are implemented on the FPGA; the data storage module is implemented in the FPGA's off-chip memory; and the network model configuration module is implemented in the FPGA's on-chip memory.
The data preprocessing module is used to read the convolution kernel parameters and input feature maps corresponding to the current calculation stage from the data storage module and to preprocess them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature maps are unrolled and replicated using sliding windows, so that the local feature maps within the sliding windows correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature map sequence that can be computed on directly. After preprocessing, the prepared convolution kernel parameters and input feature maps are sent to the convolutional neural network computing module.
The network model configuration module is used to perform parameter configuration for the convolutional neural network computing module. The computing module implements the convolutional layers, activation function layers, pooling layers, and fully connected layers of the convolutional neural network as independent units, so that a variety of network structures can be constructed through parameter configuration. According to the configuration parameters, it performs convolution, activation, pooling, and fully connected computation on the convolution kernel parameters and input feature maps received from the data preprocessing module, pipelined between layers and parallel within each layer, and sends the processing results to the data post-processing module.
The data post-processing module is used to write the output data of the convolutional neural network computing module into the data storage module.
The data storage module is used to store the model parameters (caffemodel) of the convolutional neural network, intermediate feature map results, and final results; it exchanges data with the external host through a PCIe interface.
Preferably, in the above acceleration system, the convolutional neural network computing module includes a convolution computation submodule, an activation function computation submodule, a pooling computation submodule, and a fully connected computation submodule; these submodules are connected according to the network model configuration parameters predefined by the network model configuration module.
After the convolutional neural network computing module receives the convolution kernel parameters and feature maps sent by the data preprocessing module, the submodules organized according to the configuration parameters begin processing; when processing is complete, the results are sent to the data post-processing module.
Specifically, the convolution computation submodule performs convolution on the input convolution kernel parameters and feature maps and sends the result to the activation function computation submodule.
The activation function computation submodule selects an activation function according to the activation function configuration parameter predefined by the network model configuration module, applies the selected activation function to the feature map, and, according to the configuration, sends the result either to the pooling computation submodule or to the fully connected computation submodule.
The pooling computation submodule is used to perform pooling on the received feature maps and, according to the configuration parameters predefined by the network model configuration module, sends the pooling result either to the fully connected computation submodule or directly to the data post-processing module.
The fully connected computation submodule is used to perform fully connected computation on the received feature maps and sends the result to the data post-processing module.
Preferably, in the above acceleration system, the data preprocessing module includes a data transfer submodule, a convolution kernel parameter preprocessing submodule, and a feature map preprocessing submodule.
The data transfer submodule controls the transfer of feature maps and convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing submodule rearranges and orders the convolution kernel parameters; the feature map preprocessing submodule unrolls, replicates, and arranges the feature maps.
Preferably, in the above acceleration system, the data storage module includes a convolution kernel parameter storage submodule, which stores the convolution kernel parameters, and a feature map storage submodule, which stores the input feature maps and the temporary feature maps produced during computation. These storage submodules are preferably partitioned from DDR memory connected to the FPGA; in the OpenCL programming framework, the data storage module is used as global memory.
Preferably, in the above acceleration system, the data transfer submodule includes a DDR controller, a data transfer bus, and a memory buffer.
The DDR controller controls data transfers between the DDR and the FPGA; the data transfer bus connects the DDR and the FPGA and is the channel for data transfer; the memory buffer temporarily stores data, reducing FPGA accesses to the DDR and improving data transfer speed.
Preferably, in the above acceleration system, the convolution computation submodule includes one or more matrix multiplication computation submodules; their number is set by the configuration parameters predefined by the network model configuration module, and the matrix multiplication computation submodules execute in parallel.
Each matrix multiplication computation submodule uses the Winograd minimal filtering algorithm to accelerate the computation of the matrix multiplication between a single convolution kernel and the corresponding local feature map.
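As a concrete illustration of the minimal filtering idea, the 1-D case F(2, 3) computes two outputs of a 3-tap filter with 4 multiplications instead of 6, using fixed transform matrices. The sketch below uses the standard Winograd transforms and is illustrative only; the patent applies the technique to the 2-D matrix multiplications of the convolution submodule rather than to this 1-D case.

```python
import numpy as np

# Transform matrices for the 1-D Winograd minimal filtering algorithm F(2, 3):
# two outputs of a 3-tap filter from four inputs, with 4 multiplies instead of 6.
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float64)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f23(d, g):
    """Two outputs of a valid 1-D correlation of the 4-element input d
    with the 3-tap filter g, via Winograd F(2, 3)."""
    U = G @ g            # transform the filter (4 values)
    V = BT @ d           # transform the input  (4 values)
    return AT @ (U * V)  # 4 elementwise multiplies, then inverse transform
```

For example, `winograd_f23([1, 2, 3, 4], [1, 1, 1])` produces the same two sums as a direct sliding 3-tap filter, but the elementwise product `U * V` is the only multiplication stage, which is what makes the algorithm attractive for DSP-limited FPGA fabrics.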
Preferably, in the above acceleration system, the activation function computation submodule includes an activation function selection submodule, a Sigmoid function computation submodule, a Tanh function computation submodule, and a ReLU function computation submodule.
The activation function selection submodule is connected to the Sigmoid, Tanh, and ReLU function computation submodules respectively and sends the feature map data to one of these three computation submodules.
The activation function selection submodule is used to set the activation computation applied to feature maps in the convolutional neural network.
The Sigmoid function computation submodule computes the Sigmoid function; the Tanh function computation submodule computes the Tanh function; the ReLU function computation submodule computes the ReLU function.
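The selector-plus-three-units arrangement amounts to a dispatch on a configuration parameter. A minimal software sketch, with illustrative names (in hardware the selector routes the feature map to one of three physical computation units rather than calling a function):

```python
import numpy as np

# The three activation units of the submodule, keyed by a configuration value.
# The key strings and function names here are illustrative, not from the patent.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "tanh":    np.tanh,
    "relu":    lambda x: np.maximum(x, 0.0),
}

def activate(feature_map, kind):
    """Route the feature map to the activation selected by `kind`,
    mirroring the selector in front of the three computation submodules."""
    return ACTIVATIONS[kind](feature_map)
```

Because the choice is a static configuration parameter, the hardware selector can be fixed at configuration time, so no per-element branching is needed in the datapath.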
Preferably, in the above acceleration system, the pooling computation submodule includes a double buffer composed of two FPGA on-chip memories.
The double buffer stores temporary feature map data during pooling. The buffer size is set by the network configuration parameters predefined by the network model configuration module and differs between pooling layers. The double-buffer structure enables ping-pong read/write operation, realizing pipelined pooling computation.
Preferably, in the above acceleration system, the network model configuration module is implemented in FPGA on-chip memory and stores the network model configuration parameters, including the size of the network input feature maps, the size and number of convolution kernel parameters in the convolution computation submodule, the pooling window size in the pooling computation submodule, the parameter scale of the fully connected computation submodule, and the computational parallelism. The data in the network model configuration module are preferably written before the system starts.
Preferably, in the above acceleration system, the convolution computation submodule, activation function computation submodule, pooling computation submodule, and fully connected computation submodule of the convolutional neural network computing module are cascaded according to the network model configuration parameters; data are transferred between these submodules using OpenCL channels; computation within each submodule executes in parallel, and computation across submodules is pipelined.
The above FPGA-based convolutional neural network acceleration system provided by the invention combines the structural characteristics of convolutional neural network models with the features of FPGA chips and the OpenCL programming framework. The computation of existing convolutional neural networks is restructured and mapped onto corresponding modules, fully exploiting the parallelism within the computation and the pipelining between computation layers, so that the computation better matches the structural characteristics of the FPGA and makes rational and efficient use of its computing resources, improving the processing speed of convolutional neural networks. In general, compared with the prior art, the technical scheme contemplated by the present invention can achieve the following beneficial effects:
(1) The FPGA-based convolutional neural network acceleration system provided by the invention exploits the computational characteristics of each layer of the network to design a system architecture suited to pipelined, parallel computation. The data preprocessing module, convolutional neural network computing module, and data post-processing module form a pipeline: the preprocessing and post-processing modules control data transfer between the storage module and the computing module, and the convolution kernel parameters and feature maps pass through the three stages of the pipeline in turn, completing a streaming process of data reading, data computation, and data storage. The convolutional layers, activation function layers, pooling layers, and fully connected layers of the network are designed as separate computing units, so that a variety of network structures can be constructed by parameter configuration. The processing of the network is split across submodules into many small processing steps, and the data of the submodule corresponding to each layer pass through distinct stages of data reading, data processing, and data storage, forming a structure analogous to a computer instruction pipeline. Computation within a network layer can thus execute in parallel and computation between layers can be pipelined, effectively improving the processing speed of convolutional neural networks.
(2) The FPGA-based convolutional neural network acceleration system provided by the invention exploits the low data dependence between convolution kernel parameters and local feature maps in convolutional neural network computation. In the parallel computation structure of the convolution computation submodule, each calculation operates on a convolution kernel and the corresponding window of the input feature map; since the data of different convolution kernels are independent, multiple calculations can proceed in parallel. In the traditional convolution process, the input feature map data are obtained through a sliding window that slides over the feature map to pick out the values in each convolution window. The parallel computation structure of the invention removes the sliding window: the data inside the original sliding windows are laid out directly into multiple data blocks, and each calculation directly takes the corresponding data block as input. In this way multiple data blocks are computed with the convolution kernels simultaneously, further improving processing speed.
(3) With the FPGA-based convolutional neural network acceleration system provided by the invention, partial pooling can begin as soon as data enter the pooling computation submodule during network computation. Because multiple convolution kernels compute in parallel, partial results on a subset of channels are produced simultaneously; that is, part of the input to the pooling computation submodule is already available. Like the convolution computation submodule, the pooling computation submodule computes in units of sliding windows, so pooling can start as soon as all the data in a window have been obtained, without waiting for the convolution computation submodule to finish all of its calculations. The convolution computation submodule produces data on multiple channels simultaneously, and pooling has no dependence between channels, so the pooling computation on each channel can proceed in parallel, greatly improving the processing speed of the network.
(4) In the FPGA-based convolutional neural network acceleration system provided by the invention, the network model parameters are configurable: a configuration file sets the structure of the network model and the parallelism of the computation, so that different types of network models and FPGAs with different computing capabilities can run convolutional neural networks through parameter configuration.
(5) In the FPGA-based convolutional neural network acceleration system provided by the invention, the preferred embodiment uses the Winograd minimal filtering algorithm in the computation of the convolutional layers, which accelerates the convolution computation.
A ping-pong buffer is used in the computation of the pooling layers, which accelerates pooling and reduces memory usage.
Batch computation is used in the fully connected layers, reducing accesses to external storage during computation; segmented computation simplifies the high-dimensional matrix multiplication, improving processing speed and reducing the demands on the FPGA's arithmetic capability.
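The segmented (tiled) fully connected computation can be sketched as a batched matrix multiply accumulated over slices of the input dimension. The function name and tile size below are illustrative assumptions, not taken from the patent; tiling bounds the working set that must be resident at once, which is what reduces external memory traffic.

```python
import numpy as np

def fc_tiled(x_batch, weights, tile=64):
    """Fully connected layer as a tiled matrix multiply.

    The (batch, in_dim) activations are multiplied by the (in_dim, out_dim)
    weight matrix in `tile`-wide slices of the input dimension, accumulating
    partial sums, so only one weight slice need be resident at a time.
    """
    batch, in_dim = x_batch.shape
    out = np.zeros((batch, weights.shape[1]), dtype=x_batch.dtype)
    for start in range(0, in_dim, tile):
        stop = min(start + tile, in_dim)
        out += x_batch[:, start:stop] @ weights[start:stop, :]
    return out
```

Processing a whole batch against each weight slice amortizes the cost of fetching that slice across all batch elements, which is the stated benefit of batch computation here.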
Each computing module of the convolutional neural network is implemented as an OpenCL kernel program, which reduces development difficulty.
Description of the drawings
Fig. 1 is an architecture diagram of an embodiment of the FPGA-based convolutional neural network acceleration system provided by the invention;
Fig. 2 is a processing schematic of the data preprocessing module in the embodiment;
Fig. 3 is a processing schematic of the convolution computation submodule in the embodiment;
Fig. 4 is a processing schematic of the activation function computation submodule in the embodiment;
Fig. 5 is a processing schematic of the pooling computation submodule in the embodiment;
Fig. 6 is a processing schematic of the fully connected computation submodule in the embodiment;
Fig. 7 is a processing schematic of the data post-processing module in the embodiment;
Fig. 8 is a processing flowchart of the acceleration system in the embodiment.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict.
Referring to Fig. 1, an embodiment of the FPGA-based convolutional neural network acceleration system provided by the invention includes a data preprocessing module, a convolutional neural network computing module, a data post-processing module, a data storage module, and a network model configuration module.
The input of the data preprocessing module is connected to the data storage module; the input of the convolutional neural network computing module is connected to the output of the data preprocessing module; the input of the data post-processing module is connected to the output of the convolutional neural network computing module; and the input of the data storage module is connected to the output of the data post-processing module. The convolutional neural network computing module is also connected to the network model configuration module.
The data preprocessing module is used to read the convolution kernel parameters and input feature maps corresponding to the current calculation stage from the data storage module and to preprocess them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature maps are spread out and replicated using sliding windows, so that the local feature maps within the sliding windows correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature map sequence that can be computed on directly. After preprocessing, the prepared convolution kernel parameters and input feature maps are sent to the convolutional neural network computing module.
The network model configuration module is used to perform parameter configuration for the convolutional neural network computing module. The convolutional neural network computing module processes the convolution kernel parameters and input feature maps received from the data preprocessing module according to the configuration parameters and sends the processing results to the data post-processing module.
The convolutional neural network computing module includes a convolution computation submodule, an activation function computation submodule, a pooling computation submodule, and a fully connected computation submodule; these submodules are connected according to the network model configuration parameters predefined by the network model configuration module.
The data post-processing module is used to write the output data of the convolutional neural network computing module into the data storage module.
The data storage module is used to store the model parameters (caffemodel) of the convolutional neural network, intermediate feature map results, and final results, and exchanges data with the external host through a PCIe interface.
Referring to Fig. 2, the data preprocessing module reads the convolution kernel parameters and input feature maps from the data storage module. When reading convolution kernels, it reads PARALLEL_KERNEL kernels of size k*k*C_i according to the parameter predefined in the model configuration module, where C_i denotes the number of channels of the input feature map. After the kernels are read in, they are serialized: the four-dimensional kernel tensor of size k*k*C_i*PARALLEL_KERNEL is rearranged into a three-dimensional form of size k*k*(C_i*PARALLEL_KERNEL).
When processing an input feature map, a feature map of size H*W*C_i is first read in and then spread out according to the size and stride of the sliding window on the feature map; the size of the expanded feature map is ((W-k)/stride+1)*((H-k)/stride+1)*C_i.
After the input feature map is expanded, a partial feature map of size PARALLEL_FEATURE_W*PARALLEL_FEATURE_H*C_i is extracted according to the configuration parameters, and the extracted feature map is replicated so that the number of copies equals the number of convolution kernels, finally yielding a feature map of size PARALLEL_FEATURE_W*PARALLEL_FEATURE_H*(C_i*PARALLEL_KERNEL), so that multiple convolution kernels and the feature map can be computed in parallel. After the convolution kernels and feature maps have been processed, the processed convolution kernel parameters and feature maps are sent to the convolutional neural network computing module for processing.
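The sliding-window expansion and replication described above can be sketched in software as an im2col-style transform. The sketch below is illustrative only: NumPy stands in for the FPGA datapath, the function names and channel-last layout are assumptions, and k, stride, and PARALLEL_KERNEL are taken as plain arguments.

```python
import numpy as np

def expand_feature_map(fmap, k, stride):
    """Unroll an H*W*Ci feature map into sliding-window patches (im2col-style).

    Returns an array of shape (num_windows, k*k*Ci): one flattened patch per
    sliding-window position, matching a flattened k*k*Ci kernel layout so
    each convolution output becomes a plain dot product.
    """
    H, W, Ci = fmap.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    patches = np.empty((out_h * out_w, k * k * Ci), dtype=fmap.dtype)
    idx = 0
    for y in range(0, H - k + 1, stride):
        for x in range(0, W - k + 1, stride):
            patches[idx] = fmap[y:y + k, x:x + k, :].ravel()
            idx += 1
    return patches

def replicate_for_kernels(patches, parallel_kernel):
    """Duplicate the patch matrix once per concurrently computed kernel, so
    PARALLEL_KERNEL convolutions can proceed on independent copies."""
    return np.stack([patches] * parallel_kernel, axis=0)
```

Each row of the patch matrix lines up one-to-one with a flattened kernel, which is exactly the correspondence the preprocessing module establishes, and the replicated copies remove data sharing between the parallel kernel computations.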
The processing flow of the convolution computation submodule of the convolutional neural network computing module is shown in Fig. 3. Its inputs are the convolution kernel parameters and feature maps produced by the data preprocessing module, together with the relevant configuration parameters predefined in the network model configuration module. The preprocessed convolution kernels and feature maps are three-dimensional matrices whose channel count is PARALLEL_KERNEL*C_i. The convolution kernel and feature map on each channel are fed into separate OpenCL compute units, which use the Winograd matrix multiplication module to perform two-dimensional matrix multiplication; the OpenCL compute units compute in parallel, and the result is a feature map of length PARALLEL_FEATURE_W/k, width PARALLEL_FEATURE_H/k, and channel count C_i*PARALLEL_KERNEL. After processing by the convolution computation submodule, the input feature map yields part of the output feature map of the convolutional layer, which is handled differently according to the type of the next layer. If the next layer predefined in the network model configuration is a convolutional layer or a fully connected layer, the output feature map skips the pooling layer and is written back to external storage by the data post-processing module to await processing; if the next predefined layer is a pooling layer, the feature map is sent to the pooling computation submodule for pooling.
Referring to Fig. 4, the activation function computation submodule in the embodiment comprises one activation function selection submodule and three function computation submodules. The selector in the activation function selection submodule is determined by the configuration parameter in the model configuration module, and the three function computation submodules correspond to the Sigmoid, Tanh and ReLU activation functions respectively. The input feature map is sent along the path determined by the activation function selection submodule to the corresponding function computation submodule for activation processing; after processing is complete, the result is sent to the data storage module or the pooling computation submodule according to the configuration parameters.
Referring to Fig. 5, the pooling computation submodule uses two ping-pong buffers of size pool_size*W to hold the calculation results from the activation function computation submodule, where pool_size and W are configuration parameters. The results of the convolution computation submodule are first filled into buffer1; during this filling process, partial pooling can already be performed on that buffer. Once buffer1 is full, the results of the convolution computation module are filled into buffer2; during the filling of buffer2, pooling is performed on the data in buffer2, while the data spanning buffer1 and buffer2 can also be pooled. When buffer2 is full, the results of the convolution computation module are filled into buffer1 again, and the two buffers alternate in this way until the entire pooling computation is complete. A pooling window also exists between the two buffers; the data of this window come from both buffers, so while one buffer is being used for computation and the other is being filled, the pooling window spanning the two buffers can be computed. Since there is no computational dependence between pooling windows, loop unrolling is used so that the computations in different windows proceed simultaneously.
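The alternating-buffer scheme can be sketched behaviorally in Python as follows. This is a simplification, not the OpenCL implementation: it assumes 2x2 max pooling, so that exactly two row buffers form one row of pooling windows, and it pools only once both buffers are filled rather than overlapping the fill and the computation as the hardware does.

```python
import numpy as np

def streamed_max_pool(rows, pool_size=2):
    """2x2 max-pool a stream of rows using two alternating (ping-pong) buffers."""
    buf = [None, None]          # the two row buffers
    out = []
    for i, row in enumerate(rows):
        buf[i % 2] = np.asarray(row, dtype=float)  # fill buffers alternately
        if i % 2 == 1:          # both buffers full: pool the windows they span
            pair = np.stack(buf)                       # shape (2, W)
            w = pair.shape[1] // pool_size
            window = pair[:, :w * pool_size].reshape(2, w, pool_size)
            out.append(window.max(axis=(0, 2)))        # max over each 2 x pool_size window
    return np.stack(out)
```

Because each pooling window touches a disjoint slice of the buffers, the per-window maxima in the inner step are independent, which is exactly what makes the loop-unrolling parallelization described above legal.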
Referring to Fig. 6, during processing the fully connected computation submodule horizontally divides the input matrix formed by the N input vectors into dim1/m segments, where N is the number of input feature vectors, dim1 is the dimension of the input feature vectors, and m is the segment length. Each segment forms a submatrix of size m*N; each such submatrix is multiplied by the corresponding part of the weight matrix to obtain a partial result consisting of a submatrix of size n*N, and the accumulation of the dim1/m partial results is the final calculation result formed by the N output vectors. When computing the matrix multiplication between a submatrix and the corresponding part of the weight matrix, the Winograd minimal filtering matrix multiplication is used for acceleration.
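The segmentation-and-accumulation scheme can be sketched as follows (a minimal NumPy sketch; the function name is illustrative and the per-segment product stands in for the Winograd-accelerated multiplication):

```python
import numpy as np

def blocked_fully_connected(x, w, m):
    """x: (dim1, N) matrix of N input vectors; w: (n, dim1) weight matrix.
    Accumulates the partial products of the dim1/m horizontal segments."""
    dim1, N = x.shape
    n = w.shape[0]
    acc = np.zeros((n, N))
    for s in range(0, dim1, m):                # one m-row segment at a time
        acc += w[:, s:s + m] @ x[s:s + m]      # (n, m) x (m, N) partial result
    return acc
```

Splitting along dim1 keeps each submatrix small enough for on-chip buffering while the running sum stays in place, which is why the dim1/m partial results can simply be added to form the final output vectors.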
Referring to Fig. 7, when the pooling computation submodule or the fully connected computation submodule in the convolutional neural network computing module has finished processing, the data post-processing module begins to write the data output by the pooling or fully connected computation submodule back into the data storage module. In this process, the barrier operation of the OpenCL framework is used to guarantee that the transfer starts only after all calculation results have been obtained, and that the next processing step starts only after all data have been transferred.
Referring to Fig. 8, the processing flow of the above acceleration system provided by the embodiment mainly comprises three parts. The first part is the kernel program compilation process. In order to make maximum use of the computing and storage resources on the FPGA, suitable network computation parallelism parameters need to be set. In the embodiment, the process of setting the parallelism parameters is completed automatically by a program: the initial values of PARALLEL_FEATURE and PARALLEL_KERNEL in the convolutional neural network kernel program are set first, and the kernel program is then compiled using the Altera OpenCL SDK. After compilation, the resource utilization, including storage resources, logic resources and computing resources, is obtained from the compilation report. If the resource utilization has not reached its maximum, the values of PARALLEL_FEATURE and PARALLEL_KERNEL are updated and the kernel is recompiled, until the maximum hardware resource utilization is obtained. When compilation is complete, a hardware program that can run on the FPGA is produced.
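The compile-report feedback loop above can be sketched as follows. Here estimate_utilization is a hypothetical stand-in for invoking the Altera OpenCL SDK and parsing the resource figures from its compilation report, and the doubling policy is an assumption: the patent only states that the parameters are updated and the kernel recompiled until utilization peaks.

```python
def tune_parallelism(estimate_utilization, start=1, limit=1024):
    """Grow a parallelism parameter (e.g. PARALLEL_KERNEL) while the
    estimated FPGA resource utilization stays at or below 100%."""
    p = start
    while p * 2 <= limit and estimate_utilization(p * 2) <= 1.0:
        p *= 2           # accept the larger configuration and try again
    return p
```

In practice each probe is a full offline FPGA compilation, so the number of candidate configurations explored is kept small.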
The second part is the parameter configuration process, which covers the network model calculation parameters and the model configuration parameters. The network model calculation parameters are read directly from the Caffe model file (caffemodel); the model configuration parameters include the input feature map size of each layer, the convolution kernel size, the pooling window size, and so on. The parameters are set using the clSetKernelArg() function of OpenCL. Table 1 below illustrates the types and values of the model configuration parameters, taking VGG16 as an example.
Table 1: Example types and values of the model configuration parameters
In the table above, in the Activate func column, 0 indicates no activation function, 1 indicates the ReLU activation function, 2 indicates the Sigmoid activation function, and 3 indicates the Tanh activation function; in the Output dst column, 1 indicates output to the data storage module, 2 indicates output to the pooling computation submodule, and 3 indicates output to the convolution computation submodule.
The third part is the neural network operation process. The system on the FPGA starts running once the host has transferred a picture into the data storage module; after the run completes, the calculation results are returned to the host through the data storage module, and the run ends when there is no further picture input.
The FPGA-based convolutional neural network acceleration system provided by the embodiment implements the VGG16 and AlexNet network models on a DE5a-Net development board, and its performance was tested with image data of size 224*224*3. The experimental results show that the processing speed of VGG16 is 160 ms/image and that of AlexNet is 12 ms/image, which is better than other FPGA implementations.
Those skilled in the art will readily appreciate that the foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An FPGA-based convolutional neural network acceleration system, characterized in that it comprises a data preprocessing module, a convolutional neural network computing module, a data post-processing module, a data storage module and a network model configuration module; the data preprocessing module, the convolutional neural network computing module and the data post-processing module are implemented on the FPGA, the data storage module is implemented in off-chip storage of the FPGA, and the network model configuration module is implemented in on-chip storage of the FPGA;
the data preprocessing module is configured to read the corresponding convolution kernel parameters and input feature maps from the data storage module according to the current calculation stage, and to preprocess the convolution kernel parameters and input feature maps: the 4-dimensional convolution kernel model parameters are reshaped into 3 dimensions, and the input feature map is expanded and replicated using a sliding window so that the local feature maps within the sliding windows correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature map sequence suitable for direct computation; after preprocessing is complete, the processed convolution kernel parameters and input feature maps are sent to the convolutional neural network computing module;
the network model configuration module is configured to supply configuration parameters to the convolutional neural network computing module; the convolutional neural network computing module arranges the convolutional layers, activation function layers, pooling layers and fully connected layers of the convolutional neural network independently, so that a variety of different network structures can be constructed through parameter configuration; according to the configuration parameters, it performs convolution, activation, pooling and fully connected computation on the convolution kernel parameters and input feature maps received from the data preprocessing module with inter-layer pipelining, and sends the processing results to the data post-processing module;
the data post-processing module is configured to write the output data of the convolutional neural network computing module into the data storage module;
the data storage module is configured to store the model parameters (caffemodel) of the convolutional neural network, intermediate feature map results and final calculation results; this module exchanges data with an external host through a PCIe interface.
2. The convolutional neural network acceleration system of claim 1, characterized in that the convolutional neural network computing module comprises a convolution computation submodule, an activation function computation submodule, a pooling computation submodule and a fully connected computation submodule; inside the convolutional neural network computing module, these submodules are connected according to the network model configuration parameters predefined by the network model configuration module;
the convolution computation submodule performs convolution using the input convolution kernel parameters and feature maps, and upon completion sends the results to the activation function computation submodule;
the activation function computation submodule selects an activation function according to the activation function configuration parameter predefined by the network model configuration module, applies the selected activation function to the feature map, and upon completion sends the results to the pooling computation submodule or the fully connected computation submodule according to the parameter configuration;
the pooling computation submodule is configured to perform pooling on the received feature map and, according to the configuration parameters predefined by the network model configuration module, to send the pooling results to the fully connected computation submodule or directly to the data post-processing module;
the fully connected computation submodule is configured to perform fully connected computation on the received feature map and to send the results to the data post-processing module.
3. The convolutional neural network acceleration system of claim 2, characterized in that the convolutional neural network computing module cascades the convolution computation submodule, the activation function computation submodule, the pooling computation submodule and the fully connected computation submodule according to the network model configuration parameters; data are transmitted between these submodules using OpenCL channels; computation within each submodule is executed in parallel, and computation across the submodules proceeds in a pipelined manner.
4. The convolutional neural network acceleration system of claim 2 or 3, characterized in that the convolution computation submodule comprises one or more matrix multiplication computation submodules; the number of matrix multiplication computation submodules is set by a configuration parameter predefined by the network model configuration module, and the matrix multiplication computation submodules execute in parallel;
each matrix multiplication computation submodule accelerates its computation using the Winograd minimal filtering algorithm, and is used to compute the matrix multiplication between a single convolution kernel and the corresponding local feature map.
5. The convolutional neural network acceleration system of claim 2 or 3, characterized in that the activation function computation submodule comprises an activation function selection submodule, a Sigmoid function computation submodule, a Tanh function computation submodule and a ReLU function computation submodule;
the activation function selection submodule is connected to the Sigmoid function computation submodule, the Tanh function computation submodule and the ReLU function computation submodule respectively, and sends the feature map data to one of these three computation submodules;
the activation function selection submodule is configured to set the activation computation mode of the feature maps in the convolutional neural network; the Sigmoid function computation submodule computes the Sigmoid function, the Tanh function computation submodule computes the Tanh function, and the ReLU function computation submodule computes the ReLU function.
6. The convolutional neural network acceleration system of any one of claims 1 to 5, characterized in that the pooling computation submodule comprises a double buffer composed of two FPGA on-chip memories for storing temporary feature map data during pooling; the buffer size is set by the network configuration parameters predefined by the network model configuration module, and the buffers of different pooling layers differ in size; ping-pong read and write operations are realized through this double-buffer structure, enabling pipelined pooling computation.
7. The convolutional neural network acceleration system of claim 1 or 2, characterized in that the data preprocessing module comprises a data transmission module, a convolution kernel parameter preprocessing submodule and a feature map preprocessing submodule;
the data transmission module controls the transfer of feature maps and convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing submodule rearranges and reorders the convolution kernel parameters; the feature map preprocessing submodule expands, replicates and rearranges the feature maps.
8. The convolutional neural network acceleration system of claim 7, characterized in that the data transmission module comprises a DDR controller, a data transmission bus and a memory buffer;
the DDR controller controls data transfers between the DDR memory and the FPGA; the data transmission bus connects the DDR memory and the FPGA and serves as the channel for data transfer; the memory buffer temporarily stores data, reducing the number of FPGA reads of the DDR memory and improving the data transfer speed.
9. The convolutional neural network acceleration system of claim 1 or 2, characterized in that the data storage module comprises a convolution kernel parameter storage submodule and a feature map storage submodule; the convolution kernel parameter storage submodule stores the convolution kernel parameters, and the feature map storage submodule stores the input feature maps and the temporary feature maps produced during computation; these storage submodules are preferably partitioned from the DDR memory connected to the FPGA.
10. The convolutional neural network acceleration system of claim 1 or 2, characterized in that the network model configuration module stores the network model configuration parameters, including the size of the network input feature maps, the size and number of the convolution kernel parameters in the convolution computation submodule, the pooling window size in the pooling computation submodule, the parameter scale of the fully connected computation submodule, and the computation parallelism; the data in the network model configuration module are preferably written before the system starts.
CN201810710069.1A 2018-07-02 2018-07-02 Convolutional neural network acceleration system based on FPGA Expired - Fee Related CN109086867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810710069.1A CN109086867B (en) 2018-07-02 2018-07-02 Convolutional neural network acceleration system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810710069.1A CN109086867B (en) 2018-07-02 2018-07-02 Convolutional neural network acceleration system based on FPGA

Publications (2)

Publication Number Publication Date
CN109086867A true CN109086867A (en) 2018-12-25
CN109086867B CN109086867B (en) 2021-06-08

Family

ID=64836906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810710069.1A Expired - Fee Related CN109086867B (en) 2018-07-02 2018-07-02 Convolutional neural network acceleration system based on FPGA

Country Status (1)

Country Link
CN (1) CN109086867B (en)

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109656721A (en) * 2018-12-28 2019-04-19 上海新储集成电路有限公司 A kind of efficient intelligence system
CN109685209A (en) * 2018-12-29 2019-04-26 福州瑞芯微电子股份有限公司 A kind of device and method for accelerating neural network computing speed
CN109767002A (en) * 2019-01-17 2019-05-17 济南浪潮高新科技投资发展有限公司 A kind of neural network accelerated method based on muti-piece FPGA collaboration processing
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN109799977A (en) * 2019-01-25 2019-05-24 西安电子科技大学 The method and system of instruction repertorie exploitation scheduling data
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm
CN109961139A (en) * 2019-01-08 2019-07-02 广东浪潮大数据研究有限公司 A kind of accelerated method, device, equipment and the storage medium of residual error network
CN109976903A (en) * 2019-02-22 2019-07-05 华中科技大学 A kind of deep learning Heterogeneous Computing method and system based on slice width Memory Allocation
CN110188869A (en) * 2019-05-05 2019-08-30 北京中科汇成科技有限公司 A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating
CN110263925A (en) * 2019-06-04 2019-09-20 电子科技大学 A kind of hardware-accelerated realization framework of the convolutional neural networks forward prediction based on FPGA
CN110334801A (en) * 2019-05-09 2019-10-15 苏州浪潮智能科技有限公司 A kind of hardware-accelerated method, apparatus, equipment and the system of convolutional neural networks
CN110390392A (en) * 2019-08-01 2019-10-29 上海安路信息科技有限公司 Deconvolution parameter accelerator, data read-write method based on FPGA
CN110399883A (en) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 Image characteristic extracting method, device, equipment and computer readable storage medium
CN110443357A (en) * 2019-08-07 2019-11-12 上海燧原智能科技有限公司 Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium
CN110458279A (en) * 2019-07-15 2019-11-15 武汉魅瞳科技有限公司 A kind of binary neural network accelerated method and system based on FPGA
CN110490311A (en) * 2019-07-08 2019-11-22 华南理工大学 Convolutional neural networks accelerator and its control method based on RISC-V framework
CN110852930A (en) * 2019-10-25 2020-02-28 华中科技大学 FPGA graph processing acceleration method and system based on OpenCL
CN111079923A (en) * 2019-11-08 2020-04-28 中国科学院上海高等研究院 Spark convolution neural network system suitable for edge computing platform and circuit thereof
CN111105015A (en) * 2019-12-06 2020-05-05 浪潮(北京)电子信息产业有限公司 General CNN reasoning accelerator, control method thereof and readable storage medium
CN111160544A (en) * 2019-12-31 2020-05-15 上海安路信息科技有限公司 Data activation method and FPGA data activation system
CN111210019A (en) * 2020-01-16 2020-05-29 电子科技大学 Neural network inference method based on software and hardware cooperative acceleration
CN111242289A (en) * 2020-01-19 2020-06-05 清华大学 Convolutional neural network acceleration system and method with expandable scale
CN111325327A (en) * 2020-03-06 2020-06-23 四川九洲电器集团有限责任公司 Universal convolution neural network operation architecture based on embedded platform and use method
CN111340198A (en) * 2020-03-26 2020-06-26 上海大学 Neural network accelerator with highly-multiplexed data based on FPGA (field programmable Gate array)
CN111626403A (en) * 2020-05-14 2020-09-04 北京航空航天大学 Convolutional neural network accelerator based on CPU-FPGA memory sharing
CN111736986A (en) * 2020-05-29 2020-10-02 浪潮(北京)电子信息产业有限公司 FPGA (field programmable Gate array) accelerated execution method of deep learning model and related device
CN111860781A (en) * 2020-07-10 2020-10-30 逢亿科技(上海)有限公司 Convolutional neural network feature decoding system realized based on FPGA
CN111931913A (en) * 2020-08-10 2020-11-13 西安电子科技大学 Caffe-based deployment method of convolutional neural network on FPGA
CN112101284A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Image recognition method, training method, device and system of image recognition model
CN112149814A (en) * 2020-09-23 2020-12-29 哈尔滨理工大学 Convolutional neural network acceleration system based on FPGA
CN112732638A (en) * 2021-01-22 2021-04-30 上海交通大学 Heterogeneous acceleration system and method based on CTPN network
CN112766478A (en) * 2021-01-21 2021-05-07 中国电子科技集团公司信息科学研究院 FPGA pipeline structure for convolutional neural network
CN112819140A (en) * 2021-02-02 2021-05-18 电子科技大学 OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method
CN112905526A (en) * 2021-01-21 2021-06-04 北京理工大学 FPGA implementation method for various types of convolution
CN112949845A (en) * 2021-03-08 2021-06-11 内蒙古大学 Deep convolutional neural network accelerator based on FPGA
CN113065647A (en) * 2021-03-30 2021-07-02 西安电子科技大学 Computing-storage communication system and communication method for accelerating neural network
CN113467783A (en) * 2021-07-19 2021-10-01 中科曙光国际信息产业有限公司 Kernel function compiling method and device of artificial intelligent accelerator
CN113517007A (en) * 2021-04-29 2021-10-19 西安交通大学 Flow processing method and system and memristor array
WO2021259105A1 (en) * 2020-06-22 2021-12-30 深圳鲲云信息科技有限公司 Neural network accelerator
CN113949592A (en) * 2021-12-22 2022-01-18 湖南大学 Anti-attack defense system and method based on FPGA
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
CN114943635A (en) * 2021-09-30 2022-08-26 太初(无锡)电子科技有限公司 Fusion operator design and implementation method based on heterogeneous collaborative computing core
CN114997392A (en) * 2022-08-03 2022-09-02 成都图影视讯科技有限公司 Architecture and architectural methods for neural network computing
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11580386B2 (en) 2019-03-18 2023-02-14 Electronics And Telecommunications Research Institute Convolutional layer acceleration unit, embedded system having the same, and method for operating the embedded system
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11681649B2 (en) 2017-07-24 2023-06-20 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
CN117195989A (en) * 2023-11-06 2023-12-08 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
EP4156079A4 (en) * 2020-05-22 2024-03-27 Inspur Electronic Information Industry Co., Ltd Image data storage method, image data processing method and system, and related apparatus
US12014553B2 (en) 2019-02-01 2024-06-18 Tesla, Inc. Predicting three-dimensional features for autonomous driving

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341127A (en) * 2017-07-05 2017-11-10 西安电子科技大学 Convolutional neural networks accelerated method based on OpenCL standards
CN107657581A (en) * 2017-09-28 2018-02-02 中国人民解放军国防科技大学 Convolutional neural network CNN hardware accelerator and acceleration method
US20180046900A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN108154229A (en) * 2018-01-10 2018-06-12 西安电子科技大学 Accelerate the image processing method of convolutional neural networks frame based on FPGA

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046900A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN107341127A (en) * 2017-07-05 2017-11-10 西安电子科技大学 Convolutional neural networks accelerated method based on OpenCL standards
CN107657581A (en) * 2017-09-28 2018-02-02 中国人民解放军国防科技大学 Convolutional neural network CNN hardware accelerator and acceleration method
CN108154229A (en) * 2018-01-10 2018-06-12 西安电子科技大学 Accelerate the image processing method of convolutional neural networks frame based on FPGA

Cited By (95)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US12020476B2 (en) 2017-03-23 2024-06-25 Tesla, Inc. Data synthesis for autonomous control systems
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US11681649B2 (en) 2017-07-24 2023-06-20 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US12086097B2 (en) 2017-07-24 2024-09-10 Tesla, Inc. Vector computational unit
US11797304B2 (en) 2018-02-01 2023-10-24 Tesla, Inc. Instruction set architecture for a vector computational unit
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US12079723B2 (en) 2018-07-26 2024-09-03 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11983630B2 (en) 2018-09-03 2024-05-14 Tesla, Inc. Neural networks for embedded devices
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11908171B2 (en) 2018-12-04 2024-02-20 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
CN109656721A (en) * 2018-12-28 2019-04-19 上海新储集成电路有限公司 A kind of efficient intelligence system
CN109685209A (en) * 2018-12-29 2019-04-26 福州瑞芯微电子股份有限公司 A kind of device and method for accelerating neural network computing speed
CN109685209B (en) * 2018-12-29 2020-11-06 瑞芯微电子股份有限公司 Device and method for accelerating operation speed of neural network
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm
CN109961139A (en) * 2019-01-08 2019-07-02 广东浪潮大数据研究有限公司 A kind of accelerated method, device, equipment and the storage medium of residual error network
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN109784489B (en) * 2019-01-16 2021-07-30 北京大学软件与微电子学院 Convolutional neural network IP core based on FPGA
CN109767002A (en) * 2019-01-17 2019-05-17 济南浪潮高新科技投资发展有限公司 A kind of neural network accelerated method based on muti-piece FPGA collaboration processing
CN109799977B (en) * 2019-01-25 2021-07-27 西安电子科技大学 Method and system for developing and scheduling data by instruction program
CN109799977A (en) * 2019-01-25 2019-05-24 西安电子科技大学 The method and system of instruction repertorie exploitation scheduling data
US12014553B2 (en) 2019-02-01 2024-06-18 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data
CN109976903A (en) * 2019-02-22 2019-07-05 华中科技大学 A kind of deep learning Heterogeneous Computing method and system based on slice width Memory Allocation
US11568268B2 (en) 2019-02-22 2023-01-31 Huazhong University Of Science And Technology Deep learning heterogeneous computing method based on layer-wide memory allocation and system thereof
US11580386B2 (en) 2019-03-18 2023-02-14 Electronics And Telecommunications Research Institute Convolutional layer acceleration unit, embedded system having the same, and method for operating the embedded system
CN110188869B (en) * 2019-05-05 2021-08-10 北京中科汇成科技有限公司 Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm
CN110188869A (en) * 2019-05-05 2019-08-30 北京中科汇成科技有限公司 Method and system for integrated-circuit accelerated computation based on a convolutional neural network algorithm
CN110334801A (en) * 2019-05-09 2019-10-15 苏州浪潮智能科技有限公司 A hardware acceleration method, apparatus, device, and system for convolutional neural networks
CN110263925A (en) * 2019-06-04 2019-09-20 电子科技大学 An FPGA-based hardware acceleration framework for convolutional neural network forward prediction
CN110263925B (en) * 2019-06-04 2022-03-15 电子科技大学 Hardware acceleration implementation device for convolutional neural network forward prediction based on FPGA
CN110399883A (en) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 Image feature extraction method, apparatus, device, and computer-readable storage medium
CN110490311A (en) * 2019-07-08 2019-11-22 华南理工大学 RISC-V architecture-based convolutional neural network accelerator and control method thereof
CN110458279A (en) * 2019-07-15 2019-11-15 武汉魅瞳科技有限公司 An FPGA-based binary neural network acceleration method and system
CN110458279B (en) * 2019-07-15 2022-05-20 武汉魅瞳科技有限公司 FPGA-based binary neural network acceleration method and system
CN110390392A (en) * 2019-08-01 2019-10-29 上海安路信息科技有限公司 FPGA-based deconvolution parameter accelerator and data read/write method
CN110443357B (en) * 2019-08-07 2020-09-15 上海燧原智能科技有限公司 Convolutional neural network calculation optimization method and device, computer equipment and medium
CN110443357A (en) * 2019-08-07 2019-11-12 上海燧原智能科技有限公司 Convolutional neural network computation optimization method, apparatus, computer device, and medium
CN110852930A (en) * 2019-10-25 2020-02-28 华中科技大学 FPGA graph processing acceleration method and system based on OpenCL
CN111079923A (en) * 2019-11-08 2020-04-28 中国科学院上海高等研究院 Spark convolutional neural network system suitable for edge computing platform and circuit thereof
CN111079923B (en) * 2019-11-08 2023-10-13 中国科学院上海高等研究院 Spark convolutional neural network system suitable for edge computing platform and circuit thereof
CN111105015A (en) * 2019-12-06 2020-05-05 浪潮(北京)电子信息产业有限公司 General-purpose CNN inference accelerator, control method thereof, and readable storage medium
CN111160544A (en) * 2019-12-31 2020-05-15 上海安路信息科技有限公司 Data activation method and FPGA data activation system
CN111160544B (en) * 2019-12-31 2021-04-23 上海安路信息科技股份有限公司 Data activation method and FPGA data activation system
CN111210019B (en) * 2020-01-16 2022-06-24 电子科技大学 Neural network inference method based on software and hardware cooperative acceleration
CN111210019A (en) * 2020-01-16 2020-05-29 电子科技大学 Neural network inference method based on software and hardware cooperative acceleration
CN111242289A (en) * 2020-01-19 2020-06-05 清华大学 Convolutional neural network acceleration system and method with expandable scale
CN111325327A (en) * 2020-03-06 2020-06-23 四川九洲电器集团有限责任公司 General-purpose convolutional neural network computing architecture based on an embedded platform, and method of use
CN111340198A (en) * 2020-03-26 2020-06-26 上海大学 FPGA-based neural network accelerator with high data reuse
CN111340198B (en) * 2020-03-26 2023-05-05 上海大学 FPGA-based neural network accelerator with high data reuse
CN111626403A (en) * 2020-05-14 2020-09-04 北京航空航天大学 Convolutional neural network accelerator based on CPU-FPGA memory sharing
EP4156079A4 (en) * 2020-05-22 2024-03-27 Inspur Electronic Information Industry Co., Ltd Image data storage method, image data processing method and system, and related apparatus
CN111736986A (en) * 2020-05-29 2020-10-02 浪潮(北京)电子信息产业有限公司 FPGA (field programmable Gate array) accelerated execution method of deep learning model and related device
CN111736986B (en) * 2020-05-29 2023-06-23 浪潮(北京)电子信息产业有限公司 FPGA (field programmable Gate array) acceleration execution method and related device of deep learning model
WO2021259105A1 (en) * 2020-06-22 2021-12-30 深圳鲲云信息科技有限公司 Neural network accelerator
CN111860781B (en) * 2020-07-10 2024-06-28 逢亿科技(上海)有限公司 Convolutional neural network feature decoding system based on FPGA
CN111860781A (en) * 2020-07-10 2020-10-30 逢亿科技(上海)有限公司 Convolutional neural network feature decoding system implemented on FPGA
CN111931913B (en) * 2020-08-10 2023-08-01 西安电子科技大学 Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe
CN111931913A (en) * 2020-08-10 2020-11-13 西安电子科技大学 Caffe-based deployment method of convolutional neural network on FPGA
CN112149814A (en) * 2020-09-23 2020-12-29 哈尔滨理工大学 Convolutional neural network acceleration system based on FPGA
CN112101284A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Image recognition method, training method, device and system of image recognition model
CN112905526A (en) * 2021-01-21 2021-06-04 北京理工大学 FPGA implementation method for various types of convolution
CN112766478B (en) * 2021-01-21 2024-04-12 中国电子科技集团公司信息科学研究院 FPGA (field programmable Gate array) pipeline structure oriented to convolutional neural network
CN112766478A (en) * 2021-01-21 2021-05-07 中国电子科技集团公司信息科学研究院 FPGA pipeline structure for convolutional neural network
CN112732638A (en) * 2021-01-22 2021-04-30 上海交通大学 Heterogeneous acceleration system and method based on CTPN network
CN112732638B (en) * 2021-01-22 2022-05-06 上海交通大学 Heterogeneous acceleration system and method based on CTPN network
CN112819140B (en) * 2021-02-02 2022-06-24 电子科技大学 OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method
CN112819140A (en) * 2021-02-02 2021-05-18 电子科技大学 OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method
CN112949845A (en) * 2021-03-08 2021-06-11 内蒙古大学 Deep convolutional neural network accelerator based on FPGA
CN113065647A (en) * 2021-03-30 2021-07-02 西安电子科技大学 Computing-storage communication system and communication method for accelerating neural network
CN113065647B (en) * 2021-03-30 2023-04-25 西安电子科技大学 Calculation-storage communication system and communication method for accelerating neural network
CN113517007B (en) * 2021-04-29 2023-07-25 西安交通大学 Pipeline processing method and system, and memristor array
CN113517007A (en) * 2021-04-29 2021-10-19 西安交通大学 Pipeline processing method and system, and memristor array
CN113467783A (en) * 2021-07-19 2021-10-01 中科曙光国际信息产业有限公司 Kernel function compiling method and device of artificial intelligence accelerator
CN113467783B (en) * 2021-07-19 2023-09-12 中科曙光国际信息产业有限公司 Kernel function compiling method and device of artificial intelligence accelerator
CN114943635B (en) * 2021-09-30 2023-08-22 太初(无锡)电子科技有限公司 Fusion operator design and implementation method based on heterogeneous collaborative computing core
CN114943635A (en) * 2021-09-30 2022-08-26 太初(无锡)电子科技有限公司 Fusion operator design and implementation method based on heterogeneous collaborative computing core
CN113949592A (en) * 2021-12-22 2022-01-18 湖南大学 Anti-attack defense system and method based on FPGA
CN113949592B (en) * 2021-12-22 2022-03-22 湖南大学 Anti-attack defense system and method based on FPGA
CN114997392B (en) * 2022-08-03 2022-10-21 成都图影视讯科技有限公司 Architecture and architectural methods for neural network computing
CN114997392A (en) * 2022-08-03 2022-09-02 成都图影视讯科技有限公司 Architecture and architectural methods for neural network computing
CN117195989A (en) * 2023-11-06 2023-12-08 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment
CN117195989B (en) * 2023-11-06 2024-06-04 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment

Also Published As

Publication number Publication date
CN109086867B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN109086867A (en) An FPGA-based convolutional neural network acceleration system
CN108241890B (en) Reconfigurable neural network acceleration method and architecture
US20220012593A1 (en) Neural network accelerator and neural network acceleration method based on structured pruning and low-bit quantization
CN110058883B (en) CNN acceleration method and system based on OPU
CN107578095B (en) Neural network computing device and processor comprising the same
CN111967468B (en) Implementation method of lightweight target detection neural network based on FPGA
CN108537331A (en) A reconfigurable convolutional neural network acceleration circuit based on asynchronous logic
CN107657581A (en) Convolutional neural network (CNN) hardware accelerator and acceleration method
CN109784489A (en) FPGA-based convolutional neural network IP core
CN110458279A (en) An FPGA-based binary neural network acceleration method and system
CN109472356A (en) An accelerator and method for reconfigurable neural network algorithms
CN107066239A (en) A hardware architecture for convolutional neural network forward computation
CN107239824A (en) Apparatus and method for implementing a sparse convolutional neural network accelerator
CN108647773A (en) A hardware interconnection architecture for reconfigurable convolutional neural networks
CN110674927A (en) Data reorganization method for systolic array structures
CN109446996B (en) Face recognition data processing device and method based on FPGA
CN113743599B (en) Computing device and server of convolutional neural network
Que et al. Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs
CN113792621A (en) Target detection accelerator design method based on FPGA
Huang et al. A high performance multi-bit-width booth vector systolic accelerator for NAS optimized deep learning neural networks
CN109739556A (en) A general-purpose deep learning processor based on interaction between multiple parallel caches and computation
Duan et al. Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights
CN115238863A (en) Hardware acceleration method, system and application of convolutional neural network convolutional layer
CN110490308A (en) Acceleration library design method, terminal device, and storage medium
CN113157638B (en) Low-power-consumption in-memory calculation processor and processing operation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210608