CN111931913A - Caffe-based deployment method of convolutional neural network on FPGA - Google Patents
- Publication number
- CN111931913A (application CN202010793360.7A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- layer
- convolutional neural
- caffe
- fpga
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/34—Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
- G06F30/347—Physical level, e.g. placement or routing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computer Hardware Design (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Geometry (AREA)
- Neurology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a Caffe-based deployment method of a convolutional neural network on an FPGA (field programmable gate array), which is used for solving the problems of large time consumption, long deployment time and poor practical applicability in the prior art, and comprises the following implementation steps: acquiring a training sample set; constructing a convolutional neural network model C based on Caffe; training the convolutional neural network model C based on Caffe; extracting the trained convolutional neural network model C* based on Caffe and storing its parameters; building a convolutional neural network C** based on Verilog; and acquiring the deployment result of the convolutional neural network on the FPGA. The invention uses Caffe to establish a convolutional neural network model that is easy to control, improves the deployment speed of the convolutional neural network on the FPGA, and designs a mapping operation for parameter configuration so that the convolutional neural network can be deployed smoothly.
Description
Technical Field
The invention belongs to the technical field of hardware acceleration of convolutional neural networks and relates to a method for deploying a convolutional neural network on an FPGA (field programmable gate array), in particular to a Caffe-based deployment method of a convolutional neural network on an FPGA, which can be used for developing artificial intelligence hardware acceleration platforms.
Background
The convolutional neural network has the characteristics of weight sharing and sparse connection, can extract local features of data while reducing the number of parameters, and is well suited to processing datasets with large data volumes. With its high recognition accuracy, it has shown excellent performance in the field of artificial intelligence and is widely applied to image classification, target localization, face recognition, skeleton recognition and similar fields. The FPGA has become a popular device in hardware development mainly for two reasons. On one hand, an FPGA integrates a large number of basic digital-circuit gate structures and storage units, and a user can change the internal logic structure by programming a configuration file into the FPGA, thereby customizing the circuit. On the other hand, an FPGA does not follow the von Neumann architecture: the result calculated by one unit can be sent directly to the next unit without temporary storage in memory, so the FPGA has a very low bandwidth requirement; its pipelined processing architecture offers quick response and low latency, and it is widely applied in video and image processing, communications and digital signal processing.
A convolutional neural network can be deployed on a CPU, a GPU or an FPGA, with time consumption and deployment speed generally used as the indexes for measuring the deployment. The computing capacity of a CPU is too low to fully exploit the characteristics of a convolutional neural network, while the power consumption of a GPU is too high, limiting its application scenarios. The FPGA offers low power consumption and low latency, and its abundant logic resources allow large-scale parallel computing to be realized by programming the internal structure. Deploying a convolutional neural network on an FPGA can therefore fully exploit the network's parallel-computing characteristics, reduce time consumption compared with deployment on a CPU, and improve deployment speed compared with deployment on a GPU.
For example, the patent application with application publication number CN 111104124 A, entitled "Fast deployment method of convolutional neural network based on PyTorch framework on FPGA", discloses a method for rapidly deploying a convolutional neural network on an FPGA based on the PyTorch framework, comprising the following steps: firstly, a fast model-mapping mechanism is established through the construction of a naming rule; then, optimization-strategy calculation is carried out under hardware-resource constraint conditions, and a template library based on the hardware optimization strategy is established; finally, the complex network model file is decomposed at the FPGA end in a self-adaptive processing flow based on rule mapping, the network is abstracted into a directed acyclic graph, and the neural network accelerator is generated. The method has the following defects: deploying a convolutional neural network with the PyTorch framework has large time consumption and a slow deployment speed; in addition, the method lacks the specific operations for calling the template library, so parameters may fail to be configured during an actual call, resulting in poor practical applicability.
Disclosure of Invention
The invention aims to provide a Caffe-based convolutional neural network deployment method on an FPGA (field programmable gate array) aiming at the defects of the prior art, and is used for solving the problems of high time consumption, low deployment speed and poor practical applicability in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) obtaining a training sample set:
taking n images with class labels obtained from the Iris dataset as a training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image and n ≥ 10000;
(2) constructing a convolutional neural network model C based on Caffe:
the convolutional layer A is constructed based on caffe.layers.Convolution, the pooling layer B is constructed based on caffe.layers.Pooling, and m fully connected layers D are constructed based on caffe.layers.InnerProduct; the convolutional layer A, the pooling layer B and the m fully connected layers are connected in sequence to obtain the convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution step size stride_conv, the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window moving step size stride_pool, the number of neurons connected by the fully connected layer D is q, and m ≥ 2;
(3) training a convolutional neural network model C based on Caffe:
(3a) Initialize the iteration number t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize the configuration file Solver with Caffe, and initialize in Solver the maximum iteration number T, the initial learning rate α, the learning-rate update step size stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1, 0 < power < 1; and let t = 0, α_t = α, C_t = C;
(3b) Call the convolutional neural network model C_t based on caffe.layers, and at the same time call the configuration file Solver based on Caffe;
(3c) Randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict the N samples to obtain their corresponding predicted class labels, and calculate the loss value E_t of C_t according to the class label and the predicted class label of each sample;
(3d) Using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layer D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t;
(3e) Judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power; otherwise, execute step (3f);
(3f) Judge whether t = T holds; if so, obtain the trained convolutional neural network model C*; otherwise, let t = t + 1 and execute step (3c);
(4) Extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) Decompose the trained convolutional neural network C* using the get_layer function in Caffe to obtain a convolutional Layer A′, a pooling Layer B′ and a fully connected Layer D′ represented as Layer structures;
(4b) Establish an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ through the items function in Caffe, and store them in the dictionary file d;
(5) Build the convolutional neural network C** based on Verilog:
(5a) Use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) Using a mapping approach, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d to obtain the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d to obtain the configured pooling-layer module m_pool; configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d to obtain the configured fully-connected-layer module m_fc; and connect m_conv, m_pool and m_fc in sequence to form the convolutional neural network C**;
(6) Acquiring a deployment result of the convolutional neural network on the FPGA:
(6a) Describe C** at the RTL level in Verilog, and translate the RTL-level description of C** into a logic netlist with the Synthesis tool of VIVADO;
(6b) According to the timing-constraint and area-constraint conditions, optimize the placement and routing of the logic netlist on the FPGA with the Implementation tool of VIVADO to generate an optimized logic netlist;
(6c) Convert the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA with the Generate Bitstream tool of VIVADO, and burn the bitstream file into the Flash memory on the FPGA.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention adopts Caffe to train the convolutional neural network; as a framework that compiles and executes directly, Caffe has a higher execution speed and can dynamically adjust the learning rate, so the convolutional neural network can be trained in a shorter time. Meanwhile, the Caffe framework can simplify the deployment process with its various functions, reducing the time consumption of convolutional neural network deployment.
Secondly, the invention adopts Caffe to extract the trained convolutional neural network model C*; through a function, Caffe can directly store the convolutional layer, pooling layer and fully connected layer of the convolutional neural network model in Layer structures. The information stored in a Layer structure is easy to view and operate, and the response speed during calling is higher, effectively improving the deployment speed.
Thirdly, when parameters are configured for the convolutional layer template, the pooling layer template and the full-connection layer template, a mapping method is adopted, and a plurality of tools in the VIVADO are used for acquiring a deployment result when the deployment result is acquired, so that the practical applicability is improved.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a flow chart of the implementation of the present invention for training the convolutional neural network model C based on Caffe.
FIG. 3 is a diagram of the present invention's results of acquiring the deployment of a convolutional neural network on an FPGA.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to fig. 1, the present invention includes the steps of:
step 1) obtaining a training sample set:
As a classic dataset in the fields of machine learning and artificial intelligence, the Iris dataset comprises 150 data samples divided into 3 classes, each class containing 50 data items, and each data item containing 4 attributes. The Iris flower species can be predicted from the 4 attributes: calyx length, calyx width, petal length and petal width. n images with class labels acquired from the Iris dataset are taken as the training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image and n = 10000;
step 2) constructing a convolutional neural network model C based on Caffe:
the convolutional layer A is constructed based on caffe.layers.Convolution, the pooling layer B is constructed based on caffe.layers.Pooling, and m fully connected layers D are constructed based on caffe.layers.InnerProduct; the convolutional layer A, the pooling layer B and the m fully connected layers are connected in sequence to obtain the convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution step size stride_conv, the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window moving step size stride_pool, and the number of neurons connected by the fully connected layer D is q. This construction method can directly define the names, types and parameters of the convolutional layer, the pooling layer and the fully connected layers through the function configuration parameters, effectively simplifying the model-building steps and improving the deployment speed. In this example, m = 2, and the specific parameters of each layer of the convolutional neural network C are as follows:
the convolution kernel size of convolution layer A is 7 x 7, the number of convolution kernels is 64, and the convolution step size is 2;
the pooling window size of the pooling layer B is set to 2 × 2, and the pooling window moving step size is 2;
the number of the neurons connected by the full connection layer D is 512;
The length of the fully connected layer can be adjusted automatically according to actual needs, ensuring a weight balance in time and space so as to increase the complexity of the model.
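The layer dimensioning above can be sanity-checked with the standard output-size arithmetic for convolution and pooling windows; the input image size used below is a hypothetical example, as the text does not state one:

```python
# Output spatial size of a square convolution/pooling window sweep:
# out = floor((size + 2*pad - kernel) / stride) + 1
def conv_out(size, kernel, stride, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

in_size = 32  # hypothetical input image side length (not given in the text)
after_conv = conv_out(in_size, kernel=7, stride=2)     # convolutional layer A: 7x7, stride 2
after_pool = conv_out(after_conv, kernel=2, stride=2)  # pooling layer B: 2x2, stride 2
```

With a 32 × 32 input, the convolutional layer would produce 13 × 13 feature maps and the pooling layer 6 × 6 maps.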
Step 3) referring to fig. 2, training a convolutional neural network model C based on Caffe:
(3a) Initialize the iteration number t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize the configuration file Solver with Caffe, and initialize in Solver the maximum iteration number T, the initial learning rate α, the learning-rate update step size stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1, 0 < power < 1; and let t = 0, α_t = α, C_t = C; in this example T = 70, α = 0.002, stepsize = 5000, gamma = 0.5 and power = 0.5;
(3b) Call the convolutional neural network model C_t based on caffe.layers, and call the configuration file Solver based on Caffe; both calls use the "using" calling method, which allows the required type to be called directly without specifying its full namespace, thereby reducing the time consumed by the call;
(3c) Randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict the N samples to obtain their corresponding predicted class labels, and calculate the loss value E_t of C_t according to the class label and the predicted class label of each sample; in this example N = 500, and the calculation formula is:
E_t = -(1/N) Σ_{n_i=1}^{N} Σ_{k_i=1}^{c} y_{n_i}^{k_i} log(ŷ_{n_i}^{k_i})
where n_i denotes the serial number of a selected sample, c denotes the total number of columns of the class labels of the training sample set, k_i denotes the column number of a sample class label, y_{n_i}^{k_i} denotes the k_i-th column element in the class label of the n_i-th sample, log denotes the logarithm with base e, and ŷ_{n_i}^{k_i} denotes the k_i-th column element in the predicted class label of the n_i-th sample;
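A minimal sketch of the loss computation in step (3c), assuming one-hot class labels and softmax prediction probabilities; the sample values below are illustrative only:

```python
import math

def cross_entropy(labels, preds):
    # E_t = -(1/N) * sum over samples and label columns of y * log(y_hat)
    n = len(labels)
    total = 0.0
    for y, y_hat in zip(labels, preds):
        total += sum(yk * math.log(pk) for yk, pk in zip(y, y_hat) if yk > 0)
    return -total / n

labels = [[1, 0, 0], [0, 1, 0]]             # one-hot class labels (illustrative)
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # predicted class labels (illustrative)
loss = cross_entropy(labels, preds)
```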
(3d) Using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layer D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t; the update formulas are:
ω_{t+1} = ω_t − α_t × ∂E_t/∂ω_t
θ_{t+1} = θ_t − α_t × ∂E_t/∂θ_t
where ω_{t+1} denotes the updated convolution kernel weights, θ_{t+1} denotes the updated fully-connected-layer connection parameters, and ∂ denotes the derivative operation;
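The update rule of step (3d) can be sketched as plain gradient descent; the parameter and gradient values below are illustrative, and in practice the gradients would come from back propagation:

```python
def sgd_step(params, grads, lr):
    # omega_{t+1} = omega_t - alpha_t * dE_t/d(omega_t); theta updates identically
    return [p - lr * g for p, g in zip(params, grads)]

omega = [0.5, -0.2]  # illustrative convolution kernel weights
grad = [0.1, -0.4]   # illustrative gradients from back propagation
omega_next = sgd_step(omega, grad, lr=0.002)  # lr = alpha from this example
```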
(3e) Judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power; otherwise, execute step (3f); the learning-rate update formula is:
α_{t+1} = α_t × (1 + gamma × t)^(−power)
where α_{t+1} denotes the updated learning rate;
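The learning-rate schedule of step (3e) can be written directly from the formula above; with the example settings gamma = 0.5 and power = 0.5, a step at t = 6 halves the rate:

```python
def update_lr(alpha_t, t, gamma=0.5, power=0.5):
    # alpha_{t+1} = alpha_t * (1 + gamma * t) ** (-power)
    return alpha_t * (1 + gamma * t) ** (-power)

alpha_next = update_lr(0.002, t=6)  # (1 + 0.5*6)^(-0.5) = 4^(-0.5) = 0.5
```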
(3f) Judge whether t = T holds; if so, obtain the trained convolutional neural network model C*; otherwise, let t = t + 1 and execute step (3c);
Step 4) Extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) Decompose the trained convolutional neural network C* using the get_layer function in Caffe to obtain a convolutional Layer A′, a pooling Layer B′ and a fully connected Layer D′ represented as Layer structures;
(4b) Establish an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ through the items function in Caffe, and store them in the dictionary file d;
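A hedged sketch of the result of step (4): the get_layer and items calls are Caffe-specific, so hypothetical layer records stand in for the Layer structures here; only the dictionary layout is illustrated, using the parameter values of this embodiment:

```python
# Hypothetical stand-ins for the Layer objects returned by Caffe's get_layer;
# parameter values follow this embodiment (7x7/64/stride-2 convolution,
# 2x2/stride-2 pooling, 512 fully-connected neurons).
layers = {
    "A_prime": {"scale_conv": 7, "num": 64, "stride_conv": 2},
    "B_prime": {"scale_pool": 2, "stride_pool": 2},
    "D_prime": {"q": 512},
}

d = {}  # the initially empty dictionary file d
for name, params in layers.items():
    d[name] = dict(params)  # store each layer's parameters under its name
```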
Step 5) Build the convolutional neural network C** based on Verilog:
(5a) Use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) Using a mapping approach, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d to obtain the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d to obtain the configured pooling-layer module m_pool; and configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d to obtain the configured fully-connected-layer module m_fc;
Traverse the input-feature-map size and the numbers of input and output channels of the convolutional neural network, and search for the hardware design parameters satisfying the overall-latency formula, comprising the output-feature-map parallelism Tm, the input-feature-map parallelism Tn, and the height Tr and width Tc of the blocked feature map; if the resource occupation of a found group of hardware design parameters is less than the total resources and the current overall calculation latency is the minimum, store this group of hardware parameters; otherwise, continue searching; finally obtain the hardware design parameters achieving the minimum latency;
where T denotes the overall calculation latency, H the height of the feature map, W the width of the feature map, M the number of output channels and N the number of input channels;
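The exhaustive search over Tm, Tn, Tr and Tc can be sketched as below. The patent's exact latency formula is not reproduced in the text, so a common loop-tiling cost model (latency proportional to the product of tile counts and per-tile work) and a single assumed resource constraint on Tm × Tn are used here:

```python
import math

def search_tiles(H, W, M, N, budget):
    # Exhaustively search Tm, Tn, Tr, Tc; keep the group with minimum latency
    # whose compute-resource use (assumed proportional to Tm * Tn) fits budget.
    best = None
    for Tm in range(1, M + 1):
        for Tn in range(1, N + 1):
            if Tm * Tn > budget:  # assumed resource constraint
                continue
            for Tr in range(1, H + 1):
                for Tc in range(1, W + 1):
                    # assumed cost model: number of tiles x per-tile work
                    T = (math.ceil(M / Tm) * math.ceil(N / Tn)
                         * math.ceil(H / Tr) * math.ceil(W / Tc) * Tr * Tc)
                    if best is None or T < best[0]:
                        best = (T, Tm, Tn, Tr, Tc)
    return best

best = search_tiles(H=14, W=14, M=8, N=4, budget=32)  # illustrative sizes
```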
Load Tm × Tn groups of weights and Tn input-feature-map blocks of size Tr × Tc into the buffer of the convolution module m_conv; then input the feature maps of the Tn input channels and the corresponding number of weights from the buffer into the convolution calculation unit, perform multiply-accumulate operations simultaneously, and slide the convolution window over the blocked input feature map; after sliding over the Tr × Tc positions, the convolution calculation of this group of input feature maps is finished, and the convolution calculation result is input to m_pool;
Load the Tm output feature maps obtained after convolution into the buffer of m_pool; when the pooling operation starts, the values at the same position of the Tm feature maps are input into the comparison unit, and the comparison unit takes the maximum of the feature values as the feature value of the output feature map to complete the pooling operation; the result of the pooling operation in m_pool is input to m_fc;
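The comparison logic of m_pool can be sketched in software as a 2 × 2 max-pooling sweep (window size and stride 2, as in this embodiment); in each window the maximum feature value becomes the output feature value:

```python
def max_pool2x2(fmap):
    # 2x2 window, stride 2: take the maximum feature value in each window
    # as the feature value of the output feature map.
    H, W = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, W - 1, 2)]
            for i in range(0, H - 1, 2)]

out = max_pool2x2([[1, 3, 2, 0],
                   [4, 2, 1, 1],
                   [0, 5, 3, 2],
                   [1, 2, 9, 4]])
```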
Connect m_conv, m_pool and m_fc to form the convolutional neural network C**;
Step 6) referring to fig. 3, acquiring a deployment result of the convolutional neural network on the FPGA:
(6a) Describe C** at the RTL level in Verilog, and translate the RTL-level description of C** into a logic netlist with the Synthesis tool of VIVADO;
(6b) According to the timing-constraint and area-constraint conditions, optimize the placement and routing of the logic netlist on the FPGA with the Implementation tool of VIVADO to generate an optimized logic netlist;
(6c) Convert the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA with the Generate Bitstream tool of VIVADO, and burn the bitstream file into the Flash memory on the FPGA.
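The VIVADO flow of step (6) is commonly driven by a Tcl batch script; a hedged sketch that generates such a script is shown below. The top-module, part and file names are hypothetical placeholders, while the Tcl commands themselves (synth_design, place_design, route_design, write_bitstream) are standard Vivado batch commands:

```python
# Hypothetical file/part names; the Tcl commands are standard Vivado commands.
tcl = "\n".join([
    "read_verilog cnn_top.v",                           # RTL description of C**
    "synth_design -top cnn_top -part xc7z020clg400-1",  # synthesis -> logic netlist
    "opt_design",                                       # netlist optimization
    "place_design",                                     # placement under constraints
    "route_design",                                     # routing
    "write_bitstream -force cnn_top.bit",               # bitstream for Flash burning
])
print(tcl)
```

Such a script would typically be run with `vivado -mode batch -source build.tcl`.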
Claims (5)
1. A Caffe-based convolutional neural network deployment method on an FPGA is characterized by comprising the following steps:
(1) obtaining a training sample set:
taking n images with class labels obtained from the Iris dataset as a training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image and n ≥ 10000;
(2) constructing a convolutional neural network model C based on Caffe:
the convolutional layer A is constructed based on caffe.layers.Convolution, the pooling layer B is constructed based on caffe.layers.Pooling, and m fully connected layers D are constructed based on caffe.layers.InnerProduct; the convolutional layer A, the pooling layer B and the m fully connected layers are connected in sequence to obtain the convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution step size stride_conv, the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window moving step size stride_pool, the number of neurons connected by the fully connected layer D is q, and m ≥ 2;
(3) training a convolutional neural network model C based on Caffe:
(3a) Initialize the iteration number t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize the configuration file Solver with Caffe, and initialize in Solver the maximum iteration number T, the initial learning rate α, the learning-rate update step size stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1, 0 < power < 1; and let t = 0, α_t = α, C_t = C;
(3b) Call the convolutional neural network model C_t based on caffe.layers, and at the same time call the configuration file Solver based on Caffe;
(3c) Randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict the N samples to obtain their corresponding predicted class labels, and calculate the loss value E_t of C_t according to the class label and the predicted class label of each sample;
(3d) Using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layer D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t;
(3e) Judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power; otherwise, execute step (3f);
(3f) Judge whether t = T holds; if so, obtain the trained convolutional neural network model C*; otherwise, let t = t + 1 and execute step (3c);
(4) Extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) Decompose the trained convolutional neural network C* using the get_layer function in Caffe to obtain a convolutional Layer A′, a pooling Layer B′ and a fully connected Layer D′ represented as Layer structures;
(4b) Establish an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ through the items function in Caffe, and store them in the dictionary file d;
(5) Build the convolutional neural network C** based on Verilog:
(5a) Use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) Using a mapping approach, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d to obtain the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d to obtain the configured pooling-layer module m_pool; configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d to obtain the configured fully-connected-layer module m_fc; and connect m_conv, m_pool and m_fc in sequence to form the convolutional neural network C**;
(6) Acquiring a deployment result of the convolutional neural network on the FPGA:
(6a) Describe C** at the RTL level in Verilog, and translate the RTL-level description of C** into a logic netlist with the Synthesis tool of VIVADO;
(6b) According to the timing-constraint and area-constraint conditions, optimize the placement and routing of the logic netlist on the FPGA with the Implementation tool of VIVADO to generate an optimized logic netlist;
(6c) Convert the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA with the Generate Bitstream tool of VIVADO, and burn the bitstream file into the Flash memory on the FPGA.
2. The deployment method of the Caffe-based convolutional neural network on the FPGA as claimed in claim 1, wherein the convolutional neural network C in the step (2) comprises two fully connected layers D, and the specific parameters of each layer of the convolutional neural network C are as follows:
the convolution kernel size of the convolutional layer A is 7×7, the number of convolution kernels is 64, and the convolution stride is 2;
the pooling window size of the pooling layer B is 2×2, and the pooling window moving stride is 2;
the number of connected neurons of the fully-connected layer D is 512.
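With the layer parameters of claim 2, the spatial size of each layer's output can be checked with the standard formula out = floor((in + 2·pad − kernel) / stride) + 1. The 224×224 input size and zero padding below are assumptions for illustration; the claim does not state them.

```python
# Output spatial size for a conv/pool layer: floor((in + 2*pad - k)/s) + 1.
def out_size(in_size, kernel, stride, pad=0):
    return (in_size + 2 * pad - kernel) // stride + 1

h = 224                                # hypothetical input height/width
h = out_size(h, kernel=7, stride=2)    # conv layer A: 7x7 kernels, stride 2
print(h)                               # 109
h = out_size(h, kernel=2, stride=2)    # pooling layer B: 2x2 window, stride 2
print(h)                               # 54
```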
3. The Caffe-based convolutional neural network deployment method on FPGA as claimed in claim 1, wherein the loss value E_t of C_t in step (3c) is calculated by the following formula:
E_t = −Σ_{n_i} Σ_{k_i=1}^{c} y_{k_i}^{n_i} × log(ŷ_{k_i}^{n_i})
wherein n_i denotes the serial number of a selected sample, c denotes the total number of columns of the class labels of the training sample set, k_i denotes the column number of a sample class label, y_{k_i}^{n_i} denotes the k_i-th column element in the class label of the n_i-th sample, ŷ_{k_i}^{n_i} denotes the k_i-th column element in the predicted class label of the n_i-th sample, and log denotes the logarithm with base e.
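A plain-Python reading of the claim-3 loss: for each selected sample n_i, each label element multiplies the natural log of the corresponding predicted element, summed over the c label columns and over the samples, then negated. With one-hot labels this reduces to the usual cross-entropy loss; the sample values below are illustrative.

```python
# Cross-entropy loss over a batch: E = -sum_{n_i} sum_{k_i} y * log(y_hat).
import math

def loss(labels, preds):
    """labels, preds: lists of rows, one row of c elements per sample."""
    total = 0.0
    for y_row, p_row in zip(labels, preds):   # over selected samples n_i
        for y, p in zip(y_row, p_row):        # over label columns k_i
            total -= y * math.log(p)          # log is base e
    return total

labels = [[0, 1, 0]]                  # one sample, one-hot class label
preds = [[0.2, 0.7, 0.1]]             # predicted class label
print(round(loss(labels, preds), 4))  # -log(0.7) ~ 0.3567
```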
4. The method for deploying a Caffe-based convolutional neural network on FPGA as claimed in claim 1, wherein the update formula of the convolution kernel weight ω_t and the update formula of the fully-connected layer connection parameter θ_t in step (3d) are, respectively:
ω_{t+1} = ω_t − α_t × ∂E_t/∂ω_t
θ_{t+1} = θ_t − α_t × ∂E_t/∂θ_t
wherein α_t denotes the learning rate at the t-th iteration and E_t denotes the loss value.
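The update formulas of claim 4 did not survive extraction in this text; a minimal sketch follows, assuming plain gradient descent with the per-iteration learning rate α_t of claim 5 (ω: convolution kernel weights, θ: fully-connected layer connection parameters). The numeric values are illustrative.

```python
# One gradient-descent update, assuming param_{t+1} = param_t - alpha_t * grad.
def sgd_step(param, grad, alpha_t):
    """Apply one update to a flat list of parameters."""
    return [p - alpha_t * g for p, g in zip(param, grad)]

omega = [0.5, -0.3]          # example convolution kernel weights
grad_omega = [0.1, -0.2]     # example gradient dE_t/d(omega)
updated = sgd_step(omega, grad_omega, alpha_t=0.01)
print([round(x, 3) for x in updated])   # [0.499, -0.298]
```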
5. The Caffe-based convolutional neural network deployment method on FPGA as claimed in claim 1, wherein the update formula of the learning rate α_t in step (3e) is:
α_{t+1} = α_t × (1 + gamma × t)^(−power)
wherein α_{t+1} denotes the updated learning rate.
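The claim-5 schedule applied iteratively can be sketched as below; it matches the form of Caffe's "inv" learning-rate policy. The gamma, power and base-rate values are assumptions, not values from the patent.

```python
# Claim-5 learning-rate update: alpha_{t+1} = alpha_t * (1 + gamma*t)^(-power).
def next_lr(alpha_t, t, gamma=0.0001, power=0.75):
    return alpha_t * (1 + gamma * t) ** (-power)

alpha = 0.01                  # assumed base learning rate
for t in range(3):            # a few training iterations
    alpha = next_lr(alpha, t)
print(alpha < 0.01)           # the rate decays monotonically: True
```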
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010793360.7A CN111931913B (en) | 2020-08-10 | 2020-08-10 | Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111931913A true CN111931913A (en) | 2020-11-13 |
CN111931913B CN111931913B (en) | 2023-08-01 |
Family
ID=73306448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010793360.7A Active CN111931913B (en) | 2020-08-10 | 2020-08-10 | Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111931913B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111896823A (en) * | 2020-06-30 | 2020-11-06 | 成都四威功率电子科技有限公司 | System for carrying out online health monitoring and fault early warning on power amplifier |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180137406A1 (en) * | 2016-11-15 | 2018-05-17 | Google Inc. | Efficient Convolutional Neural Networks and Techniques to Reduce Associated Computational Costs |
CN109086867A (en) * | 2018-07-02 | 2018-12-25 | 武汉魅瞳科技有限公司 | A kind of convolutional neural networks acceleration system based on FPGA |
CN109740734A (en) * | 2018-12-29 | 2019-05-10 | 北京工业大学 | A kind of method of neuron spatial arrangement in optimization convolutional neural networks |
US20190318231A1 (en) * | 2018-04-11 | 2019-10-17 | Hangzhou Flyslice Technologies Co., Ltd. | Method for acceleration of a neural network model of an electronic equipment and a device thereof |
CN111104124A (en) * | 2019-11-07 | 2020-05-05 | 北京航空航天大学 | Pytorch framework-based rapid deployment method of convolutional neural network on FPGA |
Non-Patent Citations (2)
Title |
---|
Li Kunlun; Wei Zefa; Song Huansheng: "Vehicle color recognition based on the SqueezeNet convolutional neural network", Journal of Chang'an University (Natural Science Edition), no. 04 *
Xie Da; Zhou Daokui; Ji Zhenkai; Dai Xinyu; Wu Rui: "Implementation and acceleration of an object classification algorithm under the Caffe framework on a heterogeneous multi-core platform", Electronics & Packaging, no. 05 *
Also Published As
Publication number | Publication date |
---|---|
CN111931913B (en) | 2023-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20190303762A1 (en) | Methods of optimization of computational graphs of neural networks | |
WO2018171717A1 (en) | Automated design method and system for neural network processor | |
WO2018171715A1 (en) | Automated design method and system applicable for neural network processor | |
WO2022027937A1 (en) | Neural network compression method, apparatus and device, and storage medium | |
CN110059811A (en) | Weight buffer | |
CN109784489A (en) | Convolutional neural networks IP kernel based on FPGA | |
CN112288086B (en) | Neural network training method and device and computer equipment | |
US20180204110A1 (en) | Compressed neural network system using sparse parameters and design method thereof | |
CN109409510B (en) | Neuron circuit, chip, system and method thereof, and storage medium | |
WO2021233342A1 (en) | Neural network construction method and system | |
WO2021218517A1 (en) | Method for acquiring neural network model, and image processing method and apparatus | |
CN108764466A (en) | Convolutional neural networks hardware based on field programmable gate array and its accelerated method | |
JP7168772B2 (en) | Neural network search method, device, processor, electronic device, storage medium and computer program | |
CN113076938B (en) | Deep learning target detection method combining embedded hardware information | |
CN107391512A (en) | The method and apparatus of knowledge mapping prediction | |
WO2022007867A1 (en) | Method and device for constructing neural network | |
Stevens et al. | Manna: An accelerator for memory-augmented neural networks | |
CN115860081B (en) | Core algorithm scheduling method, system, electronic equipment and storage medium | |
CN112307048B (en) | Semantic matching model training method, matching method, device, equipment and storage medium | |
CN111563582A (en) | Method for realizing and optimizing accelerated convolution neural network on FPGA (field programmable Gate array) | |
de Prado et al. | Automated design space exploration for optimized deployment of dnn on arm cortex-a cpus | |
CN111931913B (en) | Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe | |
JP2022548341A (en) | Get the target model | |
CN111582094A (en) | Method for identifying pedestrian by parallel selecting hyper-parameter design multi-branch convolutional neural network | |
CN116401552A (en) | Classification model training method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||